Simon Willison’s Weblog

676 items tagged “python”

Hynek Schlawack: Testing & Packaging (via) “How to ensure that your tests run code that you think they are running, and how to measure your coverage over multiple tox runs (in parallel!)”—Hynek makes a convincing argument for putting your packaged Python code in a src/ directory for ease of testing and coverage. # 22nd May 2018, 10:12 pm

Pyre: Fast Type Checking for Python (via) Facebook’s alternative to mypy. “Pyre is designed to be highly parallel, optimizing for near-instant responses so that you get immediate feedback, even in a large codebase”. Like their Hack type checker for PHP, Pyre is implemented in OCaml. # 11th May 2018, 5:47 pm

Black Onlline Demo (via) Black is “the uncompromising Python code formatter” by Łukasz Langa—it reformats Python code to a very carefully thought out styleguide, and provides almost no options for how code should be formatted. It’s reminiscent of gofmt. José Padilla built a handy online tool for trying it out in your browser. # 25th April 2018, 5:17 am

dateparser: python parser for human readable dates (via) I’ve used dateutil.parser for this in the past, but dateparser is a major upgrade: it knows how to parse dates in 200 different language locales, can interpret different timezone representations and handles relative dates (“3 months, 1 week and 1 day ago”) as well. # 24th April 2018, 4:17 pm

How to rewrite your SQL queries in Pandas, and more (via) I still haven’t fully internalized the idioms needed to manipulate DataFrames in pandas. This tutorial helps a great deal—it shows the Pandas equivalents for a host of common SQL queries. # 19th April 2018, 6:34 pm

Intro to Threads and Processes in Python (via) I really like the diagrams in this article which compares the performance of Python threads and processes for different types of task via the excellent concurrent.futures library. # 19th April 2018, 6:32 pm

How to Use Static Type Checking in Python 3.6 (via) Useful introduction to optional static typing in Python 3.6, including how to use mypy, PyCharm and the Atom mypy plugin. # 19th April 2018, 6:30 pm

The best of Python: a collection of my favorite articles from 2017 and 2018 (so far). Gergely Szerovay has brought together an outstandingly interesting selection of Python articles from the last couple of years of activity of the Python community on Medium. A whole load of gems in here that I hadn’t seen before. # 19th April 2018, 6:28 pm

Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter (via) ScriptedForms is a fascinating Jupyter hack that lets you construct dynamic documents defined using markdown that provide form fields and evaluate Python code instantly as you interact with them. # 19th April 2018, 4:05 pm

Parsing CSV using ANTLR and Python 3. I’ve been trying to figure out how to use ANTLR grammars from Python—this is the first example I’ve found that really clicked for me. # 6th April 2018, 2:33 pm

rubber-docker/linux.c. rubber-docker is a workshop that talks through building a simply Docker clone from scratch in Python. I particularly liked this detail: linux.c is a Python extension written in C that exposes a small collection of Linux syscalls that are needed for the project—clone, mount, pivot_root, setns, umount, umount2 and unshare. Just reading through this module gives a really nice overview of how some of Docker’s underlying magic actually work. # 2nd April 2018, 6:18 pm

import-pypi. A devious Python 3 hack which abuses importlib.machinery to add a hook such that any time you type “import modulename” it checks to see if the module is installed and runs “pip install modulename” first if it isn’t. Intended as a joke, but if you habitually fire up temporary virtual environments for exploratory programming like I do this could actually be a neat little time-saver. # 29th March 2018, 10:16 pm

How to use HDF5 files in Python (via) HDF5: the missing manual. A detailed explanation of the HDF5 file format and how to work with it using the h5py module. HDF5 allows you to efficiently store multiple datasets (plus metatdata about them) in a single file and then load data from them without pulling the entire file into memory—kind of like SQLite but without the SQL support and more optimized for working with arrays. # 19th March 2018, 2:55 pm

Trio Tutorial. Trio is a really nice async library for Python—a simpler alternative to asyncio, with some very clean API design. Best of all, the tutorial is fantastic—it provides a very clear explanation of async/await without diving into the intricacies of coroutines. # 17th March 2018, 3:55 pm

r1chardj0n3s/parse: Parse strings using a specification based on the Python format() syntax. (via) Really neat API design: parse() behaves almost exactly in the opposite way to Python’s built-in format(), so you can use format strings as an alternative to regular expressions for extracting specific data from a string. # 25th February 2018, 4:58 pm

kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent markdown in a one-liner. # 25th February 2018, 4:49 pm

s3monkey: A Python library that allows you to interact with Amazon S3 Buckets as if they are your local filesystem. (via) A particularly devious hack by Kenneth Reitz—provides a context manager within which various Python filesystem APIs such as open() and os.listdir() are monkeypatched to operate against an S3 bucket instead. Kenneth built it to make it easier to work with files from apps running on Heroku. Under the hood it uses pyfakefs, a filesystem mocking library originally released by Google. # 21st February 2018, 5:54 pm

A Promenade of PyTorch. Useful overview of the PyTorch machine learning library from Facebook AI Research described as “a Python library enabling GPU-accelerated tensor computation”. Similar to TensorFlow, but where TensorFlow requires you to explicitly construct an execution graph PyTorch instead lets you write regular Python code (if statements, for loops etc) which PyTorch then uses to construct the execution graph for you. # 21st February 2018, 5:31 am

Moving a large and old codebase to Python3 (via) Really interesting case study full of good ideas. The codebase in this case was 240,000 lines of Python and Django written over the course of 15 years. The team used Python-Modernize to aid their transition to a six-compatible codebase first. # 20th February 2018, 2:39 pm

Python & Async Simplified. Andrew Godwin: “Python’s async framework is actually relatively simple when you treat it at face value, but a lot of tutorials and documentation discuss it in minute implementation detail, so I wanted to make a higher-level overview that deliberately ignores some of the small facts and focuses on the practicalities of writing projects that mix both kinds of code.” ‪This is really useful: clearly explains the two separate worlds of Python (sync and async functions) and describes Andrew’s clever sync_to_async and async_to_sync decorators as well.‬ # 20th February 2018, 12:30 am

Channels 2.0. Andrew just shipped Channels 2.0—a major rewrite and redesign of the Channels project he started back in 2014. Channels brings async to Django, providing a logical, standardized way of supporting things like WebSockets and asynchronous execution on top of a Django application. Previously it required you to run a separate Twisted server and redis/RabbitMQ queue, but thanks to Python 3 async everything can now be deployed as a single process. And the new ASGI spec means its turtles all the way down! Everything from URL routing to view functions to middleware can be composed together using the same ASGI interface. # 2nd February 2018, 6:19 pm

Using in Your (Django) Project. Includes this neat trick: if you list in the setup(scripts=) argument you can call it from e.g. cron using the full path to within your virtual environment and it will execute in the correct context without needing to explicitly activate the environment first. # 2nd February 2018, 12:33 pm

Datasette Demo (video) from the SF Python Meetup

I gave a short talk about Datasette last month at the SF Python Meetup Holiday Party. They’ve just posted the video, so here it is:

[... 63 words]

Generating polygon representing a rough 100km circle around latitude/longitude point using Python. A question I posted to the GIS Stack Exchange—I found my own answer using a Python library called geog, then someone else posted a better solution using pyproj. # 17th January 2018, 8:57 pm

Notes on Kafka in Python. Useful review by Matthew Rocklin of the three main open source Python Kafka client libraries as of October 2017. # 13th January 2018, 7:40 pm

Let your code type-hint itself: introducing open source MonkeyType. Instagram have open sourced their tool for automatically adding type annotations to your Python 3 code via runtime tracing. By default it logs the types it sees to a SQLite database, which means you can browse them with Datasette! # 15th December 2017, 2:22 am

Python 3 Readiness (via) 345 of the 360 most popular Python packages are now compatible with Python 3. I’d love to see a version of this graph over time. # 2nd December 2017, 11:13 pm

Object models (via) Extremely comprehensive and readable discussion of the object models of Python, JavaScript, Lua and Perl 5. I learned something new about every one of those languages. # 29th November 2017, 2:59 pm

pillow-simd (via) A “friendly fork” of the Python Pillow image library that takes advantage of SIMD operations on certain CPUs to obtain massive speed-ups—they claim 16 to 40 times faster than ImageMagick. # 14th November 2017, 9:42 pm

Exploring Line Lengths in Python Packages. Interesting exploration of the impact if the 79 character length limit rule of thumb on various Python packages—and a thoroughly useful guide to histogram plotting in Jupyter, pandas and matplotlib. # 10th November 2017, 3:34 pm