Simon Willison’s Weblog

766 items tagged “python”


Annotated explanation of David Beazley’s dataklasses (via) David Beazley released a self-described “deliciously evil spin on dataclasses” that uses some deep Python trickery to implement a dataclass style decorator which creates classes that import 15-20 times faster than the original. I put together a heavily annotated version of his code while trying to figure out how all of the different Python tricks in it work. # 20th December 2021, 5:05 am

TypeScript for Pythonistas (via) Really useful explanation of how TypeScript differs from Python with mypy. I hadn’t realized TypeScript leans so far into structural typing, to the point that two types with different names but the same “shape” are identified as being the same type as each other. # 17th December 2021, 7:43 pm

wheel.yml for Pyjion using cibuildwheel (via) cibuildwheel, maintained by the Python Packaging Authority, builds and tests Python wheels across multiple platforms. I hadn’t realized quite how minimal a configuration using their GitHub Actions action was until I looked at how Pyjion was using it. # 10th December 2021, 3:05 am

Introducing stack graphs (via) GitHub launched “precise code navigation” for Python today—the first language to get support for this feature. Click on any Python symbol in GitHub’s code browsing views and a box will show you exactly where that symbol was defined—all based on static analysis by a custom parser written in Rust as opposed to executing any Python code directly. The underlying computer science uses a technique called stack graphs, based on scope graphs research from Eelco Visser’s research group at TU Delft. # 9th December 2021, 11:07 pm

An oral history of Bank Python (via) Fascinating description of a very custom Python environment inside a large investment bank—where all of the code lives inside the Python environment itself, everything can be imported into the same process and a directed acyclic graph engine implements Excel-style reactive dependencies. Plenty of extra flavour from people who’ve worked with this (and related) Python systems in the Hacker News comments. # 5th November 2021, 5:18 am

How to build, test and publish an open source Python library

At PyGotham this year I presented a ten minute workshop on how to package up a new open source Python library and publish it to the Python Package Index. Here is the video and accompanying notes, which should make sense even without watching the talk.

[... 2017 words]

s3-credentials: a tool for creating credentials for S3 buckets

I’ve built a command-line tool called s3-credentials to solve a problem that’s been frustrating me for ages: how to quickly and easily create AWS credentials (an access key and secret key) that have permission to read or write from just a single S3 bucket.

[... 1593 words]

Why you shouldn’t invoke directly (via) Paul Ganssle explains why you shouldn’t use “python command” any more. I’ve mostly switched to pip and pytest and twine but I was still using “python sdist”—apparently the new replacement recipe for that is “python -m build”. # 19th October 2021, 5:22 pm

Where does all the effort go? Looking at Python core developer activity (via) Łukasz Langa used Datasette to explore 28,780 pull requests made to the CPython GitHub repository, using some custom Python scripts (and sqlite-utils) to load in the data. # 18th October 2021, 8:21 pm

Tests aren’t enough: Case study after adding type hints to urllib3. Very thorough write-up by Seth Michael Larson describing what it took for the urllib3 Python library to fully embrace mypy and optional typing and what they learned along the way. # 18th October 2021, 7:03 pm

Web Browser Engineering (via) In progress free online book by Pavel Panchekha and Chris Harrelson that demonstrates how a web browser works by writing one from scratch using Python, tkinter and the DukPy wrapper around the Duktape JavaScript interpreter. # 17th October 2021, 3:53 pm

Finding and reporting an asyncio bug in Python 3.10

I found a bug in Python 3.10 today! Some notes on how I found it and my process for handling it once I figured out what was going on.

[... 1789 words]

The GIL and its effects on Python multithreading (via) Victor Skvortsov presents the most in-depth explanation of the Python Global Interpreter Lock I’ve seen anywhere. I learned a ton from reading this. # 29th September 2021, 5:23 pm

django-upgrade (via) Adam Johnson’s new CLI tool for upgrading Django projects by automatically applying changes to counter deprecations made in different versions of the framework. Uses the Python standard library tokenize module which gives it really quick performance in parsing and rewriting Python code. Exciting to see this kind of codemod approach becoming more common in Python world—JavaScript developers use this kind of thing a lot. # 26th September 2021, 5:42 am

SQLModel. A new project by FastAPI creator Sebastián Ramírez: SQLModel builds on top of both SQLAlchemy and Sebastián’s Pydantic validation library to provide a new ORM that’s designed around Python 3’s optional typing. The real brilliance here is that a SQLModel subclass is simultaneously a valid SQLAlchemy ORM model AND a valid Pydantic validation model, saving on duplicate code by allowing the same class to be used both for form/API validation and for interacting with the database. # 24th August 2021, 11:16 pm

sqlite-utils API reference (via) I released sqlite-utils 3.15.1 today with just one change, but it’s a big one: I’ve added docstrings and type annotations to nearly every method in the library, and I’ve started using sphinx-autodoc to generate an API reference page in the documentation directly from those docstrings. I’ve deliberately avoided building this kind of documentation in the past because I so often see projects where the class reference is the ONLY documentation, which I find makes it really hard to figure out how to actually use it. sqlite-utils already has extensive narrative prose documentation so in this case I think it’s a useful enhancement—especially since the docstrings and type hints can help improve the usability of the library in IDEs and Jupyter notebooks. # 11th August 2021, 1:03 am

How the Python import system works (via) Remarkably detailed and thorough dissection of how exactly import, modules and packages work in Python—eventually digging right down into the C code. Part of Victor Skvortsov’s excellent “Python behind the scenes” series. # 24th July 2021, 8:12 pm

I’ve always believed that a book, even a technical book, should try to tell a cohesive story. The challenge is that as Python has grown in popularity, it has really turned into three different languages--each with their own story. There is a whimsical Python for scripting and tinkering, a quirky Python for asynchronous programming and networking, and a serious Python for enterprise applications. Sometimes these stories intersect. Sometimes not.

David Beazley # 18th July 2021, 2:53 pm

Group thousands of similar spreadsheet text cells in seconds (via) Luke Whyte explains how to efficiently group similar text columns in a table (Walmart and Wal-mart for example) using a clever combination of TF/IDF, sparse matrices and cosine similarity. Includes the clearest explanation of cosine similarity for text I’ve seen—and Luke wrote a Python library, textpack, that implements the described pattern. # 27th June 2021, 4:24 pm

Querying Parquet using DuckDB (via) DuckDB is a relatively new SQLite-style database (released as an embeddable library) with a focus on analytical queries. This tutorial really made the benefits click for me: it ships with support for the Parquet columnar data format, and you can use it to execute SQL queries directly against Parquet files—e.g. “SELECT COUNT(*) FROM ’taxi_2019_04.parquet’”. Performance against large files is fantastic, and the whole thing can be installed just using “pip install duckdb”. I wonder if faceting-style group/count queries (pretty expensive with regular RDBMSs) could be sped up with this? # 25th June 2021, 10:40 pm

Powering the Python Package Index in 2021. PyPI now serves “nearly 900 terabytes over more than 2 billion requests per day”. Bandwidth is donated by Fastly, a value estimated at 1.8 million dollars per month! Lots more detail about how PyPI has evolved over the past years in this post by Dustin Ingram. # 14th May 2021, 4:50 am

Async functions require an event loop to run. Flask, as a WSGI application, uses one worker to handle one request/response cycle. When a request comes in to an async view, Flask will start an event loop in a thread, run the view function there, then return the result. Each request still ties up one worker, even for async views. The upside is that you can run async code within a view, for example to make multiple concurrent database queries, HTTP requests to an external API, etc. However, the number of requests your application can handle at one time will remain the same.

Using async and await in Flask 2.0 # 12th May 2021, 5:59 pm

New Major Versions Released! Flask 2.0, Werkzeug 2.0, Jinja 3.0, Click 8.0, ItsDangerous 2.0, and MarkupSafe 2.0. Huge set of releases from the Pallets team. Python 3.6+ required and comprehensive type annotations. Flask now supports async views, Jinja async templates (used extensively by Datasette) “no longer requires patching”, Click has a bunch of new code around shell tab completion, ItsDangerous supports key rotation and so much more. # 12th May 2021, 5:37 pm

cinder: Instagram’s performance oriented fork of CPython (via) Instagram forked CPython to add some performance-oriented features they wanted, including a method-at-a-time JIT compiler and a mechanism for eagerly evaluating coroutines (avoiding the overhead of creating a coroutine if an awaited function returns a value without itself needing to await). They’re open sourcing the code to help start conversations about implementing some of these features in CPython itself. I particularly enjoyed the warning that accompanies the repo: this is not intended to be a supported release, and if you decide to run it in production you are on your own! # 4th May 2021, 10:13 pm

Hello, HPy (via) HPy provides a new way to write C extensions for Python in a way that is compatible with multiple Python implementations at once, including PyPy. # 29th March 2021, 2:40 pm

Homebrew Python Is Not For You. If you’ve been running into frustrations with your Homebrew Python environments breaking over the past few months (the dreaded “Reason: image not found” error) Justin Mayer has a good explanation. Python in a Homebrew is designed to work as a dependency for their other packages, and recent policy changes that they made to support smoother upgrades have had catastrophic problems effects on those of us who try to use it for development environments. # 25th March 2021, 3:14 pm

When you have to mock a collaborator, avoid using the Mock object directly. Either use mock.create_autospec() or mock.patch(autospec=True) if at all possible. Autospeccing from the real collaborator means that if the collaborator’s interface changes, your tests will fail. Manually speccing or not speccing at all means that changes in the collaborator’s interface will not break your tests that use the collaborator: you could have 100% test coverage and your library would fall over when used!

Thea Flowers # 17th March 2021, 4:44 pm

sqlite-uuid (via) Another Python package that wraps a SQLite module written in C: this one provides access to UUID functions as SQLite functions. # 15th March 2021, 2:55 am

sqlite-spellfix (via) I really like this pattern: “pip install sqlite-spellfix” gets you a Python module which includes a compiled (on your system when pip install ran) copy of the SQLite spellfix1 module, plus a utility variable containing its path so you can easily load it into a SQLite connection. # 15th March 2021, 2:52 am

unasync (via) Today I started wondering out loud if one could write code that takes an asyncio Python library and transforms it into the synchronous equivalent by using some regular expressions to strip out the “await ...” keywords and suchlike. Turns out that can indeed work, and Ratan Kulshreshtha built it! unasync uses the standard library tokenize module to run some transformations against an async library and spit out the sync version automatically. I’m now considering using this for sqlite-utils. # 27th February 2021, 10:20 pm