Simon Willison’s Weblog


864 items tagged “python”


A Hackers’ Guide to Language Models. Jeremy Howard’s new 1.5 hour YouTube introduction to language models looks like a really useful place to catch up if you’re an experienced Python programmer looking to start experimenting with LLMs. He covers what they are and how they work, then shows how to build against the OpenAI API, build a Code Interpreter clone using OpenAI functions, run models from Hugging Face on your own machine (with NVIDIA cards or on a Mac) and finishes with a demo of fine-tuning a Llama 2 model to perform text-to-SQL using an open dataset. # 25th September 2023, 12:24 am

How CPython Implements and Uses Bloom Filters for String Processing. Fascinating dive into Python string internals by Abhinav Upadhyay. It turns out CPython uses very simple bloom filters in several of the core string methods, to solve problems like splitting on newlines where there are actually eight codepoints that could represent a newline: a tiny bloom filter can rule a character out in a single operation, so all eight comparisons are performed only if that first check fails to exclude it. # 16th September 2023, 10:32 pm
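A minimal Python sketch of that idea, not CPython's actual C code. The 64-bit mask width and the particular set of line-break characters are assumptions made for illustration, and the names are hypothetical.

```python
BLOOM_WIDTH = 64  # assumed mask width in bits

# Illustrative set of line-break characters (an assumption for this sketch).
LINEBREAKS = ("\n", "\r", "\x0b", "\x0c", "\x1c", "\x1d", "\x1e", "\x85")


def make_mask(chars):
    """Set one bit per character, chosen by the low bits of its code point."""
    mask = 0
    for ch in chars:
        mask |= 1 << (ord(ch) & (BLOOM_WIDTH - 1))
    return mask


LINEBREAK_MASK = make_mask(LINEBREAKS)


def is_linebreak(ch):
    # Fast path: a clear bit proves the character is not in the set.
    if not LINEBREAK_MASK & (1 << (ord(ch) & (BLOOM_WIDTH - 1))):
        return False
    # Slow path: a set bit may be a false positive, so compare exactly.
    return ch in LINEBREAKS
```

Most ordinary characters take the fast path. A character such as "J" shares its low bits with "\n", so it passes the mask check and is only rejected by the exact comparison.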

bpy—Blender on PyPI (via) TIL you can “pip install” Blender!

bpy “provides Blender as a Python module”—it’s part of the official Blender project, and ships with binary wheels ranging in size from 168MB to 319MB depending on your platform.

It only supports the version of Python used by the current Blender release though—right now that’s Python 3.10. # 8th September 2023, 3:29 pm

Understanding Immortal Objects in Python 3.12. Abhinav Upadhyay provides a clear and detailed explanation of immortal objects coming in Python 3.12, which ensure Python no longer updates reference counts for immutable objects such as True, False, None and low-value integers. The trick (which maintains ABI compatibility) is pretty simple: a reference count value of 4294967295 now means an object is immortal, and the existing Py_INCREF and Py_DECREF macros have been updated to take that into account. # 26th August 2023, 12:08 pm

Lark parsing library JSON tutorial (via) A very convincing tutorial for a new-to-me parsing library for Python called Lark.

The tutorial covers building a full JSON parser from scratch, which ends up being just 19 lines of grammar definition code and 15 lines for the transformer to turn that tree into the final JSON.

It then gets into the details of optimization—the default Earley algorithm is quite slow, but swapping that out for a LALR parser (a one-line change) provides a 5x speedup for this particular example. # 13th August 2023, 9:50 pm

deno_python (via) A wildly impressive hack: deno_python uses Deno’s FFI interface to load your system’s Python framework (.dll/.dylib/.so) and sets up JavaScript proxy objects for imported Python objects—so you can run JavaScript code that instantiates objects from Python libraries and uses them to process data in different ways.

The latest release added pip support, so things like ’const np = await pip.import(“numpy”)’ now work. # 12th August 2023, 10:14 pm

Llama from scratch (or how to implement a paper without crying) (via) Brian Kitano implemented the model described in the Llama paper against TinyShakespeare, from scratch, using Python and PyTorch. This write-up is fantastic—meticulous, detailed and deeply informative. It would take several hours to fully absorb and follow everything Brian does here but it would provide multiple valuable lessons in understanding how all of this stuff fits together. # 9th August 2023, 7:21 pm

Python cocktail: mix a context manager and an iterator in equal parts (via) Explanation of a neat trick used by the Tenacity Python library, which provides a mechanism for automatically retrying a chunk of code on errors, up to three times, using a mixture of an iterator and a context manager to work around Python’s lack of multi-line lambda functions. # 6th August 2023, 5:44 pm
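Here is a rough sketch of how an iterator and a context manager can combine like this. It is my own simplified illustration, not Tenacity's implementation: the names Attempt and retrying are hypothetical.

```python
class Attempt:
    """Context manager for one try; swallows the error unless it is the last try."""

    def __init__(self, state, is_last):
        self.state = state
        self.is_last = is_last

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.state["done"] = True  # success: tell the iterator to stop
            return False
        # Returning True suppresses the exception so the loop can retry.
        return issubclass(exc_type, Exception) and not self.is_last


def retrying(max_attempts=3):
    """Yield Attempt objects until one succeeds or the attempts run out."""
    state = {"done": False}
    for number in range(1, max_attempts + 1):
        yield Attempt(state, is_last=(number == max_attempts))
        if state["done"]:
            return


calls = []


def flaky():
    """Fail twice, then succeed, to exercise the retry loop."""
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("transient failure")
    return "ok"


for attempt in retrying(3):
    with attempt:
        result = flaky()
```

The body of the with block plays the role a multi-line lambda would: the for loop decides whether to run it again, and the context manager decides whether an exception ends the loop.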

Catching up on the weird world of LLMs

I gave a talk on Sunday at North Bay Python where I attempted to summarize the last few years of development in the space of LLMs—Large Language Models, the technology behind tools like ChatGPT, Google Bard and Llama 2.

[... 10475 words]

A Steering Council notice about PEP 703 (Making the Global Interpreter Lock Optional in CPython) (via) Huge news concerning the nogil research fork of Python: “It’s clear that the overall sentiment is positive, both for the general idea and for PEP 703 specifically. The Steering Council is also largely positive on both. We intend to accept PEP 703, although we’re still working on the acceptance details.” # 29th July 2023, 9:23 pm

textual-paint (via) Isaiah Odhner built a full working clone of MS Paint that runs entirely in the terminal, using Textual and Python. It’s worth trying this out to get a feel for quite how much you can get done with a CLI app these days—“pipx install textual-paint” worked for me. # 25th July 2023, 2:32 pm

Stamina: tutorial (via) Stamina is Hynek’s new Python library that implements an opinionated wrapper on top of Tenacity, providing a decorator for easily implementing exponential backoff retries. This tutorial includes a concise, clear explanation as to why this is such an important concept in building distributed systems. # 4th July 2023, 8:13 pm
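To make the concept concrete, here is a standard-library sketch of a retry decorator with exponential backoff. It is not Stamina's or Tenacity's API: retry_with_backoff and all of its parameters are hypothetical names chosen for this example.

```python
import functools
import random
import time


def retry_with_backoff(on=Exception, attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep):
    """Retry the decorated function, doubling the wait after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except on:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the last error
                    # Delay doubles each time, is capped, and gets a little
                    # random jitter so clients do not retry in lockstep.
                    delay = min(max_delay, base_delay * 2 ** attempt)
                    sleep(delay + random.uniform(0, delay / 10))

        return wrapper

    return decorator
```

The sleep parameter is only there so the waiting can be swapped out in tests. Backing off with jitter matters in distributed systems because many clients retrying immediately can keep a struggling service overloaded.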

Data analysis with SQLite and Python. I turned my 2hr45m workshop from PyCon into the latest official tutorial on the Datasette website. It includes an extensive handout which should be useful independently of the video itself. # 2nd July 2023, 4:48 pm

Status of Python Versions (via) Very clear and useful page showing the exact status of different Python versions. 3.7 reaches end of life today (no more security updates), while 3.11 will continue to be supported until October 2027. # 27th June 2023, 2:01 pm

Building Search DSLs with Django (via) Neat tutorial by Dan Lamanna: how to build a GitHub-style search feature—supporting modifiers like “is:open author:danlamanna”—using PyParsing and the Django ORM. # 19th June 2023, 8:30 am
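The core idea can be sketched in plain Python. This is a deliberate simplification that uses neither PyParsing nor the Django ORM, and parse_query is a hypothetical helper: it separates "key:value" modifiers from free-text terms, which a real implementation would then translate into ORM filters.

```python
def parse_query(query):
    """Split a search string into modifier filters and free-text terms."""
    filters = {}
    terms = []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key and value:
            # A modifier such as "is:open"; a key may appear more than once.
            filters.setdefault(key, []).append(value)
        else:
            terms.append(token)
    return filters, terms


filters, terms = parse_query("is:open author:danlamanna crash on login")
```

A proper grammar earns its keep once you need quoted values, negation or boolean operators, which whitespace splitting cannot handle.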

Symbex: search Python code for functions and classes, then pipe them into a LLM

I just released a new Python CLI tool called Symbex. It’s a search tool, loosely inspired by ripgrep, which lets you search Python code for functions and classes by name or wildcard, then see just the source code of those matching entities.

[... 1183 words]

Python’s sqlite3 with extensions. Anton Zhiyanov built a new Python package which bundles a fresh, compiled copy of SQLite with his SQLean family of C extensions built right in. Installing it gets you the latest SQLite—3.42.0—with nearly 200 additional functions, including things like define() and eval(), fileio_read() and fileio_write(), percentile_95() and uuid4() and many more. “import sqlean as sqlite3” works as a drop-in replacement for the module from the standard library. # 17th June 2023, 10:42 pm

simpleaichat (via) Max Woolf released his own Python package for building against the GPT-3.5 and GPT-4 APIs (and potentially other LLMs in the future).

It’s a very clean piece of API design with some useful additional features: there’s an AsyncAIChat subclass that works with Python asyncio, and the library includes a mechanism for registering custom functions that can then be called by the LLM as tools.

One trick I haven’t seen before: it uses a combination of max_tokens: 1 and a ChatGPT logit_bias to ensure that answers to one of its default prompts are restricted to just numerals between 0 and 9. This is described in the file. # 8th June 2023, 9:06 pm

pytest-icdiff (via) This is neat: “pip install pytest-icdiff” provides an instant usability upgrade to the output of failed tests in pytest, especially if the assertions involve comparing larger strings or nested JSON objects. # 3rd June 2023, 4:59 pm

The Python Language Summit 2023: Making the Global Interpreter Lock Optional. Extremely informative update covering Sam Gross’s python-nogil proposal from this year’s language summit at PyCon.

Sam has been working hard on his fork for the past year, and now has it rebased for Python 3.12. If his PEP is accepted it could end up as an optional compile-time build in time for Python 3.13.

“The plan for nogil remains that it would be enabled via a compile-time flag, named --disable-gil. Third-party C extensions would need to provide separate wheels for GIL-disabled Python.” # 31st May 2023, 12:04 am

Trogon (via) The latest project from the Textualize/Rich crew, Trogon provides a Python decorator—@tui—which, when applied to a Click CLI application, adds a new interactive TUI mode which introspects the available subcommands and their options and creates a full Text User Interface—with keyboard and mouse support—for assembling invocations of those various commands.

I just shipped sqlite-utils 3.32 with support for this—it uses an optional dependency, so you’ll need to run “sqlite-utils install trogon” and then “sqlite-utils tui” to try it out. # 21st May 2023, 9:39 pm

Writing Python like it’s Rust (via) Fascinating article by Jakub Beránek describing in detail patterns for using type annotations in Python inspired by working in Rust. I learned new tricks about both languages from reading this. # 21st May 2023, 12:18 am

Real Multithreading is Coming to Python—Learn How You Can Use It Now (via) Martin Heinz provides a detailed tutorial on trying out the new Per-Interpreter GIL feature that’s landing in Python 3.12, which allows Python code to run concurrently in multiple threads by spawning separate sub-interpreters, each with their own dedicated GIL.

It’s not an easy feature to play with yet! First you need to compile Python yourself, and then use APIs that are generally only available to C code (but should hopefully become available to Python code itself in Python 3.13).

Martin’s workaround for this is ingenious: it turns out the Python package provides utility functions to help write tests against interpreters, and Martin shows how to abuse this module to launch, run and cleanup interpreters using regular Python code.

He also demonstrates, which can be used to create channels with receiver and sender ends, somewhat similar to Go. # 15th May 2023, 7:42 pm

Implement DNS in a weekend (via) Fantastically clear and useful guide to implementing DNS lookups, from scratch, using Python’s struct, socket and dataclass modules—Julia Evans plans to follow this up with one for TLS which I am very much looking forward to. # 12th May 2023, 6:14 pm
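As a taste of the approach, here is a sketch of building a DNS query packet with struct and a dataclass. It follows the standard DNS wire format rather than reproducing the guide's code, so the class and function names are my own assumptions.

```python
import struct
from dataclasses import astuple, dataclass

TYPE_A = 1                  # record type for IPv4 addresses
CLASS_IN = 1                # the "internet" class
RECURSION_DESIRED = 1 << 8  # flag asking the resolver to recurse for us


@dataclass
class DNSHeader:
    id: int
    flags: int
    num_questions: int = 0
    num_answers: int = 0
    num_authorities: int = 0
    num_additionals: int = 0

    def to_bytes(self):
        # Six unsigned 16-bit integers in network (big-endian) byte order.
        return struct.pack("!HHHHHH", *astuple(self))


def encode_name(domain):
    """Encode 'example.com' as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in domain.encode("ascii").split(b"."):
        encoded += bytes([len(label)]) + label
    return encoded + b"\x00"


def build_query(domain, query_id, record_type=TYPE_A):
    """Return the bytes of a single-question DNS query."""
    header = DNSHeader(id=query_id, flags=RECURSION_DESIRED, num_questions=1)
    question = encode_name(domain) + struct.pack("!HH", record_type, CLASS_IN)
    return header.to_bytes() + question


packet = build_query("example.com", query_id=0x1234)
```

Sending the packet would mean writing those bytes to a UDP socket connected to a resolver on port 53. That step is left out here so the example runs offline.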

Mojo may be the biggest programming advance in decades (via) Jeremy Howard makes a very convincing argument for why the new programming language Mojo is a big deal.

Mojo is a superset of Python designed by a team led by Chris Lattner, who previously created LLVM, Clang and Swift.

Existing Python code should work unmodified, but it also adds features that enable performant low-level programming—like “fn” for creating typed, compiled functions and “struct” for memory-optimized alternatives to classes.

It’s worth watching Jeremy’s video where he uses these features to get more than a 2000x speed up implementing matrix multiplication, while still keeping the code readable and easy to follow.

Mojo isn’t available yet outside of a playground preview environment, but it does look like an intriguing new project. # 4th May 2023, 4:41 am

urllib3 v2.0.0 is now generally available. urllib3 is 12 years old now, and is a common low-level dependency for packages like requests and httpx. The biggest new feature in v2 is a higher-level API: resp = urllib3.request(“GET”, “”)—a very welcome addition to the library. # 26th April 2023, 10 pm

Rye. Armin Ronacher’s take on a Python packaging tool. There are a lot of interesting ideas in this one—it’s written in Rust, configured using pyproject.toml and has some very strong opinions, including completely hiding pip from view and insisting you use “rye add package” instead. Notably, it doesn’t use the system Python at all: instead, it downloads a pre-compiled standalone Python from Gregory Szorc’s python-build-standalone project—the same approach I used for the Datasette Desktop Electron app.

Armin warns that this is just an exploration, with no guarantees of future maintenance—and even has an issue open titled “Should Rye exist?” # 24th April 2023, 4:02 am

Introducing PyPI Organizations. Launched at PyCon US today: Organizations allow packages on the Python Package Index to be owned by a group, not an individual user account. “We’re making organizations available to community projects for free, forever, and to corporate projects for a small fee.”—this is the first revenue generating PyPI feature. # 23rd April 2023, 8:29 pm

codespaces-jupyter (via) This is really neat. Click “Use this template” -> “Open in a codespace” and you get a full in-browser VS Code interface where you can open existing notebook files (or create new ones) and start playing with them straight away. # 14th April 2023, 10:38 pm

Running Python micro-benchmarks using the ChatGPT Code Interpreter alpha

Today I wanted to understand the performance difference between two Python implementations of a mechanism to detect changes to a SQLite database schema. I rendered the difference between the two as this chart:

[... 2939 words]
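This excerpt doesn't say which two implementations were compared. As an assumption, here are two plausible ways to detect a schema change using only the standard library: reading SQLite's schema_version pragma, and hashing the SQL stored in sqlite_master. The function names are hypothetical.

```python
import hashlib
import sqlite3


def schema_version(conn):
    """Return SQLite's schema counter, which changes when the schema changes."""
    return conn.execute("PRAGMA schema_version").fetchone()[0]


def schema_hash(conn):
    """Return a hash of every CREATE statement recorded in sqlite_master."""
    rows = conn.execute("SELECT sql FROM sqlite_master ORDER BY name").fetchall()
    joined = "\n".join(row[0] or "" for row in rows)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
version_before, hash_before = schema_version(conn), schema_hash(conn)

conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
version_after_create, hash_after_create = schema_version(conn), schema_hash(conn)

# Inserting rows changes the data but not the schema.
conn.execute("INSERT INTO items (name) VALUES ('example')")
version_after_insert, hash_after_insert = schema_version(conn), schema_hash(conn)
```

Either function can be timed with the timeit module to produce the kind of micro-benchmark the post describes.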