Simon Willison’s Weblog


Bundling binary tools in Python wheels

23rd May 2022

I spotted a new (to me) pattern which I think is pretty interesting: projects are bundling compiled binary applications as part of their Python packaging wheels. I think it’s really neat.

pip install ziglang

Zig is a new programming language led by Andrew Kelley that sits somewhere near Rust: Wikipedia calls it an “imperative, general-purpose, statically typed, compiled system programming language”.

One of its most notable features is that it bundles its own C/C++ compiler, as a “hermetic” compiler—it’s completely standalone, unaffected by the system that it is operating within. I learned about this usage of the word hermetic this morning from How Uber Uses Zig by Motiejus Jakštys.

The concept reminds me of Gregory Szorc’s python-build-standalone, which provides redistributable Python builds and was key to getting my Datasette Desktop Electron application working with its own hermetic build of Python.

One of the options provided for installing Zig (and its bundled toolchain) is to use pip:

% pip install ziglang
% python -m ziglang cc --help
OVERVIEW: clang LLVM compiler

USAGE: zig [options] file...

OPTIONS:
  -###                    Print (but do not run) the commands to run for this compilation
  --amdgpu-arch-tool=<value>
                          Tool used for detecting AMD GPU arch in the system.

This means you can now pip install a full C compiler for your current platform!

The way this works is really simple. The ziglang package that you install has two key files: a zig binary (155MB on my system) containing the full compiled Zig implementation, and a __main__.py module containing the following:

import os, sys, subprocess
sys.exit(subprocess.call([
    os.path.join(os.path.dirname(__file__), "zig"),
    *sys.argv[1:]
]))

The package also bundles lib and doc folders with supporting files used by Zig itself, unrelated to Python.

The Zig project then bundles and ships eight different Python wheels targeting different platforms. Here’s their code that does that, which lists the platforms that are supported:

for zig_platform, python_platform in {
    'windows-i386':   'win32',
    'windows-x86_64': 'win_amd64',
    'macos-x86_64':   'macosx_10_9_x86_64',
    'macos-aarch64':  'macosx_11_0_arm64',
    'linux-i386':     'manylinux_2_12_i686.manylinux2010_i686',
    'linux-x86_64':   'manylinux_2_12_x86_64.manylinux2010_x86_64',
    'linux-armv7a':   'manylinux_2_17_armv7l.manylinux2014_armv7l',
    'linux-aarch64':  'manylinux_2_17_aarch64.manylinux2014_aarch64',
}.items():
    # Build the wheel here...

They suggest that if you want to run their tools from a Python program you do so like this, to ensure your script can find the installed binary:

import sys, subprocess

subprocess.call([sys.executable, "-m", "ziglang"])
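
As a rough sketch of how that could look in a real script, the helper below builds a python -m ziglang cc command for compiling a single C file. The function name zig_cc_command and the file names are hypothetical (they are not part of the ziglang package), and running the resulting command assumes ziglang has been installed into the current environment.

```python
import subprocess
import sys


def zig_cc_command(source, output):
    """Build the argument list for compiling a C file with Zig's bundled clang.

    zig_cc_command is a hypothetical helper, not part of the ziglang package.
    """
    # sys.executable points at the running interpreter, so the command uses
    # the same environment that ziglang was installed into.
    return [sys.executable, "-m", "ziglang", "cc", source, "-o", output]


# Example usage, assuming hello.c exists and ziglang is installed:
#   subprocess.run(zig_cc_command("hello.c", "hello"), check=True)
```

Building the list separately from running it keeps the command easy to inspect or log before anything is executed.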

I find this whole approach pretty fascinating. I really love the idea that I can add a full C/C++ compiler as a dependency to any of my Python projects, and thanks to Python wheels I’ll automatically get a binary executable compiled for my current platform.

Playwright Python

I spotted another example of this pattern recently in Playwright Python. Playwright is Microsoft’s open source browser automation and testing framework—a kind of modern Selenium. I used it recently to build my shot-scraper screenshot automation tool.

Playwright provides a full-featured API for controlling headless (and headful) browser instances, with implementations in Node.js, Python, Java and .NET.

I was intrigued as to how they had developed such a sophisticated API for four different platforms/languages at once, providing full equivalence for all of their features across all four.

So I dug around in their Python package (from pip install playwright) and found this:

77M ./venv/lib/python3.10/site-packages/playwright/driver/node

That’s a full copy of the Node.js binary!

% ./venv/lib/python3.10/site-packages/playwright/driver/node --version

Playwright Python works by providing a Python layer on top of the existing JavaScript API library. It runs a Node.js process which does the actual work; the Python library just communicates with the JavaScript for you.
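
The general shape of that arrangement, a thin Python layer talking to a child process that does the work, can be sketched with the standard library alone. This is an illustration of the pattern only: the child here is a second Python interpreter standing in for the bundled Node.js binary, and the one-JSON-message-per-line protocol, the call_driver function and the message fields are all assumptions of mine, not Playwright's actual protocol.

```python
import json
import subprocess
import sys

# Stand-in for the bundled driver: a tiny child program that reads one JSON
# message per line on stdin and writes one JSON reply per line on stdout.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    message = json.loads(line)
    reply = {"id": message["id"], "result": "handled " + message["method"]}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


def call_driver(method):
    """Start the child process, send one request, and return its reply."""
    process = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        process.stdin.write(json.dumps({"id": 1, "method": method}) + "\n")
        process.stdin.flush()
        return json.loads(process.stdout.readline())
    finally:
        process.stdin.close()  # EOF ends the child's read loop
        process.wait()
        process.stdout.close()
```

Calling call_driver("goto") sends a single request and returns the child's reply as a dictionary. A real driver would keep the process alive across many calls rather than starting one per request.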

As with Zig, the Playwright team offer seven pre-compiled wheels for different platforms. The list today is:

  • playwright-1.22.0-py3-none-win_amd64.whl
  • playwright-1.22.0-py3-none-win32.whl
  • playwright-1.22.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  • playwright-1.22.0-py3-none-manylinux1_x86_64.whl
  • playwright-1.22.0-py3-none-macosx_11_0_universal2.whl
  • playwright-1.22.0-py3-none-macosx_11_0_arm64.whl
  • playwright-1.22.0-py3-none-macosx_10_13_x86_64.whl
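
Those filenames follow the wheel naming convention of name, version, Python tag, ABI tag and platform tag. The py3-none part says the wheel works with any Python 3 and has no ABI dependency, so only the platform tag varies between files. Here is a small sketch that pulls those pieces apart. The parse_wheel_filename helper is hypothetical, and it assumes there is no optional build tag and no hyphen in the project name, which holds for the filenames in this list.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its components.

    Hypothetical helper: assumes no optional build tag and no hyphens
    in the project name.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        # Dots separate alternative platform tags in a compressed tag set.
        "platforms": platform_tag.split("."),
    }
```

For the aarch64 Linux wheel this yields two platform tags, manylinux_2_17_aarch64 and manylinux2014_aarch64, which are two names for the same compatibility level.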

I wish I could say "you can now pip install a browser!" but Playwright doesn’t actually bundle the browsers themselves—you need to run python -m playwright install to download those separately.

Pretty fascinating example of the same pattern though!

pip install a SQLite database

It’s not quite the same thing, since it’s not packaging an executable, but the one project I have that fits this mould if you squint a little is my datasette-basemap plugin.

It’s a Datasette plugin which bundles a 23MB SQLite database file containing OpenStreetMap tiles for the first seven zoom levels of their world map—5,461 tile images total.
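
The 5,461 figure can be checked with a little arithmetic, assuming the usual web map tiling scheme where zoom level z is a 2^z by 2^z grid and the first seven levels are 0 through 6. The function name here is just for illustration.

```python
def tiles_at_zoom(zoom):
    # Each zoom level is a 2**zoom by 2**zoom grid of tiles.
    return (2 ** zoom) ** 2


# Zoom levels 0 to 6: 1 + 4 + 16 + 64 + 256 + 1024 + 4096
total = sum(tiles_at_zoom(z) for z in range(7))
print(total)  # 5461
```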

I built it so that people could use my datasette-cluster-map and datasette-leaflet-geojson entirely standalone, without needing to load tiles from a central tile server.

You can play with a demo here. I wrote more about that project in Serving map tiles from SQLite with MBTiles and datasette-tiles. It’s pretty fun to be able to run pip install datasette-basemap to install a full map of the world.

Seen any other interesting examples of pip install being (ab)used in this way? Ping them to me on Twitter.

Update: Paul O’Leary McCann points out that PyPI has a default 60MB size limit for packages, though it can be raised on a case-by-case basis. He wrote about this in Distributing Large Files with PyPI Packages.