Simon Willison’s Weblog

Subscribe

Items tagged python in Feb, 2024

Filters: Year: 2024 × Month: Feb × python × Sorted by date


The Zen of Python, Unix, and LLMs. Here’s the YouTube recording of my 1.5 hour conversation with Hugo Bowne-Anderson yesterday.

I fed a Whisper transcript to Google Gemini Pro 1.5 and asked it for the themes from our conversation, and it said we talked about “Python’s success and versatility, the rise and potential of LLMs, data sharing and ethics in the age of LLMs, Unix philosophy and its influence on software development and the future of programming and human-computer interaction”. # 29th February 2024, 9:04 pm

aiolimiter. I found myself wanting an asyncio rate limiter for Python today—so I could send POSTs to an API endpoint no more than once every 10 seconds. This library worked out really well—it has a very neat design and lets you set up rate limits for things like “no more than 50 items every 10 seconds”, implemented using the leaky bucket algorithm. # 20th February 2024, 1:15 am

datasette-studio. I’ve been thinking for a while that it might be interesting to have a version of Datasette that comes bundled with a set of useful plugins, aimed at expanding Datasette’s default functionality to cover things like importing data and editing schemas.

This morning I built the very first experimental preview of what that could look like. Install it using pipx:

pipx install datasette-studio

I recommend pipx because it will ensure datasette-studio gets its own isolated environment, independent of any other Datasette installations you might have.

Now running “datasette-studio” instead of “datasette” will get you the version with the bundled plugins.

The implementation of this is fun—it’s a single pyproject.toml file defining the dependencies and setting up the datasette-studio CLI hook, which is enough to provide the full set of functionality.

Is this a good idea? I don’t know yet, but it’s certainly an interesting initial experiment. # 18th February 2024, 8:38 pm

wddbfs – Mount a sqlite database as a filesystem. Ingenious hack from Adam Obeng. Install this Python tool and run it against a SQLite database:

wddbfs --anonymous --db-path path/to/content.db

Then tell the macOS Finder to connect to Go -> Connect to Server -> http://127.0.0.1:8080/ (connect as guest)—connecting via WebDAV.

/Volumes/127.0.0.1/content.db will now be a folder full of CSV, TSV, JSON and JSONL files—one of each format for every table.

This means you can open data from SQLite directly in any application that supports that format, and you can even run CLI commands such as grep, ripgrep or jq directly against the data!

Adam used WebDAV because “Despite how clunky it is, this seems to be the best way to implement a filesystem given that getting FUSE support is not straightforward”. What a neat trick. # 18th February 2024, 3:31 am

uv: Python packaging in Rust (via) “uv is an extremely fast Python package installer and resolver, written in Rust, and designed as a drop-in replacement for pip and pip-tools workflows.”

From Charlie Marsh and Astral, the team behind Ruff, who describe it as a milestone in their pursuit of a “Cargo for Python”.

Also in this announcement: Astral are taking over stewardship of Armin Ronacher’s Rye packaging tool, another Rust project.

uv is reported to be 8-10x faster than regular pip, increasing to 80-115x faster with a warm global module cache thanks to copy-on-write and hard links on supported filesystems—which saves on disk space too.

It also has a --resolution=lowest option for installing the lowest available version of dependencies—extremely useful for testing, I’ve been wanting this for my own projects for a while.

Also included: “uv venv”—a fast tool for creating new virtual environments with no dependency on Python itself. # 15th February 2024, 7:57 pm

Python Development on macOS Notes: pyenv and pyenv-virtualenvwrapper (via) Jeff Triplett shares the recipe he uses for working with pyenv (initially installed via Homebrew) on macOS.

I really need to start habitually using this. The benefit of pyenv over Homebrew’s default Python is that pyenv managed Python versions are forever—your projects won’t suddenly stop working in the future when Homebrew changes its default Python version. # 11th February 2024, 4:41 am

Rye: Added support for marking virtualenvs ignored for cloud sync (via) A neat feature in the new Rye 0.22.0 release. It works by using an xattr Rust crate to set the attributes “com.dropbox.ignored” and “com.apple.fileprovider.ignore#P” on the folder. # 10th February 2024, 6:50 am

Rye lets you get from no Python on a computer to a fully functioning Python project in under a minute with linting, formatting and everything in place.

[...] Because it was demonstrably designed to avoid interference with any pre-existing Python configurations, Rye allows for a smooth and gradual integration and the emotional barrier of picking it up even for people who use other tools was shown to be low.

Armin Ronacher # 4th February 2024, 3:12 pm

unstructured. Relatively new but impressively capable Python library (Apache 2 licensed) for extracting information from unstructured documents, such as PDFs, images, Word documents and many other formats.

I got some good initial results against a PDF by running “pip install ’unstructured[pdf]’” and then using the “unstructured.partition.pdf.partition_pdf(filename)” function.

There are a lot of moving parts under the hood: pytesseract, OpenCV, various PDF libraries, even an ONNX model—but it installed cleanly for me on macOS and worked out of the box. # 2nd February 2024, 2:47 am