1,070 items tagged “python”
The Python programming language.
2024
time-machine example test for a segfault in Python (via) Here's a really neat testing trick by Adam Johnson. Someone reported a segfault bug in his time-machine library. How you you write a unit test that exercises a segfault without crashing the entire test suite?
Adam's solution is a test that does this:
subprocess.run([sys.executable, "-c", code_that_crashes_python], check=True)
sys.executable
is the path to the current Python executable - ensuring the code will run in the same virtual environment as the test suite itself. The -c
option can be used to have it run a (multi-line) string of Python code, and check=True
causes the subprocess.run()
function to raise an error if the subprocess fails to execute cleanly and returns an error code.
I'm absolutely going to be borrowing this pattern next time I need to add tests to cover a crashing bug in one of my projects.
Talking about Django’s history and future on Django Chat (via) Django co-creator Jacob Kaplan-Moss sat down with the Django Chat podcast team to talk about Django’s history, his recent return to the Django Software Foundation board and what he hopes to achieve there.
Here’s his post about it, where he used Whisper and Claude to extract some of his own highlights from the conversation.
Every dunder method in Python. Trey Hunner: "Python includes 103 'normal' dunder methods, 12 library-specific dunder methods, and at least 52 other dunder attributes of various types."
This cheat sheet doubles as a tour of many of the more obscure corners of the Python language and standard library.
I did not know that Python has over 100 dunder methods now! Quite a few of these were new to me, like __class_getitem__
which can be used to implement type annotations such as list[int]
.
DiskCache (via) Grant Jenks built DiskCache as an alternative caching backend for Django (also usable without Django), using a SQLite database on disk. The performance numbers are impressive—it even beats memcached in microbenchmarks, due to avoiding the need to access the network.
The source code (particularly in core.py) is a great case-study in SQLite performance optimization, after five years of iteration on making it all run as fast as possible.
pywebview 5 (via) pywebview is a library for building desktop (and now Android) applications using Python, based on the idea of displaying windows that use the system default browser to display an interface to the user—styled such that the fact they run on HTML, CSS and JavaScript is mostly hidden from the end-user.
It’s a bit like a much simpler version of Electron. Unlike Electron it doesn’t bundle a full browser engine (Electron bundles Chromium), which reduces the size of the dependency a lot but does mean that cross-browser differences (quite rare these days) do come back into play.
I tried out their getting started example and it’s very pleasant to use—import webview, create a window and then start the application loop running to display it.
You can register JavaScript functions that call back to Python, and you can execute JavaScript in a window from your Python code.
gh-116167: Allow disabling the GIL with PYTHON_GIL=0 or -X gil=0. Merged into python:main 14 hours ago. Looks like the first phase of Sam Gross’s phenomenal effort to provide a GIL free Python (here via an explicit opt-in) will ship in Python 3.13.
llm-claude-3. I built a new plugin for LLM—my command-line tool and Python library for interacting with Large Language Models—which adds support for the new Claude 3 models from Anthropic.
The Zen of Python, Unix, and LLMs. Here’s the YouTube recording of my 1.5 hour conversation with Hugo Bowne-Anderson yesterday.
I fed a Whisper transcript to Google Gemini Pro 1.5 and asked it for the themes from our conversation, and it said we talked about “Python’s success and versatility, the rise and potential of LLMs, data sharing and ethics in the age of LLMs, Unix philosophy and its influence on software development and the future of programming and human-computer interaction”.
aiolimiter. I found myself wanting an asyncio rate limiter for Python today—so I could send POSTs to an API endpoint no more than once every 10 seconds. This library worked out really well—it has a very neat design and lets you set up rate limits for things like “no more than 50 items every 10 seconds”, implemented using the leaky bucket algorithm.
datasette-studio. I've been thinking for a while that it might be interesting to have a version of Datasette that comes bundled with a set of useful plugins, aimed at expanding Datasette's default functionality to cover things like importing data and editing schemas.
This morning I built the very first experimental preview of what that could look like. Install it using pipx
:
pipx install datasette-studio
I recommend pipx because it will ensure datasette-studio
gets its own isolated environment, independent of any other Datasette installations you might have.
Now running datasette-studio
instead of datasette
will get you the version with the bundled plugins.
The implementation of this is fun - it's a single pyproject.toml file defining the dependencies and setting up the datasette-studio
CLI hook, which is enough to provide the full set of functionality.
Is this a good idea? I don't know yet, but it's certainly an interesting initial experiment.
wddbfs – Mount a sqlite database as a filesystem. Ingenious hack from Adam Obeng. Install this Python tool and run it against a SQLite database:
wddbfs --anonymous --db-path path/to/content.db
Then tell the macOS Finder to connect to Go -> Connect to Server -> http://127.0.0.1:8080/
(connect as guest) - connecting via WebDAV.
/Volumes/127.0.0.1/content.db
will now be a folder full of CSV, TSV, JSON and JSONL files - one of each format for every table.
This means you can open data from SQLite directly in any application that supports that format, and you can even run CLI commands such as grep, ripgrep or jq directly against the data!
Adam used WebDAV because "Despite how clunky it is, this seems to be the best way to implement a filesystem given that getting FUSE support is not straightforward". What a neat trick.
uv: Python packaging in Rust (via) "uv is an extremely fast Python package installer and resolver, written in Rust, and designed as a drop-in replacement for pip and pip-tools workflows."
From Charlie Marsh and Astral, the team behind Ruff, who describe it as a milestone in their pursuit of a "Cargo for Python".
Also in this announcement: Astral are taking over stewardship of Armin Ronacher's Rye packaging tool, another Rust project.
uv
is reported to be 8-10x faster than regular pip
, increasing to 80-115x faster with a warm global module cache thanks to copy-on-write and hard links on supported filesystems - which saves on disk space too.
It also has a --resolution=lowest
option for installing the lowest available version of dependencies - extremely useful for testing, I've been wanting this for my own projects for a while.
Also included: uv venv
- a fast tool for creating new virtual environments with no dependency on Python itself.
Python Development on macOS Notes: pyenv and pyenv-virtualenvwrapper (via) Jeff Triplett shares the recipe he uses for working with pyenv (initially installed via Homebrew) on macOS.
I really need to start habitually using this. The benefit of pyenv over Homebrew’s default Python is that pyenv managed Python versions are forever—your projects won’t suddenly stop working in the future when Homebrew changes its default Python version.
Rye: Added support for marking virtualenvs ignored for cloud sync (via) A neat feature in the new Rye 0.22.0 release. It works by using an xattr Rust crate to set the attributes “com.dropbox.ignored” and “com.apple.fileprovider.ignore#P” on the folder.
Rye lets you get from no Python on a computer to a fully functioning Python project in under a minute with linting, formatting and everything in place.
[...] Because it was demonstrably designed to avoid interference with any pre-existing Python configurations, Rye allows for a smooth and gradual integration and the emotional barrier of picking it up even for people who use other tools was shown to be low.
unstructured. Relatively new but impressively capable Python library (Apache 2 licensed) for extracting information from unstructured documents, such as PDFs, images, Word documents and many other formats.
I got some good initial results against a PDF by running “pip install ’unstructured[pdf]’” and then using the “unstructured.partition.pdf.partition_pdf(filename)” function.
There are a lot of moving parts under the hood: pytesseract, OpenCV, various PDF libraries, even an ONNX model—but it installed cleanly for me on macOS and worked out of the box.
snoop. Neat Python debugging utility by Alex Hall: snoop lets you “import snoop” and then add “@snoop” as a decorator to any function, which causes that function’s source code to be output directly to the console with details of any variable state changes that occur while it’s running.
I didn’t know you could make a Python module callable like that—turns out it’s running “sys.modules[’snoop’] = snoop” in the __init__.py module!
Beej’s Guide to Networking Concepts (via) Beej’s Guide to Network Programming is a legendary tutorial on network programming in C, continually authored and updated by Brian “Beej” Hall since 1995.
This is NOT that. Beej’s Guide to Networking Concepts is brand new—started in March 2023—and illustrates a whole bunch of networking concepts using Python instead of C.
From the forward: “Is it Beej’s Guide to Network Programming in Python? Well, kinda, actually. The C book is more about how C’s (well, Unix’s) network API works. And this book is more about the concepts underlying it, using Python as a vehicle.”
urllib3 2.2.0. Highlighted feature: “urllib3 now works in the browser”—the core urllib3 library now includes code that can integrate with Pyodide, using the browser’s fetch() or XMLHttpRequest APIs to make HTTP requests (to CORS-enabled endpoints).
Getting Started With CUDA for Python Programmers (via) if, like me, you’ve avoided CUDA programming (writing efficient code that runs on NVIGIA GPUs) in the past, Jeremy Howard has a new 1hr17m video tutorial that demystifies the basics. The code is all run using PyTorch in notebooks running on Google Colab, and it starts with a very clear demonstration of how to convert a RGB image to black and white.
Find a level of abstraction that works for what you need to do. When you have trouble there, look beneath that abstraction. You won’t be seeing how things really work, you’ll be seeing a lower-level abstraction that could be helpful. Sometimes what you need will be an abstraction one level up. Is your Python loop too slow? Perhaps you need a C loop. Or perhaps you need numpy array operations.
You (probably) don’t need to learn C.
Python packaging must be getting better—a datapoint (via) Luke Plant reports on a recent project he developed on Linux using a requirements.txt file and some complex binary dependencies—Qt5 and VTK—and when he tried to run it on Windows... it worked! No modifications required.
I think Python’s packaging system has never been more effective... provided you know how to use it. The learning curve is still too high, which I think accounts for the bulk of complaints about it today.
Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions
I use cookiecutter to start almost all of my Python projects. It helps me quickly generate a skeleton of a project with my preferred directory structure and configured tools.
[... 686 words]Marimo (via) This is a really interesting new twist on Python notebooks.
The most powerful feature is that these notebooks are reactive: if you change the value or code in a cell (or change the value in an input widget) every other cell that depends on that value will update automatically. It’s the same pattern implemented by Observable JavaScript notebooks, but now it works for Python.
There are a bunch of other nice touches too. The notebook file format is a regular Python file, and those files can be run as “applications” in addition to being edited in the notebook interface. The interface is very nicely built, especially for such a young project—they even have GitHub Copilot integration for their CodeMirror cell editors.
The Random Transformer (via) “Understand how transformers work by demystifying all the math behind them”—Omar Sanseviero from Hugging Face meticulously implements the transformer architecture behind LLMs from scratch using Python and numpy. There’s a lot to take in here but it’s all very clearly explained.
Python 3.13 gets a JIT. “In late December 2023 (Christmas Day to be precise), CPython core developer Brandt Bucher submitted a little pull-request to the Python 3.13 branch adding a JIT compiler.”
Anthony Shaw does a deep dive into this new experimental JIT, explaining how it differs from other JITs. It’s an implementation of a copy-and-patch JIT, an idea that only emerged in 2021. This makes it architecturally much simpler than a traditional JIT, allowing it to compile faster and take advantage of existing LLVM tools on different architectures.
So far it’s providing a 2-9% performance improvement, but the real impact will be from the many future optimizations it enables.
Fastest Way to Read Excel in Python (via) Haki Benita produced a meticulously researched and written exploration of the options for reading a large Excel spreadsheet into Python. He explored Pandas, Tablib, Openpyxl, shelling out to LibreOffice, DuckDB and python-calamine (a Python wrapper of a Rust library). Calamine was the winner, taking 3.58s to read 500,00 rows—compared to Pandas in last place at 32.98s.
2023
iSH: The Linux shell for iOS (via) Installing this iOS app gives you a full Linux shell environment running on your phone, using a “usermode x86 emulator”. You can even install packages: “apk add python3” gave me a working Python 3.9 interpreter, installed from the apk.ish.app repository.
I didn’t think this kind of thing was allowed by the App Store, but that’s not been the case for a few years now: Section 4.5.2 of the App Store guidelines clarifies that “Educational apps designed to teach, develop, or allow students to test executable code may, in limited circumstances, download code provided that such code is not used for other purposes.”
ast-grep (via) There are a lot of interesting things about this year-old project.
sg (an alias for ast-grep) is a CLI tool for running AST-based searches against code, built in Rust on top of the Tree-sitter parsing library. You can run commands like this:
sg -p ’await await_me_maybe($ARG)’ datasette --lang python
To search the datasette directory for code that matches the search pattern, in a syntax-aware way.
It works across 19 different languages, and can handle search-and-replace too, so it can work as a powerful syntax-aware refactoring tool.
My favourite detail is how it’s packaged. You can install the CLI utility using Homebrew, Cargo, npm or pip/pipx—each of which will give you a CLI tool you can start running. On top of that it provides API bindings for Rust, JavaScript and Python!
It’s Time For A Change: datetime.utcnow() Is Now Deprecated (via) Miguel Grinberg explains the deprecation of datetime.utcnow() and utcfromtimestamp() in Python 3.12, since they return naive datetime objects which cause all sorts of follow-on problems.
The replacement idiom is datetime.datetime.now(datetime.timezone.utc)