1,155 posts tagged “python”
The Python programming language.
2023
New sqlite3 CLI tool in Python 3.12. The newly released Python 3.12 includes a SQLite shell, which you can open using “python -m sqlite3”—handy for when you’re using a machine that has Python installed but no sqlite3 binary.
I installed Python 3.12 for macOS using the official installer from Python.org and now “/usr/local/bin/python3 -m sqlite3” gives me a SQLite 3.41.1 shell—a pleasantly recent version from March 2023 (the latest SQLite is 3.43.1, released in September).
[On Python 3.12 subinterpreters] there's massive advantages for mixed C(++) and Python: I can now have multiple sub interpreters running concurrently and accessing the same shared state in a thread-safe C++ library.
Previously this required rewriting the whole C++ library to support either pickling (multiplying the total memory consumption by the number of cores), or support allocating everything in shared memory (which means normal C++ types like
std::string
are unusable, need to switch e.g. toboost::interprocess
).Now is sufficient to pickle a pointer to a C++ object as an integer, and it'll still be a valid pointer in the other subinterpreter.
— ynik
Get Your Mac Python From Python.org. Glyph recommends the official Python installer from python.org as the best way to get started with a Python environment on macOS—with require-virtualenv = true in your ~/.pip/pip.conf to help avoid accidentally installing global packages.
Things I’ve learned about building CLI tools in Python
I build a lot of command-line tools in Python. It’s become my favorite way of quickly turning a piece of code into something I can use myself and package up for other people to use too.
[... 1,235 words]A Hackers’ Guide to Language Models. Jeremy Howard’s new 1.5 hour YouTube introduction to language models looks like a really useful place to catch up if you’re an experienced Python programmer looking to start experimenting with LLMs. He covers what they are and how they work, then shows how to build against the OpenAI API, build a Code Interpreter clone using OpenAI functions, run models from Hugging Face on your own machine (with NVIDIA cards or on a Mac) and finishes with a demo of fine-tuning a Llama 2 model to perform text-to-SQL using an open dataset.
How CPython Implements and Uses Bloom Filters for String Processing. Fascinating dive into Python string internals by Abhinav Upadhyay. It turns out CPython uses very simple bloom filters in several parts of the core string methods, to solve problems like splitting on newlines where there are actually eight codepoints that could represent a newline, and a tiny bloom filter can help filter a character in a single operation before performing all eight comparisons only if that first check failed.
bpy—Blender on PyPI (via) TIL you can “pip install” Blender!
bpy “provides Blender as a Python module”—it’s part of the official Blender project, and ships with binary wheels ranging in size from 168MB to 319MB depending on your platform.
It only supports the version of Python used by the current Blender release though—right now that’s Python 3.10.
Understanding Immortal Objects in Python 3.12. Abhinav Upadhyay provides a clear and detailed explanation of immortal objects coming in Python 3.12, which ensure Python no longer updates reference counts for immutable objects such as True, False, None and low-values integers. The trick (which maintains ABI compatibility) is pretty simple: a reference count value of 4294967295 now means an object is immortal, and the existing Py_INCREF and Py_DECREF macros have been updated to take that into account.
Lark parsing library JSON tutorial (via) A very convincing tutorial for a new-to-me parsing library for Python called Lark.
The tutorial covers building a full JSON parser from scratch, which ends up being just 19 lines of grammar definition code and 15 lines for the transformer to turn that tree into the final JSON.
It then gets into the details of optimization—the default Earley algorithm is quite slow, but swapping that out for a LALR parser (a one-line change) provides a 5x speedup for this particular example.
deno_python (via) A wildly impressive hack: deno_python uses Deno’s FFI interface to load your system’s Python framework (.dll/.dylib/.so) and sets up JavaScript proxy objects for imported Python objects—so you can run JavaScript code that instantiates objects from Python libraries and uses them to process data in different ways.
The latest release added pip support, so things like ’const np = await pip.import(“numpy”)’ now work.
Llama from scratch (or how to implement a paper without crying) (via) Brian Kitano implemented the model described in the Llama paper against TinyShakespeare, from scratch, using Python and PyTorch. This write-up is fantastic—meticulous, detailed and deeply informative. It would take several hours to fully absorb and follow everything Brian does here but it would provide multiple valuable lessons in understanding how all of this stuff fits together.
Python cocktail: mix a context manager and an iterator in equal parts (via) Explanation of a neat trick used by the Tenacity Python library, which provides a mechanism for retrying a chunk of code automatically on errors up to three times using a mixture of an iterator and a context manager to work around Python’s lack of multi-line lambda functions.
Catching up on the weird world of LLMs
I gave a talk on Sunday at North Bay Python where I attempted to summarize the last few years of development in the space of LLMs—Large Language Models, the technology behind tools like ChatGPT, Google Bard and Llama 2.
[... 10,489 words]A Steering Council notice about PEP 703 (Making the Global Interpreter Lock Optional in CPython) (via) Huge news concerning the nogil research fork of Python: “It’s clear that the overall sentiment is positive, both for the general idea and for PEP 703 specifically. The Steering Council is also largely positive on both. We intend to accept PEP 703, although we’re still working on the acceptance details.”
textual-paint (via) Isaiah Odhner built a full working clone of MS Paint that runs entirely in the terminal, using Textual and Python. It’s worth trying this out to get a feel for quite how much you can get done with a CLI app these days—“pipx install textual-paint” worked for me.
Stamina: tutorial (via) Stamina is Hynek’s new Python library that implements an opinionated wrapper on top of Tenacity, providing a decorator for easily implementing exponential backoff retries. This tutorial includes a concise, clear explanation as to why this is such an important concept in building distributed systems.
Data analysis with SQLite and Python. I turned my 2hr45m workshop from PyCon into the latest official tutorial on the Datasette website. It includes an extensive handout which should be useful independently of the video itself.
Status of Python Versions (via) Very clear and useful page showing the exact status of different Python versions. 3.7 reaches end of life today (no more security updates), while 3.11 will continue to be supported until October 2027.
Building Search DSLs with Django (via) Neat tutorial by Dan Lamanna: how to build a GitHub-style search feature—supporting modifiers like “is:open author:danlamanna”—using PyParsing and the Django ORM.
Symbex: search Python code for functions and classes, then pipe them into a LLM
I just released a new Python CLI tool called Symbex. It’s a search tool, loosely inspired by ripgrep, which lets you search Python code for functions and classes by name or wildcard, then see just the source code of those matching entities.
[... 1,183 words]sqlean.py: Python’s sqlite3 with extensions. Anton Zhiyanov built a new Python package which bundles a fresh, compiled copy of SQLite with his SQLean family of C extensions built right in. Installing it gets you the latest SQLite—3.42.0—with nearly 200 additional functions, including things like define() and eval(), fileio_read() and fileio_write(), percentile_95() and uuid4() and many more. “import sqlean as sqlite3” works as a drop-in replacement for the module from the standard library.
simpleaichat (via) Max Woolf released his own Python package for building against the GPT-3.5 and GPT-4 APIs (and potentially other LLMs in the future).
It’s a very clean piece of API design with some useful additional features: there’s an AsyncAIChat subclass that works with Python asyncio, and the library includes a mechanism for registering custom functions that can then be called by the LLM as tools.
One trick I haven’t seen before: it uses a combination of max_tokens: 1 and a ChatGPT logit_bias to ensure that answers to one of its default prompts are restricted to just numerals between 0 and 9. This is described in the PROMPTS.md file.
pytest-icdiff (via) This is neat: “pip install pytest-icdiff” provides an instant usability upgrade to the output of failed tests in pytest, especially if the assertions involve comparing larger strings or nested JSON objects.
The Python Language Summit 2023: Making the Global Interpreter Lock Optional. Extremely informative update covering Sam Gross’s python-nogil proposal from this year’s language summit at PyCon.
Sam has been working hard on his fork for the past year, and now has it rebased for Python 3.12. If his PEP is accepted it could end up as an optional compile-time build in time for Python 3.13.
“The plan for nogil remains that it would be enabled via a compile-time flag, named --disable-gil. Third-party C extensions would need to provide separate wheels for GIL-disabled Python.”
Trogon (via) The latest project from the Textualize/Rich crew, Trogon provides a Python decorator—@tui—which, when applied to a Click CLI application, adds a new interactive TUI mode which introspects the available subcommands and their options and creates a full Text User Interface—with keyboard and mouse support—for assembling invocations of those various commands.
I just shipped sqlite-utils 3.32 with support for this—it uses an optional dependency, so you’ll need to run “sqlite-utils install trogon” and then “sqlite-utils tui” to try it out.
Writing Python like it’s Rust (via) Fascinating article by Jakub Beránek describing in detail patterns for using type annotations in Python inspired by working in Rust. I learned new tricks about both languages from reading this.
Real Multithreading is Coming to Python—Learn How You Can Use It Now (via) Martin Heinz provides a detailed tutorial on trying out the new Per-Interpreter GIL feature that’s landing in Python 3.12, which allows Python code to run concurrently in multiple threads by spawning separate sub-interpreters, each with their own dedicated GIL.
It’s not an easy feature to play with yet! First you need to compile Python yourself, and then use APIs that are generally only available to C code (but should hopefully become available to Python code itself in Python 3.13).
Martin’s workaround for this is ingenious: it turns out the Python test.support package provides utility functions to help write tests against interpreters, and Martin shows how to abuse this module to launch, run and cleanup interpreters using regular Python code.
He also demonstrates test.support.interpreters.create_channel(), which can be used to create channels with receiver and sender ends, somewhat similar to Go.
Implement DNS in a weekend (via) Fantastically clear and useful guide to implementing DNS lookups, from scratch, using Python’s struct, socket and dataclass modules—Julia Evans plans to follow this up with one for TLS which I am very much looking forward to.
Weeknotes: sqlite-utils 3.31, download-esm, Python in a sandbox
A couple of speaking appearances last week—one planned, one unplanned. Plus sqlite-utils 3.31
, download-esm
and a new TIL.
Mojo may be the biggest programming advance in decades (via) Jeremy Howard makes a very convincing argument for why the new programming language Mojo is a big deal.
Mojo is a superset of Python designed by a team lead by Chris Lattner, who previously created LLVM, Clang and and Swift.
Existing Python code should work unmodified, but it also adds features that enable performant low-level programming—like “fn” for creating typed, compiled functions and “struct” for memory-optimized alternatives to classes.
It’s worth watching Jeremy’s video where he uses these features to get more than a 2000x speed up implementing matrix multiplication, while still keeping the code readable and easy to follow.
Mojo isn’t available yet outside of a playground preview environment, but it does look like an intriguing new project.