Simon Willison’s Weblog

Subscribe

Items in May, 2022

Filters: Year: 2022 × Month: May × Sorted by date


Compiling Black with mypyc (via) Richard Si is a Black contributor who recently obtained a 2x performance boost by compiling Black using the mypyc tool from the mypy project, which uses Python type annotations to generate a compiled C version of the Python logic. He wrote up this fantastic three-part series describing in detail how he achieved this, including plenty of tips on Python profiling and clever optimization tricks. # 31st May 2022, 11:24 pm

A Datasette tutorial written by GPT-3

I’ve been playing around with OpenAI’s GPT-3 language model playground for a few months now. It’s a fascinating piece of software. You can sign up here—apparently there’s no longer a waiting list.

[... 1244 words]

Lesser Known Features of ClickHouse (via) I keep hearing positive noises about ClickHouse. I learned about a whole bunch of capabilities from this article—including that ClickHouse can directly query tables that are stored in SQLite or PostgreSQL. # 31st May 2022, 7:48 pm

Dragonfly: A modern replacement for Redis and Memcached (via) I was initially pretty skeptical of the tagline: does Redis really need a “modern” replacement? But the Background section of the README makes this look like a genuinely interesting project. It re-imagines Redis to have its keyspace partitioned across multiple threads, and uses the VLL lock manager described in a 2014 paper to “compose atomic multi-key operations without using mutexes or spinlocks”. The initial benchmarks show up to a 25x increase in throughput compared to Redis. It’s written in C++. # 30th May 2022, 10:02 pm

Architecture Notes: Datasette (via) I was interviewed for the first edition of Architecture Notes—a new publication (website and newsletter) about software architecture created by Mahdi Yusuf. We covered a bunch of topics in detail: ASGI, SQLIte and asyncio, Baked Data, plugin hook design, Python in WebAssembly, Python in an Electron app and more. Mahdi also turned my scrappy diagrams into beautiful illustrations for the piece. # 27th May 2022, 3:20 pm

Weeknotes: Building Datasette Cloud on Fly Machines, Furo for documentation

Hosting provider Fly released Fly Machines this week. I got an early preview and I’ve been working with it for a few days—it’s a fascinating new piece of technology. I’m using it to get my hosting service for Datasette ready for wider release.

[... 1005 words]

upptime (via) “Open-source uptime monitor and status page, powered entirely by GitHub Actions, Issues, and Pages.” This is a very creative (ab)use of GitHub Actions: it runs a scheduled action to check the availability of sites that you specify, records the results in a YAML file (with the commit history tracking them over time) and can automatically open a GitHub issue for you if it detects a new incident. # 26th May 2022, 3:53 am

Benjamin “Zags” Zagorsky: Handling Timezones in Python. The talks from PyCon US have started appearing on YouTube. I found this one really useful for shoring up my Python timezone knowledge: It reminds that if your code calls datetime.now(), datetime.utcnow() or date.today(), you have timezone bugs—you’ve been working with ambiguous representations of instances in time that could span a 26 hour interval from UTC-12 to UTC+14. date.today() represents a 24 hour period and hence is prone to timezone surprises as well. My code has a lot of timezone bugs! # 26th May 2022, 3:40 am

Bundling binary tools in Python wheels

I spotted a new (to me) pattern which I think is pretty interesting: projects are bundling compiled binary applications as part of their Python packaging wheels. I think it’s really neat.

[... 903 words]

Paint Holding—reducing the flash of white on same-origin navigations. I missed this when it happened back in 2019: Chrome (and apparently Safari too—not sure about Firefox) implemented a feature where rather than showing a blank screen in between page navigations Chrome “waits briefly before starting to paint, especially if the page is fast enough”. As a result, fast loading multi-page applications become almost indistinguishable from SPAs (single-page apps). It’s a really neat feature, and now that I know how it works I realize that it explains why page navigations have felt a lot snappier to me over the past few years. # 22nd May 2022, 2:50 am

The balance has shifted away from SPAs (via) “There’s a feeling in the air. A zeitgeist. SPAs are no longer the cool kids they once were 10 years ago.” Nolan Lawson offers some opinions on why the pendulum seems to be swinging back in favour of server-side rendering over rendering every page entirely on the client. He argues that paint holding, back-forward caching and service workers have made the benefits of SPAs over MPAs much less apparent. I’m inclined to agree. # 22nd May 2022, 2:47 am

GOV.UK Guidance: Documenting APIs (via) Characteristically excellent guide from GOV.UK on writing great API documentation. “Task-based guidance helps users complete the most common integration tasks, based on the user needs from your research.” # 21st May 2022, 11:31 pm

Comby (via) Describes itself as “Structural search and replace for any language”. Lets you execute search and replace patterns that look a little bit like simplified regular expressions, but with some deep OCaml-powered magic that makes them aware of comment, string and nested parenthesis rules for different languages. This means you can use it to construct scripts that automate common refactoring or code upgrade tasks. # 18th May 2022, 5:47 am

simonw/datasette-screenshots (via) I started a new GitHub repository to automate taking screenshots of Datasette for marketing purposes, using my shot-scraper browser automation tool. # 17th May 2022, 5:56 pm

Supercharging GitHub Actions with Job Summaries (via) GitHub Actions workflows can now generate a rendered Markdown summary of, well, anything that you can think to generate as part of the workflow execution. I particularly like the way this is designed: they provide a filename in a $GITHUB_STEP_SUMMARY environment variable which you can then append data to from each of your steps. # 16th May 2022, 11:02 pm

Weeknotes: Camping, a road trip and two new museums

Natalie and I took a week-long road trip and camping holiday. The plan was to camp on Santa Rosa Island in the California Channel Islands, but the boat to the island was cancelled due to bad weather. We treated ourselves to a Central Californian road trip instead.

[... 872 words]

Heroku: Core Impact (via) Ex-Heroku engineer Brandur Leach pulls together some of the background information circulating concerning the now more than a month long Heroku security incident and provides some ex-insider commentary on what went right and what went wrong with a platform that left a huge, if somewhat underappreciated impact on the technology industry at large. # 16th May 2022, 4:24 am

How Materialize and other databases optimize SQL subqueries. Jamie Brandon offers a survey of the state-of-the-art in optimizing correlated subqueries, across a number of different database engines. # 15th May 2022, 8:24 pm

Why Rust’s postfix await syntax is good (via) C J Silverio explains postfix await in Rust—where you can write a line like this, with the ? causing any errors to be caught and turned into an error return from your function:

let count = fetch_all_animals().await?.filter_for_hedgehogs().len(); # 15th May 2022, 2:27 pm

sqlite-utils: a nice way to import data into SQLite for analysis (via) Julia Evans on my sqlite-utils Python library and CLI tool. # 13th May 2022, 6:17 pm

Weeknotes: Datasette Lite, nogil Python, HYTRADBOI

My big project this week was Datasette Lite, a new way to run Datasette directly in a browser, powered by WebAssembly and Pyodide. I also continued my research into running SQL queries in parallel, described last week. Plus I spoke at HYTRADBOI.

[... 1434 words]

SIARD: Software Independent Archiving of Relational Databases (via) I hadn’t heard of this before but it looks really interesting: the Federal Archives of Switzerland developed a standard for archiving any relational database as a zip file full of XML which is “is used in over 50 countries around the globe”. # 4th May 2022, 10:40 pm

Datasette Lite: a server-side Python web application running in a browser

Datasette Lite is a new way to run Datasette: entirely in a browser, taking advantage of the incredible Pyodide project which provides Python compiled to WebAssembly plus a whole suite of useful extras.

[... 4800 words]

Simple declarative schema migration for SQLite (via) This is an interesting, clearly explained approach to the database migration problem. Create a new in-memory database and apply the current schema, then run some code to compare that with the previous schema—which tables are new, and which tables have had columns added. Then apply those changes.

I’d normally be cautious of running something like this because I can think of ways it could go wrong—but SQLite backups are so quick and cheap (just copy the file) that I could see this being a relatively risk-free way to apply migrations. # 3rd May 2022, 6:07 pm

Web Scraping via Javascript Runtime Heap Snapshots (via) This is an absolutely brilliant scraping trick. Adrian Cooney figured out a way to use Puppeteer and the Chrome DevTools protocol to take a heap snapshot of all of the JavaScript running on a web page, then recursively crawl through the heap looking for any JavaScript objects that have a specified selection of properties. This allows him to scrape data from arbitrarily complex client-side web applications. He built a JavaScript library and command line tool that implements the pattern. # 3rd May 2022, 12:51 am

sqlite-utils 3.26.1 (via) I released sqlite-utils 3.36.1 with one tiny but exciting feature: I fixed its one dependency that wasn’t published as a pure Python wheel, which means it can now be used with Pyodide—Python compiled to WebAssembly running in your browser! # 2nd May 2022, 6:43 pm