Simon Willison’s Weblog

Subscribe

Blogmarks in 2021

Filters: Type: blogmark × Year: 2021 × Sorted by date


The Asymmetry of Open Source (via) Caddy creator Matt Holt provides “a comprehensive guide to funding open source software projects”. This is really useful—it describes a whole range of funding models that have been demonstrated to work, including sponsorship, consulting, private support channels and more. # 24th December 2021, 9:11 pm

Annotated explanation of David Beazley’s dataklasses (via) David Beazley released a self-described “deliciously evil spin on dataclasses” that uses some deep Python trickery to implement a dataclass style decorator which creates classes that import 15-20 times faster than the original. I put together a heavily annotated version of his code while trying to figure out how all of the different Python tricks in it work. # 20th December 2021, 5:05 am

Transactionally Staged Job Drains in Postgres. Any time I see people argue that relational databases shouldn’t be used to implement job queues I think of this post by Brandur from 2017. If you write to a queue before committing a transaction you run the risk of a queue consumer trying to read from the database before the new row becomes visible. If you write to the queue after the transaction there’s a risk an error might result in your message never being written. So: write to a relational staging table as part of the transaction, then have a separate process read from that table and write to the queue. # 18th December 2021, 1:34 am

TypeScript for Pythonistas (via) Really useful explanation of how TypeScript differs from Python with mypy. I hadn’t realized TypeScript leans so far into structural typing, to the point that two types with different names but the same “shape” are identified as being the same type as each other. # 17th December 2021, 7:43 pm

A deep dive into an NSO zero-click iMessage exploit: Remote Code Execution (via) Fascinating and terrifying description of an extremely sophisticated attack against iMessage. iMessage was passing incoming image bytes through to a bunch of different libraries to figure out which image format should be decoded, including a PDF renderer that supported the old JBIG2 compression format. JBIG2 includes a mechanism for programatically swapping the values of individual black and white pixels... which turns out to be Turing complete, and means that a sufficiently cunning “image” can include a full computer architecture defined in terms of logical bit operations. Combine this with an integer overflow and you can perform arbitrary memory operations that break out of the iOS sandbox. # 16th December 2021, 8:33 pm

servefolder.dev (via) Absurdly clever application of service workers and the file system API: you can select a folder from your computer and the contents of that folder will be served (just to you) from a path on this website—all without uploading any content. The code is on GitHub and offers a useful, succinct introduction to how to use those APIs. # 12th December 2021, 6:32 pm

wheel.yml for Pyjion using cibuildwheel (via) cibuildwheel, maintained by the Python Packaging Authority, builds and tests Python wheels across multiple platforms. I hadn’t realized quite how minimal a configuration using their GitHub Actions action was until I looked at how Pyjion was using it. # 10th December 2021, 3:05 am

Introducing stack graphs (via) GitHub launched “precise code navigation” for Python today—the first language to get support for this feature. Click on any Python symbol in GitHub’s code browsing views and a box will show you exactly where that symbol was defined—all based on static analysis by a custom parser written in Rust as opposed to executing any Python code directly. The underlying computer science uses a technique called stack graphs, based on scope graphs research from Eelco Visser’s research group at TU Delft. # 9th December 2021, 11:07 pm

Notes on Notes.app. Apple’s Notes app keeps its data in a SQLite database at ~/Library/Group\ Containers/group.com.apple.notes/NoteStore.sqlite—but it’s pretty difficult to extract data from. It turns out the note text is stored as a gzipped protocol buffers object in the ZICNOTEDATA.ZDATA column. Steve Dunham did the hard work of figuring out how it all works—the complexity stems from Apple’s use of CRDT’s to support seamless multiple edits from different devices. # 9th December 2021, 10:39 pm

s3-credentials 0.8. The latest release of my s3-credentials CLI tool for creating S3 buckets with credentials to access them (with read-write, read-only or write-only policies) adds a new --public option for creating buckets that allow public access, such that anyone who knows a filename can download a file. The s3-credentials put-object command also now sets the appropriate Content-Type heading on the uploaded object. # 7th December 2021, 7:04 am

Google Public DNS Flush Cache (via) Google Public DNS (8.8.8.8) have a flush cache page too. # 6th December 2021, 11:17 pm

1.1.1.1/purge-cache (via) Cloudflare’s 1.1.1.1 DNS service has a tool that anyone can use to flush a specific DNS entry from their cache—could be useful for assisting rollouts of new DNS configurations. # 6th December 2021, 11:15 pm

Centrifuge: a reliable system for delivering billions of events per day (via) From 2018, a write-up from Segment explaining how they solved the problem of delivering webhooks from thousands of different producers to hundreds of potentially unreliable endpoints. They started with Kafka and ended up on a custom system written in Go against RDS MySQL that was specifically tuned to their write-heavy requirements. # 6th December 2021, 1:41 am

jc (via) This is such a great idea: jc is a CLI tool which knows how to convert the output of dozens of different classic Unix utilities to JSON, so you can more easily process it programmatically, pipe it through jq and suchlike. “pipx install jc” to install, then “dig example.com | jc --dig” to try it out. # 5th December 2021, 11:05 pm

100 years of whatever this will be. This piece by apenwarr defies summarization but I enjoyed reading it a lot. # 2nd December 2021, 8:37 pm

PHP 8.1 release notes (via) PHP is gaining “Fibers” for lightweight cooperative concurrency—very similar to Python asyncio. Interestingly you don’t need to use separate syntax like “await fn()” to call them—calls to non-blocking functions are visually indistinguishable from calls to blocking functions. Considering how much additional library complexity has emerged in Python world from having these two different colours of functions it’s noteworthy that PHP has chosen to go in a different direction here. # 25th November 2021, 7:53 pm

Introduction to heredocs in Dockerfiles (via) This is a fantastic upgrade to Dockerfile syntax, enabled by BuildKit and a new frontend for executing the Dockerfile that can be specified with a #syntax= directive. I often like to create a standalone Dockerfile that works without needing other files from a directory, so being able to use <<EOF syntax to populate configure files from inline blocks of code is really handy. # 22nd November 2021, 5:01 pm

Hurl (via) Hurl is “a command line tool that runs HTTP requests defined in a simple plain text format”—written in Rust on top of curl, it lets you run HTTP requests and then execute assertions against the response, defined using JSONPath or XPath for HTML. It can even assert that responses were returned within a specified duration. # 22nd November 2021, 3:32 am

Cookiecutter Data Science (via) Some really solid thinking in this documentation for the DrivenData cookiecutter template. They emphasize designing data science projects for repeatability, such that just the src/ and data/ folders can be used to recreate all of the other analysis from scratch. I like the suggestion to give each project a dedicated S3 bucket for keeping immutable copies of the original raw data that might be too large for GitHub. # 18th November 2021, 3:21 pm

Datasette is four years old today. I marked the occasion with a short Twitter thread about the project so far. # 13th November 2021, 6:14 pm

Deno Deploy Beta 3 (via) I missed Deno Deploy when it first came out back in June: it’s a really interesting new hosting environment for scripts written in Deno, Node.js creator Ryan Dahl’s re-imagining of Node.js. Deno Deploy runs your code using v8 isolates running in 28 regions worldwide, with a clever BroadcastChannel mechanism (inspired by the browser API of the same name) that allows instances of the server-side code running in different regions to send each other messages. See the “via” link for my annotated version of a demo by Ondřej Žára that got me excited about what it can do. # 7th November 2021, 1:51 am

AWS IAM definitions in Datasette (via) As part of my ongoing quest to conquer IAM permissions, I built myself a Datasette instance that lets me run queries against all 10,441 permissions across 280 AWS services. It’s deployed by a build script running in GitHub Actions which downloads a 8.9MB JSON file from the Salesforce policy_sentry repository—policy_sentry itself creates that JSON file by running an HTML scraper against the official AWS documentation! # 6th November 2021, 3:47 am

A half-hour to learn Rust. I haven’t tried to write any Rust yet but I occasionally find myself wanting to read it, and I find some of the syntax really difficult to get my head around. This article helped a lot: it provides a quick but thorough introduction to most of Rust’s syntax, with clearly explained snippet examples for each one. # 5th November 2021, 5:21 am

An oral history of Bank Python (via) Fascinating description of a very custom Python environment inside a large investment bank—where all of the code lives inside the Python environment itself, everything can be imported into the same process and a directed acyclic graph engine implements Excel-style reactive dependencies. Plenty of extra flavour from people who’ve worked with this (and related) Python systems in the Hacker News comments. # 5th November 2021, 5:18 am

DuckDB-Wasm: Efficient Analytical SQL in the Browser (via) First SQLite, now DuckDB: options for running database engines in the browser using WebAssembly keep on growing. DuckDB means browsers now have a fast, intuitive mechanism for querying Parquet files too. This also supports the same HTTP Range header trick as the SQLite demo from a while back, meaning it can query large databases loaded over HTTP without downloading the whole file. # 29th October 2021, 3:25 pm

aws-lambda-adapter. AWS Lambda added support for Docker containers last year, but with a very weird shape: you can run anything on Lambda that fits in a Docker container, but unlike Google Cloud Run your application doesn’t get to speak HTTP: it needs to run code that listens for proprietary AWS lambda events instead. The obvious way to fix this is to run some kind of custom proxy inside the container which turns AWS runtime events into HTTP calls to a regular web application. Serverlessish and re:Web are two open source projects that implemented this, and now AWS have their own implementation of that pattern, written in Rust. # 28th October 2021, 5:04 am

Tonic (via) Really interesting library for building Web Components: it’s tiny (just 350 lines of code), works directly in browsers without any compile or build step and makes very creative use of modern JavaScript features such as async generators. # 23rd October 2021, 5:21 am

New HTTP standards for caching on the modern web (via) Cache-Status is a new HTTP header (RFC from August 2021) designed to provide better debugging information about which caches were involved in serving a request—“Cache-Status: Nginx; hit, Cloudflare; fwd=stale; fwd-status=304; collapsed; ttl=300” for example indicates that Nginx served a cache hit, then Cloudflare had a stale cached version so it revalidated from Nginx, got a 304 not modified, collapsed multiple requests (dogpile prevention) and plans to serve the new cached value for the next five minutes. Also described is $Target-Cache-Control: which allows different CDNs to respond to different headers and is already supported by Cloudflare and Akamai (Cloudflare-CDN-Cache-Control: and Akamai-Cache-Control:). # 21st October 2021, 10:40 pm

Why you shouldn’t invoke setup.py directly (via) Paul Ganssle explains why you shouldn’t use “python setup.py command” any more. I’ve mostly switched to pip and pytest and twine but I was still using “python setup.py sdist”—apparently the new replacement recipe for that is “python -m build”. # 19th October 2021, 5:22 pm

Where does all the effort go? Looking at Python core developer activity (via) Łukasz Langa used Datasette to explore 28,780 pull requests made to the CPython GitHub repository, using some custom Python scripts (and sqlite-utils) to load in the data. # 18th October 2021, 8:21 pm