Simon Willison’s Weblog

Subscribe

Items in 2021

Filters: Year: 2021 × Sorted by date


Datasette is four years old today. I marked the occasion with a short Twitter thread about the project so far. # 13th November 2021, 6:14 pm

One could never price a thirty year mortgage in bitcoin because its volatility makes it completely unpredictable and no sensible bank could calculate the risk of covering that debt. A world in which Elon Musk can tweet two emojis and your home depreciates 80% in value is a dystopia.

Stephen Diehl # 10th November 2021, 7:45 am

Deno Deploy Beta 3 (via) I missed Deno Deploy when it first came out back in June: it’s a really interesting new hosting environment for scripts written in Deno, Node.js creator Ryan Dahl’s re-imagining of Node.js. Deno Deploy runs your code using v8 isolates running in 28 regions worldwide, with a clever BroadcastChannel mechanism (inspired by the browser API of the same name) that allows instances of the server-side code running in different regions to send each other messages. See the “via” link for my annotated version of a demo by Ondřej Žára that got me excited about what it can do. # 7th November 2021, 1:51 am

AWS IAM definitions in Datasette (via) As part of my ongoing quest to conquer IAM permissions, I built myself a Datasette instance that lets me run queries against all 10,441 permissions across 280 AWS services. It’s deployed by a build script running in GitHub Actions which downloads a 8.9MB JSON file from the Salesforce policy_sentry repository—policy_sentry itself creates that JSON file by running an HTML scraper against the official AWS documentation! # 6th November 2021, 3:47 am

A half-hour to learn Rust. I haven’t tried to write any Rust yet but I occasionally find myself wanting to read it, and I find some of the syntax really difficult to get my head around. This article helped a lot: it provides a quick but thorough introduction to most of Rust’s syntax, with clearly explained snippet examples for each one. # 5th November 2021, 5:21 am

An oral history of Bank Python (via) Fascinating description of a very custom Python environment inside a large investment bank—where all of the code lives inside the Python environment itself, everything can be imported into the same process and a directed acyclic graph engine implements Excel-style reactive dependencies. Plenty of extra flavour from people who’ve worked with this (and related) Python systems in the Hacker News comments. # 5th November 2021, 5:18 am

Weeknotes: datasette-jupyterlite, s3-credentials and a Python packaging talk

My big project this week was s3-credentials, described yesterday—but I also put together a fun expermiental Datasette plugin bundling JupyterLite and wrote up my PyGotham talk on Python packaging.

[... 476 words]

How to build, test and publish an open source Python library

At PyGotham this year I presented a ten minute workshop on how to package up a new open source Python library and publish it to the Python Package Index. Here is the video and accompanying notes, which should make sense even without watching the talk.

[... 2055 words]

s3-credentials: a tool for creating credentials for S3 buckets

I’ve built a command-line tool called s3-credentials to solve a problem that’s been frustrating me for ages: how to quickly and easily create AWS credentials (an access key and secret key) that have permission to read or write from just a single S3 bucket.

[... 1618 words]

DuckDB-Wasm: Efficient Analytical SQL in the Browser (via) First SQLite, now DuckDB: options for running database engines in the browser using WebAssembly keep on growing. DuckDB means browsers now have a fast, intuitive mechanism for querying Parquet files too. This also supports the same HTTP Range header trick as the SQLite demo from a while back, meaning it can query large databases loaded over HTTP without downloading the whole file. # 29th October 2021, 3:25 pm

aws-lambda-adapter. AWS Lambda added support for Docker containers last year, but with a very weird shape: you can run anything on Lambda that fits in a Docker container, but unlike Google Cloud Run your application doesn’t get to speak HTTP: it needs to run code that listens for proprietary AWS lambda events instead. The obvious way to fix this is to run some kind of custom proxy inside the container which turns AWS runtime events into HTTP calls to a regular web application. Serverlessish and re:Web are two open source projects that implemented this, and now AWS have their own implementation of that pattern, written in Rust. # 28th October 2021, 5:04 am

Weeknotes: Learning Kubernetes, learning Web Components

I’ve been mainly climbing the learning curve for Kubernetes and Web Components this week. I also released Datasette 0.59.1 with Python 3.10 compatibility and an updated Docker image.

[... 1101 words]

But this much is clear: Facebook knew all along. Their own employees were desperately trying to get anyone inside the company to listen as their products radicalized their own friends and family members. And as they were breaking the world, they had an army of spokespeople publicly and privately gaslighting and intimidating reporters and researchers who were trying to ring the alarm bell. They knew all along and they simply did not give a shit.

Ryan Broderick # 25th October 2021, 8:22 pm

Tonic (via) Really interesting library for building Web Components: it’s tiny (just 350 lines of code), works directly in browsers without any compile or build step and makes very creative use of modern JavaScript features such as async generators. # 23rd October 2021, 5:21 am

New HTTP standards for caching on the modern web (via) Cache-Status is a new HTTP header (RFC from August 2021) designed to provide better debugging information about which caches were involved in serving a request—“Cache-Status: Nginx; hit, Cloudflare; fwd=stale; fwd-status=304; collapsed; ttl=300” for example indicates that Nginx served a cache hit, then Cloudflare had a stale cached version so it revalidated from Nginx, got a 304 not modified, collapsed multiple requests (dogpile prevention) and plans to serve the new cached value for the next five minutes. Also described is $Target-Cache-Control: which allows different CDNs to respond to different headers and is already supported by Cloudflare and Akamai (Cloudflare-CDN-Cache-Control: and Akamai-Cache-Control:). # 21st October 2021, 10:40 pm

Why you shouldn’t invoke setup.py directly (via) Paul Ganssle explains why you shouldn’t use “python setup.py command” any more. I’ve mostly switched to pip and pytest and twine but I was still using “python setup.py sdist”—apparently the new replacement recipe for that is “python -m build”. # 19th October 2021, 5:22 pm

Datasette 0.59: The annotated release notes

Datasette 0.59 is out, with a miscellaneous grab-bag of improvements. Here are the annotated release notes.

[... 2103 words]

Where does all the effort go? Looking at Python core developer activity (via) Łukasz Langa used Datasette to explore 28,780 pull requests made to the CPython GitHub repository, using some custom Python scripts (and sqlite-utils) to load in the data. # 18th October 2021, 8:21 pm

Tests aren’t enough: Case study after adding type hints to urllib3. Very thorough write-up by Seth Michael Larson describing what it took for the urllib3 Python library to fully embrace mypy and optional typing and what they learned along the way. # 18th October 2021, 7:03 pm

Web Browser Engineering (via) In progress free online book by Pavel Panchekha and Chris Harrelson that demonstrates how a web browser works by writing one from scratch using Python, tkinter and the DukPy wrapper around the Duktape JavaScript interpreter. # 17th October 2021, 3:53 pm

How to win at CORS (via) Jake Archibald’s definitive guide to CORS, including a handy CORS playground interactive tool. Also includes a useful history explaining why we need CORS in the first place. # 12th October 2021, 2:07 pm

Finding and reporting an asyncio bug in Python 3.10

I found a bug in Python 3.10 today! Some notes on how I found it and my process for handling it once I figured out what was going on.

[... 1789 words]

Abusing Terraform to Upload Static Websites to S3 (via) I found this really interesting. Terraform is infrastructure as code software which mostly handles creating and updating infrastructure resources, so it’s a poor fit for uploading files to S3 and setting the correct Content-Type headers for them. But... in figuring out how to do that, this article taught me a ton about how Terraform works. I wonder if that’s a useful general pattern? Get a tool to do something that it’s poorly designed to handle and see how much you learn about that tool along the way. # 6th October 2021, 6:51 pm

Writing for distributed teams (via) Vicki Boykis describes how she only sent 11 emails during her first 12 months working at Automattic, because the company culture there revolves around asynchronous communication through durable writing using the P2 custom WordPress theme. “This is a completely different paradigm than I’ve ever worked in, which has been a world usually riddled with information lost to Slack, Confluence, and dozens of email re:re:res.” # 6th October 2021, 6:29 pm

The GIL and its effects on Python multithreading (via) Victor Skvortsov presents the most in-depth explanation of the Python Global Interpreter Lock I’ve seen anywhere. I learned a ton from reading this. # 29th September 2021, 5:23 pm

Weeknotes: CDC vaccination history fixes, developing in GitHub Codespaces

I spent the last week mostly surrounded by boxes: we’re completing our move to the new place and life is mostly unpacking now. I did find some time to fix some issues with my CDC vaccination history Datasette instance though.

[... 514 words]

django-upgrade (via) Adam Johnson’s new CLI tool for upgrading Django projects by automatically applying changes to counter deprecations made in different versions of the framework. Uses the Python standard library tokenize module which gives it really quick performance in parsing and rewriting Python code. Exciting to see this kind of codemod approach becoming more common in Python world—JavaScript developers use this kind of thing a lot. # 26th September 2021, 5:42 am

New tool: an nginx playground. Julia Evans built a sandbox tool for interactively trying out an nginx configuration and executing test requests through it. I love this kind of tool, and Julia’s explanation of how they built it using a tiny fly.io instance and a network namespace to reduce the amount of damage any malicious usage could cause is really interesting. # 24th September 2021, 6:44 pm

File not found: A generation that grew up with Google is forcing professors to rethink their lesson plans (via) This is fascinating: as-of 2017 university instructors have been increasingly encountering students who have absolutely no idea how files and folders on a computer work. The new generation has a completely different mental model of how applications work, where everything is found using search and data mostly lives inside the application that you use to manipulate it. # 23rd September 2021, 10:49 pm

Introducing Partytown 🎉: Run Third-Party Scripts From a Web Worker (via) This is just spectacularly clever. Partytown is a 6KB JavaScript library that helps you move gnarly poorly performing third-party scripts out of your main page and into a web worker, so they won’t destroy your page performance. The really clever bit is in how it provides sandboxed access to the page DOM: it uses a devious trick where a proxy object provides getters and setters which then make blocking API calls to a separate service worker, using the mostly-forgotten xhr.open(..., false) parameter that turns off the async default for an XMLHttpRequest call. # 23rd September 2021, 6:29 pm