Simon Willison’s Weblog

Subscribe
Atom feed

Blogmarks

Filters: Sorted by date

New HTTP standards for caching on the modern web (via) Cache-Status is a new HTTP header (RFC from August 2021) designed to provide better debugging information about which caches were involved in serving a request—“Cache-Status: Nginx; hit, Cloudflare; fwd=stale; fwd-status=304; collapsed; ttl=300” for example indicates that Nginx served a cache hit, then Cloudflare had a stale cached version so it revalidated from Nginx, got a 304 not modified, collapsed multiple requests (dogpile prevention) and plans to serve the new cached value for the next five minutes. Also described is $Target-Cache-Control: which allows different CDNs to respond to different headers and is already supported by Cloudflare and Akamai (Cloudflare-CDN-Cache-Control: and Akamai-Cache-Control:).

# 21st October 2021, 10:40 pm / caching, dogpile, http, cloudflare

Why you shouldn’t invoke setup.py directly (via) Paul Ganssle explains why you shouldn’t use “python setup.py command” any more. I’ve mostly switched to pip and pytest and twine but I was still using “python setup.py sdist”—apparently the new replacement recipe for that is “python -m build”.

# 19th October 2021, 5:22 pm / python

Where does all the effort go? Looking at Python core developer activity (via) Łukasz Langa used Datasette to explore 28,780 pull requests made to the CPython GitHub repository, using some custom Python scripts (and sqlite-utils) to load in the data.

# 18th October 2021, 8:21 pm / python, datasette, sqlite-utils, lukasz-langa

Tests aren’t enough: Case study after adding type hints to urllib3. Very thorough write-up by Seth Michael Larson describing what it took for the urllib3 Python library to fully embrace mypy and optional typing and what they learned along the way.

# 18th October 2021, 7:03 pm / python, static-typing, mypy, seth-michael-larson

Web Browser Engineering (via) In progress free online book by Pavel Panchekha and Chris Harrelson that demonstrates how a web browser works by writing one from scratch using Python, tkinter and the DukPy wrapper around the Duktape JavaScript interpreter.

# 17th October 2021, 3:53 pm / browsers, python

How to win at CORS (via) Jake Archibald’s definitive guide to CORS, including a handy CORS playground interactive tool. Also includes a useful history explaining why we need CORS in the first place.

# 12th October 2021, 2:07 pm / javascript, security, jake-archibald, cors

Abusing Terraform to Upload Static Websites to S3 (via) I found this really interesting. Terraform is infrastructure as code software which mostly handles creating and updating infrastructure resources, so it’s a poor fit for uploading files to S3 and setting the correct Content-Type headers for them. But... in figuring out how to do that, this article taught me a ton about how Terraform works. I wonder if that’s a useful general pattern? Get a tool to do something that it’s poorly designed to handle and see how much you learn about that tool along the way.

# 6th October 2021, 6:51 pm / s3

Writing for distributed teams (via) Vicki Boykis describes how she only sent 11 emails during her first 12 months working at Automattic, because the company culture there revolves around asynchronous communication through durable writing using the P2 custom WordPress theme. “This is a completely different paradigm than I’ve ever worked in, which has been a world usually riddled with information lost to Slack, Confluence, and dozens of email re:re:res.”

# 6th October 2021, 6:29 pm / communication, wordpress, vicki-boykis

The GIL and its effects on Python multithreading (via) Victor Skvortsov presents the most in-depth explanation of the Python Global Interpreter Lock I’ve seen anywhere. I learned a ton from reading this.

# 29th September 2021, 5:23 pm / concurrency, gil, python, threads

django-upgrade (via) Adam Johnson’s new CLI tool for upgrading Django projects by automatically applying changes to counter deprecations made in different versions of the framework. Uses the Python standard library tokenize module which gives it really quick performance in parsing and rewriting Python code. Exciting to see this kind of codemod approach becoming more common in Python world—JavaScript developers use this kind of thing a lot.

# 26th September 2021, 5:42 am / cli, django, python, adam-johnson

New tool: an nginx playground. Julia Evans built a sandbox tool for interactively trying out an nginx configuration and executing test requests through it. I love this kind of tool, and Julia’s explanation of how they built it using a tiny fly.io instance and a network namespace to reduce the amount of damage any malicious usage could cause is really interesting.

# 24th September 2021, 6:44 pm / nginx, security, julia-evans, fly

File not found: A generation that grew up with Google is forcing professors to rethink their lesson plans (via) This is fascinating: as-of 2017 university instructors have been increasingly encountering students who have absolutely no idea how files and folders on a computer work. The new generation has a completely different mental model of how applications work, where everything is found using search and data mostly lives inside the application that you use to manipulate it.

Gradually, Garland came to the same realization that many of her fellow educators have reached in the past four years: the concept of file folders and directories, essential to previous generations’ understanding of computers, is gibberish to many modern students.

# 23rd September 2021, 10:49 pm / teaching, usability, computer-literacy

Introducing Partytown 🎉: Run Third-Party Scripts From a Web Worker (via) This is just spectacularly clever. Partytown is a 6KB JavaScript library that helps you move gnarly poorly performing third-party scripts out of your main page and into a web worker, so they won’t destroy your page performance. The really clever bit is in how it provides sandboxed access to the page DOM: it uses a devious trick where a proxy object provides getters and setters which then make blocking API calls to a separate service worker, using the mostly-forgotten xhr.open(..., false) parameter that turns off the async default for an XMLHttpRequest call.

# 23rd September 2021, 6:29 pm / async, javascript, webworkers, serviceworkers

egghead screencasting technical guide (via) Detailed guide to producing high quality screencasts—software to use, audio tips, editing workflow—from the egghead.io online instructor platform.

# 20th September 2021, 5:39 pm / screencasting

Datasette Desktop 0.1.0 (via) This is the first installable version of the new Datasette Desktop macOS application I’ve been building. Please try it out and leave feedback on Twitter or on the GitHub Discussions thread linked from the release notes.

# 8th September 2021, 5:14 am / projects, datasette, electron, datasette-desktop

Making world-class docs takes effort (via) Curl maintainer Daniel Stenberg writes about his principles for good documentation. I agree with all of these: he emphasizes keeping docs in the repo, avoiding the temptation to exclusively generate them from code, featuring examples and ensuring every API you provide has documentation. Daniel describes an approach similar to the documentation unit tests I’ve been using for my own projects: he has scripts which scan the curl documentation to ensure not only that everything is documented but that each documentation area contains the same sections in the same order.

# 6th September 2021, 6:58 pm / curl, documentation, daniel-stenberg

Per-project PostgreSQL (via) Jamey Sharp describes an ingenious way of setting up PostgreSQL instances for each of your local development project, without depending on an always-running shared localhost database server. The trick is a shell script which creates a PGDATA folder in the current folder and then instantiates a PostgreSQL server in --single single user mode which listens on a Unix domain socket in that folder, instead of listening on the network. Jamey then uses direnv to automatically configure that PostgreSQL, initializing the DB if necessary, for each of his project folders.

# 3rd September 2021, 3:06 am / postgresql

API Tokens: A Tedious Survey. Thomas Ptacek reviews different approaches to implementing secure API tokens, from simple random strings stored in a database through various categories of signed token to exotic formats like Macaroons and Biscuits, both new to me.

Macaroons carry a signed list of restrictions with them, but combine it with a mechanism where a client can add their own additional restrictions, sign the combination and pass the token on to someone else.

Biscuits are similar, but “embed Datalog programs to evaluate whether a token allows an operation”.

# 25th August 2021, 12:12 am / apis, security, thomas-ptacek, fly

SQLModel. A new project by FastAPI creator Sebastián Ramírez: SQLModel builds on top of both SQLAlchemy and Sebastián’s Pydantic validation library to provide a new ORM that’s designed around Python 3’s optional typing. The real brilliance here is that a SQLModel subclass is simultaneously a valid SQLAlchemy ORM model AND a valid Pydantic validation model, saving on duplicate code by allowing the same class to be used both for form/API validation and for interacting with the database.

# 24th August 2021, 11:16 pm / orm, python, sql, sqlalchemy, pydantic

How Discord Stores Billions of Messages (via) Fascinating article from 2017 describing how Discord migrated their primary message store to Cassandra (from MongoDB, but I could easily see them making the same decision if they had started with PostgreSQL or MySQL).

The trick with scalable NoSQL databases like Cassandra is that you need to have a very deep understanding of the kinds of queries you will need to answer - and Discord had exactly that.

In the article they talk about their desire to eventually migrate to Scylla (a compatible Cassandra alternative written in C++) - in the Hacker News comments they confirm that in 2021 they are using Scylla for a few things but they still have their core messages in Cassandra.

# 24th August 2021, 9:31 pm / cassandra, nosql, scaling, discord

MDN: Subdomain takeovers (via) MDN have a page about subdomain takeover attacks that focuses more on CNAME records: if you have a CNAME pointing to a common delegated hosting provider but haven’t yet provisioned your virtual host there, someone else might beat you to it and use it for an XSS attack.

“Preventing subdomain takeovers is a matter of order of operations in lifecycle management for virtual hosts and DNS.”

I now understand why Google Cloud make your “prove” your ownership of a domain before they’ll let you configure it to host e.g. a Cloud Run instance.

# 22nd August 2021, 5:31 am / dns, security

I stumbled across a nasty XSS hole involving DNS A records. Found out today that an old subdomain that I had assigned an IP address to via a DNS A record was serving unexpected content—turned out I’d shut down the associated VPS and the IP had been recycled to someone else, so their content was now appearing under my domain. It strikes me that if you got really unlucky this could turn into an XSS hole—and that new server could even use Let’s Encrypt to obtain an HTTPS certificate for your subdomain.

I’ve added “audit your A records” to my personal security checklist.

# 22nd August 2021, 5:27 am / dns, security, xss

The Diátaxis documentation framework. Daniele Procida’s model of four types of technical documentation—tutorials, how-to guides, technical reference and explanation—now has a name: Diátaxis.

# 21st August 2021, 10:59 pm / documentation, diataxis

SQLite: STRICT Tables (draft). Draft documentation for a feature that sounds like it could be arriving in SQLite 3.37 (the next release)—adding a “STRICT” table-option keyword to a CREATE TABLE statement will cause the table to strictly enforce typing rules for data in that table, rejecting inserts that fail to match the column’s datatypes.

I’ve seen many programmers dismiss SQLite due to its loose typing, so this feature is really exciting to me: it will hopefully remove a common objection to embracing SQLite for projects.

# 21st August 2021, 7:05 pm / sqlite

lex.go in json5-go. This archived GitHub repository has a beautifully clean and clear example of a hand-written lexer in Go, for the JSON5 format (JSON + comments + multi-line strings). parser.go is worth a look too.

# 19th August 2021, 8:15 pm / compilers, go, json

Product Hunt Engineering Principles (via) Product Hunt implement “Collaborative Single Player Mode”, which they define as “A developer should be able to execute a feature from start to finish -- from the database to the backend, API, frontend, and CSS. The goal is never to get blocked.” I’ve encountered this principle applied to teams before (which I really like) but not for individual developers, which I imagine is more likely to work well for smaller organizations. Intriguing approach.

They also practice trunk driven development with feature flags: “Always start a feature with a feature flag and try to get something to production on day 1.”

And “If a product decision is missing, try to make this decision yourself—it’s better to ask for forgiveness rather than permission.”

# 16th August 2021, 3:35 pm / management, feature-flags

Re-assessing the automatic charset decoding policy in HTTPX (via) Tom Christie ran an analysis of the top 1,000 most accessed websites (according to an older extract from Google’s Ad Planner service) and found that a full 5% of them both omitted a charset parameter and failed to decode as UTF-8. As a result, HTTPX will be depending on the charset-normalizer Python library to handle those cases.

# 13th August 2021, 10:07 pm / unicode, kim-christie, httpx

GitHub’s Engineering Team has moved to Codespaces. My absolute dream development environment is one where I can spin up a new, working development environment in seconds—to try something new on a branch, or because I broke something and don’t want to spend time figuring out how to fix it. This article from GitHub explains how they got there: from a half-day setup to a 45 minute bootstrap in a codespace, then to five minutes through shallow cloning and a nightly pre-built Docker image and finally to 10 seconds be setting up “pools of codespaces, fully cloned and bootstrapped, waiting to be connected with a developer who wants to get to work”.

# 11th August 2021, 4:53 pm / github, docker, github-codespaces

Adding Sphinx autodoc to a project, and configuring Read The Docs to build it. My TIL notes from figuring out how to use sphinx-autodoc for the sqlite-utils reference documentation today.

# 11th August 2021, 1:21 am / documentation, sqlite-utils, sphinx-docs, read-the-docs

sqlite-utils API reference (via) I released sqlite-utils 3.15.1 today with just one change, but it’s a big one: I’ve added docstrings and type annotations to nearly every method in the library, and I’ve started using sphinx-autodoc to generate an API reference page in the documentation directly from those docstrings. I’ve deliberately avoided building this kind of documentation in the past because I so often see projects where the class reference is the ONLY documentation, which I find makes it really hard to figure out how to actually use it. sqlite-utils already has extensive narrative prose documentation so in this case I think it’s a useful enhancement—especially since the docstrings and type hints can help improve the usability of the library in IDEs and Jupyter notebooks.

# 11th August 2021, 1:03 am / documentation, python, sqlite-utils, sphinx-docs

Years

Tags