Simon Willison’s Weblog

Subscribe

Blogmarks

Filters: Sorted by date

The baseline for web development in 2022 (via) “TL;DR:The baseline for web development in 2022 is: low-spec Android devices in terms of performance, Safari from two years before in terms of Web Standards, and 4G in terms of networks. The web in general is not answering those needs properly, especially in terms of performance where factors such as an over-dependence on JavaScript are hindering our sites’ performance.”

# 27th January 2022, 8:09 pm / alex-russell, web-performance

Two reasons Kubernetes is so complex (via) I like how this article proposes that Kubernetes isn’t trying to be a tool for deploying containers—it’s more like an operating system for a cluster of machines, responsible for the same kind of goals as a regular operating system such as resource sharing and portability. And since everything is built as control loops which attempt to modify actual state to fit the declarative desired state, errors can occur asynchronously seconds or even minutes after the desired state has been updated.

# 27th January 2022, 6:25 pm / kubernetes

Observable Plot Cheatsheets (via) Beautiful new set of cheatsheets by Mike Freeman for the Observable Plot charting library. This is really top notch documentation—the cheatsheets are available as printable PDFs but the real value here is in the interactive versions of them, which include Observable-powered sliders to tweak the different examples and copy out the resulting generated code.

# 25th January 2022, 10:12 pm / visualization, observable, observable-plot

Roblox Return to Service 10/28-10/31 2021 (via) A particularly good example of a public postmortem on an outage. Roblox was down for 72 hours last year, as a result of an extremely complex set of circumstances which took a lot of effort to uncover. It’s interesting to think through what kind of monitoring you would need to have in place to help identify the root cause of this kind of issue.

# 21st January 2022, 4:41 pm / ops, observability, postmortem

How to Add a Favicon to Your Django Site (via) Adam Johnson did the research on the best way to handle favicons - Safari still doesn't handle SVG icons so the best solution today is a PNG served from the /favicon.ico path. This article inspired me to finally add a proper favicon to Datasette.

# 20th January 2022, 7:03 am / django, favicons, adam-johnson

Tricking Postgres into using an insane – but 200x faster – query plan. Jacob Martin talks through a PostgreSQL query optimization they implemented at Spacelift, showing in detail how to interpret the results of EXPLAIN (FORMAT JSON, ANALYZE) using the explain.dalibo.com visualization tool.

# 18th January 2022, 8:53 pm / performance, postgresql, optimization

SQLime: SQLite Playground (via) Anton Zhiyanov built this useful mobile-friendly online playground for trying things out it SQLite. It uses the sql.js library which compiles SQLite to WebAssembly, so it runs everything in the browser—but it also supports saving your work to Gists via the GitHub API. The JavaScript source code is fun to read: the site doesn’t use npm or Webpack or similar, opting instead to implement everything library-free using modern JavaScript modules and Web Components.

# 17th January 2022, 7:08 pm / javascript, sqlite, web-components, webassembly, anton-zhiyanov

Abusing AWS Lambda to make an Aussie Search Engine (via) Ben Boyter built a search engine that only indexes .au Australian websites, with the novel approach of directly compiling the search index into 250 different ~40MB large lambda functions written in Go, then running searches across 12 million pages by farming them out to all of the lambdas and combining the results. His write-up includes all sorts of details about how he built this, including how he ran the indexer and how he solved the surprisingly hard problem of returning good-enough text snippets for the results.

# 16th January 2022, 8:52 pm / aws, go, lambda, search

Writing a minimal Lua implementation with a virtual machine from scratch in Rust. Phil Eaton implements a subset of Lua in a Rust in this detailed tutorial.

# 15th January 2022, 6:29 pm / compilers, lua, rust, phil-eaton

Announcing Parcel CSS: A new CSS parser, compiler, and minifier written in Rust! An interesting thing about tools like this being written in Rust is that since the Rust-to-WASM pipeline is well trodden at this point, the live demo that this announcement links to runs entirely in the browser.

# 13th January 2022, 8:40 pm / css, rust, webassembly

Hashids (via) Confusingly named because it’s not really a hash—this library (available in 40+ languages) offers a way to convert integer IDs to and from short strings of text based on a salt which, if kept secret, should help prevent people from deriving the IDs and using them to measure growth of your service. It works using a base62 alphabet that is shuffled using the salt.

# 8th January 2022, 7:31 pm / ids

The Asymmetry of Open Source (via) Caddy creator Matt Holt provides “a comprehensive guide to funding open source software projects”. This is really useful—it describes a whole range of funding models that have been demonstrated to work, including sponsorship, consulting, private support channels and more.

# 24th December 2021, 9:11 pm / open-source, matt-holt

Annotated explanation of David Beazley’s dataklasses (via) David Beazley released a self-described “deliciously evil spin on dataclasses” that uses some deep Python trickery to implement a dataclass style decorator which creates classes that import 15-20 times faster than the original. I put together a heavily annotated version of his code while trying to figure out how all of the different Python tricks in it work.

# 20th December 2021, 5:05 am / david-beazley, python, til

Transactionally Staged Job Drains in Postgres. Any time I see people argue that relational databases shouldn’t be used to implement job queues I think of this post by Brandur from 2017. If you write to a queue before committing a transaction you run the risk of a queue consumer trying to read from the database before the new row becomes visible. If you write to the queue after the transaction there’s a risk an error might result in your message never being written. So: write to a relational staging table as part of the transaction, then have a separate process read from that table and write to the queue.

# 18th December 2021, 1:34 am / postgresql, queues, scaling, brandur-leach

TypeScript for Pythonistas (via) Really useful explanation of how TypeScript differs from Python with mypy. I hadn’t realized TypeScript leans so far into structural typing, to the point that two types with different names but the same “shape” are identified as being the same type as each other.

# 17th December 2021, 7:43 pm / python, typescript, mypy

A deep dive into an NSO zero-click iMessage exploit: Remote Code Execution (via) Fascinating and terrifying description of an extremely sophisticated attack against iMessage. iMessage was passing incoming image bytes through to a bunch of different libraries to figure out which image format should be decoded, including a PDF renderer that supported the old JBIG2 compression format. JBIG2 includes a mechanism for programatically swapping the values of individual black and white pixels... which turns out to be Turing complete, and means that a sufficiently cunning “image” can include a full computer architecture defined in terms of logical bit operations. Combine this with an integer overflow and you can perform arbitrary memory operations that break out of the iOS sandbox.

# 16th December 2021, 8:33 pm / security

servefolder.dev (via) Absurdly clever application of service workers and the file system API: you can select a folder from your computer and the contents of that folder will be served (just to you) from a path on this website—all without uploading any content. The code is on GitHub and offers a useful, succinct introduction to how to use those APIs.

# 12th December 2021, 6:32 pm / browsers, javascript, serviceworkers

wheel.yml for Pyjion using cibuildwheel (via) cibuildwheel, maintained by the Python Packaging Authority, builds and tests Python wheels across multiple platforms. I hadn’t realized quite how minimal a configuration using their GitHub Actions action was until I looked at how Pyjion was using it.

# 10th December 2021, 3:05 am / python, github-actions

Introducing stack graphs (via) GitHub launched “precise code navigation” for Python today—the first language to get support for this feature. Click on any Python symbol in GitHub’s code browsing views and a box will show you exactly where that symbol was defined—all based on static analysis by a custom parser written in Rust as opposed to executing any Python code directly. The underlying computer science uses a technique called stack graphs, based on scope graphs research from Eelco Visser’s research group at TU Delft.

# 9th December 2021, 11:07 pm / compilers, github, python, rust

Notes on Notes.app. Apple’s Notes app keeps its data in a SQLite database at ~/Library/Group\ Containers/group.com.apple.notes/NoteStore.sqlite—but it’s pretty difficult to extract data from. It turns out the note text is stored as a gzipped protocol buffers object in the ZICNOTEDATA.ZDATA column. Steve Dunham did the hard work of figuring out how it all works—the complexity stems from Apple’s use of CRDT’s to support seamless multiple edits from different devices.

# 9th December 2021, 10:39 pm / apple, sqlite, crdt, dogsheep

s3-credentials 0.8. The latest release of my s3-credentials CLI tool for creating S3 buckets with credentials to access them (with read-write, read-only or write-only policies) adds a new --public option for creating buckets that allow public access, such that anyone who knows a filename can download a file. The s3-credentials put-object command also now sets the appropriate Content-Type heading on the uploaded object.

# 7th December 2021, 7:04 am / cli, projects, s3, s3-credentials

Google Public DNS Flush Cache (via) Google Public DNS (8.8.8.8) have a flush cache page too.

# 6th December 2021, 11:17 pm / dns, google

1.1.1.1/purge-cache (via) Cloudflare’s 1.1.1.1 DNS service has a tool that anyone can use to flush a specific DNS entry from their cache—could be useful for assisting rollouts of new DNS configurations.

# 6th December 2021, 11:15 pm / dns, cloudflare

Centrifuge: a reliable system for delivering billions of events per day (via) From 2018, a write-up from Segment explaining how they solved the problem of delivering webhooks from thousands of different producers to hundreds of potentially unreliable endpoints. They started with Kafka and ended up on a custom system written in Go against RDS MySQL that was specifically tuned to their write-heavy requirements.

# 6th December 2021, 1:41 am / scaling, webhooks, segment

jc (via) This is such a great idea: jc is a CLI tool which knows how to convert the output of dozens of different classic Unix utilities to JSON, so you can more easily process it programmatically, pipe it through jq and suchlike. “pipx install jc” to install, then “dig example.com | jc --dig” to try it out.

# 5th December 2021, 11:05 pm / cli, json, unix, jq

100 years of whatever this will be. This piece by apenwarr defies summarization but I enjoyed reading it a lot.

# 2nd December 2021, 8:37 pm / internet, avery-pennarun

PHP 8.1 release notes (via) PHP is gaining “Fibers” for lightweight cooperative concurrency—very similar to Python asyncio. Interestingly you don’t need to use separate syntax like “await fn()” to call them—calls to non-blocking functions are visually indistinguishable from calls to blocking functions. Considering how much additional library complexity has emerged in Python world from having these two different colours of functions it’s noteworthy that PHP has chosen to go in a different direction here.

# 25th November 2021, 7:53 pm / async, php

Introduction to heredocs in Dockerfiles (via) This is a fantastic upgrade to Dockerfile syntax, enabled by BuildKit and a new frontend for executing the Dockerfile that can be specified with a #syntax= directive. I often like to create a standalone Dockerfile that works without needing other files from a directory, so being able to use <<EOF syntax to populate configure files from inline blocks of code is really handy.

# 22nd November 2021, 5:01 pm / docker

Hurl (via) Hurl is “a command line tool that runs HTTP requests defined in a simple plain text format”—written in Rust on top of curl, it lets you run HTTP requests and then execute assertions against the response, defined using JSONPath or XPath for HTML. It can even assert that responses were returned within a specified duration.

# 22nd November 2021, 3:32 am / curl, http, rust

Cookiecutter Data Science (via) Some really solid thinking in this documentation for the DrivenData cookiecutter template. They emphasize designing data science projects for repeatability, such that just the src/ and data/ folders can be used to recreate all of the other analysis from scratch. I like the suggestion to give each project a dedicated S3 bucket for keeping immutable copies of the original raw data that might be too large for GitHub.

# 18th November 2021, 3:21 pm / data-science, cookiecutter

Years

Tags