Simon Willison’s Weblog

Blogmarks

GOV.UK Guidance: Documenting APIs (via) Characteristically excellent guide from GOV.UK on writing great API documentation. “Task-based guidance helps users complete the most common integration tasks, based on the user needs from your research.”

# 21st May 2022, 11:31 pm / documentation, gov-uk

Comby (via) Describes itself as “Structural search and replace for any language”. Lets you execute search and replace patterns that look a little bit like simplified regular expressions, but with some deep OCaml-powered magic that makes them aware of comment, string and nested parenthesis rules for different languages. This means you can use it to construct scripts that automate common refactoring or code upgrade tasks.
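To illustrate why bracket awareness matters, here is a toy Python function that captures the argument text of every `name(...)` call while tracking nesting depth and skipping parentheses inside string literals. It is not how Comby is implemented (Comby's own templates use holes written like `:[args]`), and `find_call_args` is a hypothetical name. It also matches the function name naively, so `myfoo(` would match a search for `foo`.

```python
def find_call_args(source: str, func: str) -> list[str]:
    """Return the argument text of every `func(...)` call in source.

    Tracks parenthesis depth and skips over string literals, so nested
    calls and quoted ")" characters don't end the match early.
    """
    results = []
    needle = func + "("  # naive: would also match e.g. "myfoo(" when func is "foo"
    i = source.find(needle)
    while i != -1:
        start = i + len(needle)
        depth, j, quote = 1, start, None
        while j < len(source) and depth:
            ch = source[j]
            if quote:
                if ch == "\\":
                    j += 1  # skip the escaped character
                elif ch == quote:
                    quote = None  # end of string literal
            elif ch in "\"'":
                quote = ch  # start of string literal
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            j += 1
        if depth == 0:
            # j is one past the matching ")"
            results.append(source[start:j - 1])
        i = source.find(needle, j)
    return results


src = 'foo(bar(1, 2), ")") + foo(x)'
print(find_call_args(src, "foo"))  # ['bar(1, 2), ")"', 'x']
```

A non-greedy regular expression such as `foo\((.*?)\)` would instead stop at the first closing parenthesis and capture `bar(1, 2` for the first call, which is exactly the kind of mistake structural matching avoids.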

# 18th May 2022, 5:47 am / parsing, refactoring, ocaml

simonw/datasette-screenshots (via) I started a new GitHub repository to automate taking screenshots of Datasette for marketing purposes, using my shot-scraper browser automation tool.

# 17th May 2022, 5:56 pm / projects, datasette, github-actions, shot-scraper

Supercharging GitHub Actions with Job Summaries (via) GitHub Actions workflows can now generate a rendered Markdown summary of, well, anything that you can think to generate as part of the workflow execution. I particularly like the way this is designed: they provide a filename in a $GITHUB_STEP_SUMMARY environment variable which you can then append data to from each of your steps.
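A minimal Python sketch of a step script that appends to that file. `append_step_summary` and `results_table` are hypothetical helper names; the only piece GitHub provides is the `GITHUB_STEP_SUMMARY` environment variable holding the file path.

```python
import os
from pathlib import Path


def append_step_summary(markdown: str) -> bool:
    """Append Markdown to this step's job summary file.

    Returns False, writing nothing, when GITHUB_STEP_SUMMARY is unset
    (for example when the script runs outside GitHub Actions).
    """
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return False
    # Append rather than overwrite so repeated calls accumulate
    with Path(summary_path).open("a", encoding="utf-8") as fp:
        fp.write(markdown.rstrip("\n") + "\n")
    return True


def results_table(results: dict[str, int]) -> str:
    """Render name -> count pairs as a Markdown table."""
    lines = ["| Check | Count |", "| --- | --- |"]
    lines += [f"| {name} | {count} |" for name, count in results.items()]
    return "\n".join(lines)
```

A workflow step would call `append_step_summary(results_table(counts))` with whatever data it collected; run locally, the same call is a harmless no-op.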

# 16th May 2022, 11:02 pm / github-actions

Heroku: Core Impact (via) Ex-Heroku engineer Brandur Leach pulls together some of the background information circulating about the Heroku security incident, now more than a month old, and provides some ex-insider commentary on what went right and what went wrong with a platform that left a huge, if somewhat underappreciated, impact on the technology industry at large.

# 16th May 2022, 4:24 am / heroku, brandur-leach

How Materialize and other databases optimize SQL subqueries. Jamie Brandon offers a survey of the state of the art in optimizing correlated subqueries across a number of different database engines.
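To make "correlated" concrete, here is a small SQLite example via Python's `sqlite3` module, using a hypothetical `orders` table. The correlated version refers to the outer row inside the subquery; the rewritten version computes each customer's average once and joins against it, which is the flavor of transformation a decorrelating optimizer performs automatically. This hand-written rewrite is only valid here because every outer row's group is non-empty; general decorrelation also has to handle cases such as `COUNT` over an empty group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO orders (customer_id, total) VALUES (1, 10), (1, 30), (2, 5), (2, 50), (3, 7);
""")

# Correlated: the subquery refers to o.customer_id from the outer query,
# so a naive engine re-evaluates it once per outer row.
CORRELATED = """
SELECT o.id FROM orders o
WHERE o.total > (
    SELECT avg(o2.total) FROM orders o2 WHERE o2.customer_id = o.customer_id
)
ORDER BY o.id
"""

# Decorrelated: compute every customer's average once, then join.
DECORRELATED = """
SELECT o.id FROM orders o
JOIN (
    SELECT customer_id, avg(total) AS avg_total FROM orders GROUP BY customer_id
) AS a ON a.customer_id = o.customer_id
WHERE o.total > a.avg_total
ORDER BY o.id
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
decorrelated_rows = conn.execute(DECORRELATED).fetchall()
assert correlated_rows == decorrelated_rows  # same answer, different plan
print(correlated_rows)  # [(2,), (4,)]
```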

# 15th May 2022, 8:24 pm / sql

Why Rust’s postfix await syntax is good (via) C J Silverio explains postfix await in Rust—where you can write a line like this, with the ? causing any errors to be caught and turned into an error return from your function:

let count = fetch_all_animals().await?.filter_for_hedgehogs().len();

# 15th May 2022, 2:27 pm / async, rust

SIARD: Software Independent Archiving of Relational Databases (via) I hadn’t heard of this before but it looks really interesting: the Federal Archives of Switzerland developed a standard for archiving any relational database as a zip file full of XML, which “is used in over 50 countries around the globe”.

# 4th May 2022, 10:40 pm / archives, databases, xml

Simple declarative schema migration for SQLite (via) This is an interesting, clearly explained approach to the database migration problem. Create a new in-memory database and apply the current schema, then run some code to compare that with the previous schema—which tables are new, and which tables have had columns added. Then apply those changes.

I’d normally be cautious of running something like this because I can think of ways it could go wrong—but SQLite backups are so quick and cheap (just copy the file) that I could see this being a relatively risk-free way to apply migrations.
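Here is a minimal sketch of that approach using Python's `sqlite3` module. It is an illustration, not the linked post's code, and it only handles the two cases described above: brand new tables and added columns (dropped tables or columns are ignored). `migrate`, `table_sql` and `table_columns` are hypothetical names. SQLite's `ALTER TABLE ... ADD COLUMN` has restrictions (the new column can't be `PRIMARY KEY` or `UNIQUE`, for example), so anything beyond this would need a full table rebuild.

```python
import sqlite3


def table_sql(conn: sqlite3.Connection) -> dict[str, str]:
    """Map each user table name to its CREATE TABLE statement."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
    return dict(rows)


def table_columns(conn: sqlite3.Connection, table: str) -> dict[str, tuple]:
    """Map column name to its PRAGMA table_info row: (cid, name, type, notnull, default, pk)."""
    # Naive quoting: assumes table names contain no double quotes
    return {row[1]: row for row in conn.execute(f'PRAGMA table_info("{table}")')}


def migrate(db: sqlite3.Connection, desired_schema: str) -> list[str]:
    """Bring db in line with desired_schema by adding missing tables and columns."""
    # Apply the desired schema to a fresh in-memory database as a reference copy
    pristine = sqlite3.connect(":memory:")
    pristine.executescript(desired_schema)
    wanted = table_sql(pristine)
    existing = table_sql(db)

    changes = []
    for name, create_sql in wanted.items():
        if name not in existing:
            changes.append(create_sql)  # brand new table
            continue
        current = table_columns(db, name)
        for col, (_, _, col_type, _, default, _) in table_columns(pristine, name).items():
            if col not in current:
                stmt = f'ALTER TABLE "{name}" ADD COLUMN "{col}" {col_type}'
                if default is not None:
                    stmt += f" DEFAULT {default}"  # default is stored as SQL text
                changes.append(stmt)
    pristine.close()

    for stmt in changes:
        db.execute(stmt)
    db.commit()
    return changes
```

Running `migrate()` a second time with the same schema returns an empty list, which is what makes the declarative style pleasant: you only ever edit the schema, never write individual migration steps.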

# 3rd May 2022, 6:07 pm / migrations, sqlite

Web Scraping via Javascript Runtime Heap Snapshots (via) This is an absolutely brilliant scraping trick. Adrian Cooney figured out a way to use Puppeteer and the Chrome DevTools protocol to take a heap snapshot of all of the JavaScript running on a web page, then recursively crawl through the heap looking for any JavaScript objects that have a specified selection of properties. This allows him to scrape data from arbitrarily complex client-side web applications. He built a JavaScript library and command line tool that implements the pattern.
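The search step of that trick is easy to sketch. A real V8 heap snapshot is a flat graph of nodes and edges that has to be parsed first; this Python sketch skips that part and treats nested dicts and lists as a stand-in for the heap, so it only shows the "find every object that has these properties" step. `find_objects` and the sample `state` are hypothetical. It uses an explicit stack instead of recursion and tracks visited objects, since object graphs are full of cycles.

```python
from typing import Any, Iterator


def find_objects(root: Any, required: set[str]) -> Iterator[dict]:
    """Yield every dict reachable from root whose keys include all of `required`."""
    seen: set[int] = set()
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue  # object graphs have cycles, so skip anything already visited
        seen.add(id(obj))
        if isinstance(obj, dict):
            if required.issubset(obj.keys()):
                yield obj
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple)):
            stack.extend(obj)


# Hypothetical client-side state, including a cycle back to the root
state = {
    "ui": {"open": True},
    "store": {"products": [{"title": "Mug", "price": 8}, {"title": "Hat", "price": 15}]},
}
state["store"]["root"] = state
for product in find_objects(state, {"title", "price"}):
    print(product["title"], product["price"])
```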

# 3rd May 2022, 12:51 am / javascript, scraping

sqlite-utils 3.26.1 (via) I released sqlite-utils 3.26.1 with one tiny but exciting feature: I fixed its one dependency that wasn’t published as a pure Python wheel, which means it can now be used with Pyodide, Python compiled to WebAssembly running in your browser!

# 2nd May 2022, 6:43 pm / python, webassembly, sqlite-utils, pyodide

PyScript demos (via) PyScript was announced at PyCon this morning. It’s a new open source project that provides Web Components built on top of Pyodide, allowing you to use Python directly within your HTML pages in a way that is executed using a WebAssembly copy of Python running in your browser. These demos really help illustrate what it can do—it’s a fascinating new piece of the Python web ecosystem.

# 30th April 2022, 9:50 pm / python, web-components, webassembly, pyodide

Testing Datasette parallel SQL queries in the nogil/python fork. As part of my ongoing research into whether Datasette can be sped up by running SQL queries in parallel I’ve been growing increasingly suspicious that the GIL is holding me back. I know the sqlite3 module releases the GIL and was hoping that would give me parallel queries, but it looks like there’s still a ton of work going on in Python GIL land creating Python objects representing the results of the query.

Sam Gross has been working on a nogil fork of Python and I decided to give it a go. It’s published as a Docker image, and trying it out really did take just a few commands... and it produced the desired results: my parallel code started beating my serial code, where previously the two had produced effectively the same performance numbers.

I’m pretty stunned by this. I had no idea how far along the nogil fork was. It’s amazing to see it in action.
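A rough harness for this kind of experiment runs the same queries serially and then across a thread pool, with one connection per query. It is not the code from the linked write-up, and the table `t` is hypothetical. The `fetchall()` call builds a Python tuple for every row, which is the GIL-bound work described above, so under standard CPython I'd expect the two timings to be close; running the same script under the nogil fork is the interesting comparison.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def make_db(path: str, rows: int = 100_000) -> None:
    """Create a hypothetical test table t with `rows` rows."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    conn.executemany("INSERT INTO t (v) VALUES (?)", ((f"value {i}",) for i in range(rows)))
    conn.commit()
    conn.close()


def run_query(path: str, sql: str) -> list[tuple]:
    # A connection per call: by default a sqlite3 connection may only be
    # used from the thread that created it
    conn = sqlite3.connect(path)
    try:
        # fetchall() builds a Python tuple per row: this part needs the GIL
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def compare(path: str, queries: list[str], workers: int = 4) -> tuple[float, float]:
    """Return (serial_seconds, parallel_seconds) for running the same queries."""
    start = time.perf_counter()
    serial = [run_query(path, q) for q in queries]
    serial_seconds = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parallel = list(pool.map(lambda q: run_query(path, q), queries))
    parallel_seconds = time.perf_counter() - start

    assert serial == parallel, "both strategies should return identical results"
    return serial_seconds, parallel_seconds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "bench.db")
        make_db(db_path)
        serial_s, parallel_s = compare(db_path, ["SELECT * FROM t"] * 8)
        print(f"serial: {serial_s:.2f}s  parallel: {parallel_s:.2f}s")
```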

# 29th April 2022, 5:45 am / gil, python, docker

HTML event handler attributes: down the rabbit hole (via) onclick="myfunction(event)" is an idiom for passing the click event to a function, but how does it work? It turns out the answer is buried deep in the HTML spec: the browser wraps that string of code in a function(event) { ... that string ... } function and makes the event available to its local scope that way.

# 26th April 2022, 8:35 pm / dom-scripting, html, javascript

Mac OS 8 emulated in WebAssembly (via) Absolutely incredible project by Mihai Parparita. This is a full, working copy of Mac OS 8 (from 1997) running in your browser via WebAssembly, and it’s fully loaded with games and applications too. I played with Photoshop 3.0 and Civilization, and there’s so much more on there to explore. I finally get to try out HyperCard!

# 26th April 2022, 7:16 pm / computer-history, mac, mihai-parparita, webassembly

Learn Go with tests. I really like this approach to learning a new language: start by learning to write tests (which gets you through hello world, environment setup and test running right from the beginning) and use them to explore the language. I also really like how modern Go development no longer depends on the GOPATH, which I always found really confusing.

# 26th April 2022, 7:12 pm / go

jq language description (via) I love jq but I’ve always found it difficult to remember how to use it, and the manual hasn’t helped me as much as I would hope. It turns out the jq wiki on GitHub offers an alternative, more detailed description of the language which fits the way my brain works a lot better.

# 26th April 2022, 7:04 pm / documentation, programming-languages, jq

A tiny CI system (via) Christian Ştefănescu shares a recipe for building a tiny self-hosted CI system using Git and Redis. A post-receive hook runs when a commit is pushed to the repo and uses redis-cli to push jobs to a list. Then a separate bash script runs a loop with a blocking “redis-cli blpop jobs” operation which waits for new jobs and then executes the CI job as a shell script.
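The shape of that loop is easy to see in a Python sketch. To keep it self-contained, an in-process `queue.Queue` stands in for the Redis list: `put()` plays the role of the hook's `redis-cli` push, and the blocking `get()` plays the role of `blpop`. The function names, the `None` stop sentinel and the `ci.sh` script are all hypothetical, not taken from the linked post.

```python
import queue
import subprocess
import threading
from typing import Callable, Optional

# Stand-in for the Redis list named "jobs"
jobs: "queue.Queue[Optional[str]]" = queue.Queue()


def post_receive(commit_sha: str) -> None:
    """What the git hook does: push the new commit onto the job list."""
    jobs.put(commit_sha)


def worker(run_job: Callable[[str], object]) -> None:
    """The CI loop: block until a job arrives, run it, repeat."""
    while True:
        sha = jobs.get()  # blocks, like blpop, until a job is available
        if sha is None:  # hypothetical stop sentinel
            break
        run_job(sha)


def run_ci_script(sha: str) -> int:
    # Hypothetical: run a ci.sh script from the working directory for this commit
    return subprocess.run(["sh", "ci.sh", sha]).returncode


def start_worker(run_job: Callable[[str], object] = run_ci_script) -> threading.Thread:
    thread = threading.Thread(target=worker, args=(run_job,), daemon=True)
    thread.start()
    return thread
```

Swapping the queue for real Redis keeps the same structure but moves the list out of process, which is what lets the git hook and the long-running worker be completely separate programs.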

# 26th April 2022, 3:39 pm / bash, continuous-integration, git, redis

WebAIM guide to using iOS VoiceOver to evaluate web accessibility (via) I asked for pointers on learning to use VoiceOver on my iPhone for accessibility testing today and Matt Hobbs pointed me to this tutorial from the WebAIM group at Utah State University.

# 24th April 2022, 7:35 pm / accessibility, screen-readers, ios

Web Components as Progressive Enhancement (via) I think this is a key aspect of Web Components I had been missing: since they default to rendering their contents, you can use them as a wrapper around regular HTML elements that can then be progressively enhanced once the JavaScript has loaded.

# 21st April 2022, 9:33 pm / web-components

Glue code to quickly copy data from one Postgres table to another (via) The Python script that Retool used to migrate 4TB of data between two PostgreSQL databases. I find the structure of this script really interesting—it uses Python to spin up a queue full of ID ranges to be transferred and then starts some threads, but then each thread shells out to a command that runs “psql COPY (SELECT ...) TO STDOUT” and pipes the result to “psql COPY xxx FROM STDIN”. Clearly this works really well (“saturate the database’s hardware capacity” according to a comment on HN), and neatly sidesteps any issues with Python’s GIL.
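A sketch of that structure, assuming a hypothetical table with an integer `id` column that has the same name and schema on both sides, and hypothetical connection strings `source_dsn` and `dest_dsn`; it is not Retool's actual script. Python only fills the queue and starts processes, while the rows flow through an OS pipe from one `psql` to the other, which is why the GIL doesn't matter.

```python
import queue
import subprocess
import threading


def id_ranges(min_id: int, max_id: int, chunk: int):
    """Yield half-open (start, end) ranges covering min_id..max_id inclusive."""
    for start in range(min_id, max_id + 1, chunk):
        yield start, min(start + chunk, max_id + 1)


def copy_commands(table: str, start: int, end: int, source_dsn: str, dest_dsn: str):
    # Table name is interpolated directly, so it must be trusted input
    select = f"SELECT * FROM {table} WHERE id >= {start} AND id < {end}"
    export = ["psql", source_dsn, "-c", f"COPY ({select}) TO STDOUT"]
    load = ["psql", dest_dsn, "-c", f"COPY {table} FROM STDIN"]
    return export, load


def copy_range(export: list[str], load: list[str]) -> None:
    # Pipe one psql process into the other; no row data passes through Python
    src = subprocess.Popen(export, stdout=subprocess.PIPE)
    dst = subprocess.Popen(load, stdin=src.stdout)
    src.stdout.close()  # lets src see a broken pipe if dst exits early
    dst.wait()
    src.wait()
    if src.returncode or dst.returncode:
        raise RuntimeError(f"copy failed: {export!r}")


def run(table, source_dsn, dest_dsn, min_id, max_id, chunk=100_000, threads=8):
    work: queue.Queue = queue.Queue()
    for start, end in id_ranges(min_id, max_id, chunk):
        work.put(copy_commands(table, start, end, source_dsn, dest_dsn))

    def worker():
        while True:
            try:
                export, load = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this thread is done
            copy_range(export, load)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
```

In this sketch a failed chunk just ends its thread with an exception; a real multi-terabyte run would want retries and a record of which ranges completed.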

# 19th April 2022, 4:57 pm / postgresql, python

Netlify Edge Functions: A new serverless runtime powered by Deno. You can now run Deno scripts directly in Netlify’s edge CDN—bundled as part of their default pricing plan. Interesting that they decided to host it on Deno’s Deno Deploy infrastructure. The hello world example is pleasingly succinct:

export default () => new Response("Hello world")

# 19th April 2022, 4:46 pm / serverless, deno

How to push tagged Docker releases to Google Artifact Registry with a GitHub Action. Ben Welsh’s writeup includes detailed step-by-step instructions for getting the mysterious “Workload Identity Federation” mechanism to work with GitHub Actions and Google Cloud. I’ve been dragging my heels on figuring this out for quite a while, so it’s great to see the steps described at this level of detail.

# 18th April 2022, 3:41 am / github, google, docker, ben-welsh

Litestream: Live Read Replication (via) The documentation for the read replication implemented in the latest Litestream beta (v0.4.0-beta.2). The design is really simple and clever: the primary runs a web server on a port, and replica instances can then be started with a configured URL pointing to the IP and port of the primary. That’s all it takes to have a SQLite database replicated to multiple hosts, each of which can then run read queries against its local copy.

# 13th April 2022, 2:04 am / replication, sqlite, litestream

Datasette for geospatial analysis (via) I added a new page to the Datasette website describing how Datasette can be used for geospatial analysis, pulling together several of the relevant plugins and tools from the Datasette ecosystem.

# 13th April 2022, 12:48 am / geospatial, plugins, datasette

datasette-dashboards (via) Romain Clement’s datasette-dashboards plugin lets you configure dashboards for Datasette using YAML, combining markdown blocks, Vega graphs and single number metrics using a layout powered by CSS grids. This is a beautiful piece of software design, with a very compelling live demo.

# 7th April 2022, 6:36 pm / plugins, visualization, datasette

WebAssembly in my Browser Desktop Environment (via) Dustin Brett built the WebAssembly demo to end all WebAssembly demos: his daedalOS browser desktop environment simulates a Windows-style operating system, and bundles WebAssembly projects that include v86 for 486 emulation, js-dos for DOS emulation to run Doom, BoxedWine to run Wine applications like Notepad++, Ruffle to emulate Flash, ffmpeg.wasm to power audio and video conversion, WASM-ImageMagick for image conversion, Pyodide for a Python shell and more besides that!

# 29th March 2022, 1:26 am / webassembly, pyodide

geoBoundaries. This looks useful: “The world’s largest open, free and research-ready database of political administrative boundaries.” Founded by the geoLab at William & Mary, and released under a Creative Commons Attribution license that includes a requirement for a citation. File formats offered include shapefiles, GeoJSON and TopoJSON.

# 24th March 2022, 2:03 pm / geospatial, shapefiles, geojson

Deno by example (via) Interesting approach to documentation: a big list of annotated examples illustrating the Deno way of solving a bunch of common problems.

# 17th March 2022, 1:02 am / documentation, deno
