Weeknotes: Rocky Beaches, Datasette 0.48, a commit history of my database

21st August 2020

This week I helped Natalie launch Rocky Beaches, shipped Datasette 0.48 and several releases of datasette-graphql, upgraded the CSRF protection for datasette-upload-csvs and figured out how to get a commit log of changes to my blog by backing up its database to a GitHub repository.

Rocky Beaches

Natalie released the first version of rockybeaches.com this week. It’s a site that helps you find places to go tidepooling (known as rockpooling in the UK) and figure out the best times to go based on low tide times.

Screenshot of the Pillar Point page for Rocky Beaches

I helped out with the backend for the site, mainly as an excuse to further explore the idea of using Datasette to power full websites (previously explored with Niche Museums and my TILs).

The site uses a pattern I’ve been really enjoying: it’s essentially a static dynamic site. Pages are dynamically rendered by Datasette using Jinja templates and a SQLite database, but the database itself is treated as a static asset: it’s built at deploy time by this GitHub Actions workflow and deployed (currently to Vercel) as a binary asset along with the code.

The build script uses yaml-to-sqlite to load two YAML files—places.yml and stations.yml—and create the stations and places database tables.
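
That loading step is conceptually simple. Here's a minimal sketch of the same idea using PyYAML and sqlite-utils directly — the real site uses the yaml-to-sqlite CLI, but this is roughly what that step does:

```python
import yaml
import sqlite_utils

# Build-time sketch: the database is created from scratch on every
# deploy, so plain inserts are enough here.
db = sqlite_utils.Database("data.db")
for table, path in (("places", "places.yml"), ("stations", "stations.yml")):
    with open(path) as fp:
        db[table].insert_all(yaml.safe_load(fp))
```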

It then runs two custom Python scripts to fetch relevant data for those places from iNaturalist and the NOAA Tides & Currents API.
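
Those fetch scripts are specific to the site, but the shape of the idea is straightforward. Here's a hypothetical sketch against the iNaturalist API — the place_id value and the fields kept are made up for illustration:

```python
import requests

# Hypothetical sketch of one of those fetch steps: pull recent
# observations for a place from the iNaturalist API.
response = requests.get(
    "https://api.inaturalist.org/v1/observations",
    params={"place_id": 12345, "per_page": 200},
)
response.raise_for_status()
rows = [
    {
        "id": obs["id"],
        "species_guess": obs.get("species_guess"),
        "observed_on": obs.get("observed_on"),
    }
    for obs in response.json()["results"]
]
```

Rows like these can then be written to SQLite with sqlite-utils, as in the previous sketch.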

The data all ends up in the Datasette instance that powers the site—you can browse it at www.rockybeaches.com/data or interact with it using the GraphQL API at www.rockybeaches.com/graphql.

The code is a little convoluted at the moment—I’m still iterating towards the best patterns for building websites like this using Datasette—but I’m very pleased with the productivity and performance that this approach produced.

Datasette 0.48

Highlights from the Datasette 0.48 release notes:

I released a new version of datasette-cluster-map that takes advantage of the new columns argument to only inject Leaflet maps JavaScript onto the page if the table being rendered includes latitude and longitude columns—previously the plugin would load extra code on pages that weren’t going to render a map at all. That’s now running on https://global-power-plants.datasettes.com/.
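
Here's a minimal sketch of that pattern, assuming the extra_js_urls plugin hook with the columns argument added in Datasette 0.48 — the URL and the hard-coded column names are placeholders, and the real plugin is more configurable:

```python
from datasette import hookimpl

@hookimpl
def extra_js_urls(columns):
    # columns is the list of column names for the table being rendered
    # (None on pages that aren't showing a table). Only load the map
    # JavaScript when both coordinate columns are present.
    if columns and {"latitude", "longitude"} <= {c.lower() for c in columns}:
        return ["/-/static-plugins/example-map-plugin/map.js"]  # placeholder
    return []
```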

datasette-graphql

Using datasette-graphql for Rocky Beaches inspired me to add two new features:

  • A new graphql() Jinja custom template function that lets you execute custom GraphQL queries inside a Datasette template page—which turns out to be a pretty elegant way for the template to load exactly the data it needs to render the page (there’s a template sketch after this list). Here’s how Rocky Beaches uses that. Issue 50.
  • Some of the iNaturalist data that Rocky Beaches uses is stored as JSON data in text columns in SQLite—mainly because I was too lazy to model it out as tables. This was coming out of the GraphQL API as strings-containing-JSON, so I added a json_columns plugin configuration mechanism for turning those into Graphene GenericScalar fields—see issue 53 for details.
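
Here's the kind of template code the graphql() function enables, based on the pattern in the datasette-graphql documentation — the places query and its fields are hypothetical, modeled on the tables described above:

```html+jinja
{# Run a GraphQL query at render time; hypothetical table and fields #}
{% set data = graphql("""
{
  places {
    nodes {
      name
      slug
    }
  }
}
""") %}
{% for place in data.places.nodes %}
  <h2>{{ place.name }}</h2>
{% endfor %}
```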

I also landed a big performance improvement. The plugin works by introspecting the database and generating a GraphQL schema that represents those tables, columns and views. For databases with a lot of tables this can get expensive, and the introspection was being run on every request.

I didn’t want to require a server restart any time the schema changed, so I couldn’t just cache the schema in memory indefinitely. Ideally the schema would be cached, but that cache would be invalidated any time the underlying schema changed.

It turns out SQLite has a mechanism for this: the PRAGMA schema_version statement, which returns an integer version number that changes any time the underlying schema is changed (e.g. a table is added or modified).

I built a quick datasette-schema-versions plugin to try this feature out (in less than twenty minutes thanks to my datasette-plugin cookiecutter template) and prove to myself that it works. Then I built a caching mechanism for datasette-graphql that uses the current schema_version as the cache key. See issue 51 for details.
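
Here's a minimal sketch of that caching pattern using Python's sqlite3 module directly — this is not the plugin's actual code, and the introspection step is a trivial stand-in:

```python
import sqlite3

def introspect(conn):
    # Stand-in for the expensive step: datasette-graphql introspects
    # tables, columns and views to build a GraphQL schema. Here we
    # just list the table names.
    return [r[0] for r in conn.execute(
        "select name from sqlite_master where type = 'table'"
    )]

_cache = {"version": None, "schema": None}

def get_schema(conn):
    # PRAGMA schema_version returns an integer that SQLite increments
    # whenever the schema changes (a table is created, altered, dropped).
    version = conn.execute("PRAGMA schema_version").fetchone()[0]
    if version != _cache["version"]:
        _cache["schema"] = introspect(conn)
        _cache["version"] = version
    return _cache["schema"]
```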

asgi-csrf and datasette-upload-csvs

datasette-upload-csvs is a Datasette plugin that adds a form for uploading CSV files and converting them to SQLite tables.

Datasette 0.44 added CSRF protection, which broke the plugin. Fixing it took some extra work, because file uploads use the multipart/form-data HTTP mechanism and my asgi-csrf library didn’t support that.

Adding multipart support this week turned out to be quite complicated. Since asgi-csrf is a security library I decided to aim for 100% code coverage, the first time I’ve done that for one of my projects.

I got there with the help of codecov.io and pytest-cov. I wrote up what I learned about those tools in a TIL.

Backing up my blog database to a GitHub repository

I really like keeping content in a git repository (see Rocky Beaches and Niche Museums). Every content management system I’ve ever worked on has eventually needed revision history, and modeling that in a database and adding it to an existing project is always a huge pain.

I have 18 years of content on this blog. I want that backed up to git—and this week I realized I have the tools to do that already.

db-to-sqlite is my tool for taking any SQLAlchemy-supported database (so far tested with MySQL and PostgreSQL) and exporting it into a SQLite database.

sqlite-diffable is a very early stage tool I built last year. The idea is to dump a SQLite database out to disk in a way that is designed to work well with git diffs. Each table is dumped out as newline-delimited JSON, one row per line.
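
Here's a sketch of the idea using just the standard library — it mimics the diff-friendly output style, not sqlite-diffable's exact file layout:

```python
import json
import sqlite3

conn = sqlite3.connect("blog.db")
tables = [r[0] for r in conn.execute(
    "select name from sqlite_master where type = 'table'"
)]
for table in tables:
    with open(f"{table}.ndjson", "w") as fp:
        # One JSON array per row, in a stable order, so a single
        # changed row shows up as a single changed line in git diff.
        for row in conn.execute(f"select * from [{table}] order by rowid"):
            fp.write(json.dumps(list(row), default=str) + "\n")
```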

So... how about converting my blog’s PostgreSQL database to SQLite, then dumping it to disk with sqlite-diffable and committing the result to a git repository? And then running that in a GitHub Action?

Here’s the workflow. It does exactly that, with a few extra steps: it only grabs a subset of my tables, and it redacts the password column from my auth_user table so that my hashed password isn’t exposed in the backup.
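
Condensed into a script, the pipeline looks roughly like this. The connection URL, database filename and table list are placeholders, not my actual configuration:

```python
import os
import sqlite3
import subprocess

# 1. Export a subset of tables from PostgreSQL into a SQLite file
#    (the table names here are placeholders)
subprocess.run([
    "db-to-sqlite", os.environ["DATABASE_URL"], "blog.db",
    "--table", "blog_entry",
    "--table", "auth_user",
], check=True)

# 2. Redact the hashed passwords before anything is committed
conn = sqlite3.connect("blog.db")
conn.execute("update auth_user set password = null")
conn.commit()

# 3. Dump every table out as diff-friendly newline-delimited JSON
subprocess.run(
    ["sqlite-diffable", "dump", "blog.db", "backup/", "--all"],
    check=True,
)

# A regular git add/commit/push step then records any changes.
```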

I now have a commit log of changes to my blog’s database!

I’ve set it to run nightly, but I can trigger it manually by clicking a button too.

TIL this week

Releases this week