Simon Willison’s Weblog

Weeknotes: datasette-remote-metadata, sqlite-transform --multi

I mentioned Project Pelican (still a codename until the end of the embargo) last week. This week it inspired a new plugin, datasette-remote-metadata.

I also published an article describing the Baked Data architectural pattern and shared the video and transcript of a talk I gave at PyGotham last year.

datasette-remote-metadata

Datasette’s metadata system is one of the key ways of customizing a Datasette instance: it lets you add extra information about the tables hosted by the instance, configure canned queries, and (thanks to some scope creep that I’d love to clean up) set configuration options for various Datasette plugins.
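A minimal metadata.json illustrating those three uses might look like this (the database, table, and query names here are hypothetical):

```json
{
  "title": "My Datasette instance",
  "databases": {
    "mydb": {
      "tables": {
        "places": {
          "description": "Points of interest"
        }
      },
      "queries": {
        "recent_places": {
          "sql": "select * from places order by rowid desc limit 10"
        }
      }
    }
  }
}
```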

There’s just one catch: when you update the metadata for an instance, you need to re-deploy the whole thing. I increasingly find myself using Datasette and Cloud Run to host databases that are over 1GB in size, so re-deploying just to tweak some text on the homepage is less than ideal.

The datasette-remote-metadata plugin uses the new get_metadata() plugin hook from Datasette 0.58 to work around this problem.

Essentially, it lets you tell Datasette “load your metadata from this external URL”. You provide it with a URL to a JSON or YAML file hosted elsewhere and the plugin will load that file on startup and periodically (by default every 30 seconds) check to see if it has changed.
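Configuring the plugin might look something like this in the metadata you deploy with the instance (the `url` and `ttl` option names are my assumptions based on the description above; the plugin's README is authoritative):

```json
{
  "plugins": {
    "datasette-remote-metadata": {
      "url": "https://example.com/remote-metadata.yml",
      "ttl": 30
    }
  }
}
```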

Those checks for updates only happen while Datasette is receiving traffic. When a check fires, Datasette waits up to 200ms for the fetch to complete; beyond that it lets the fetch continue in the background and serves the request using the now-stale metadata rather than blocking any longer.
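That stale-while-revalidate pattern can be sketched with asyncio like this. This is an illustrative assumption about the approach, not the plugin's actual code, and the class and method names are made up for the example:

```python
import asyncio
import time


class RemoteMetadata:
    """Sketch of stale-while-revalidate metadata fetching (an assumed
    implementation, not the plugin's real one)."""

    def __init__(self, url, ttl=30):
        self.url = url
        self.ttl = ttl  # seconds between checks for updates
        self.cached = {}  # last successfully fetched metadata
        self.last_checked = float("-inf")

    async def fetch(self):
        # Stand-in for an HTTP GET of self.url returning parsed JSON/YAML.
        return {"title": "Fresh metadata"}

    async def get_metadata(self):
        # A check only fires when a request arrives and the TTL has
        # elapsed - an idle instance never polls the remote URL.
        if time.monotonic() - self.last_checked < self.ttl:
            return self.cached
        self.last_checked = time.monotonic()
        task = asyncio.ensure_future(self.fetch())
        try:
            # Wait at most 200ms; shield() keeps the fetch running even
            # if wait_for() gives up on it.
            self.cached = await asyncio.wait_for(asyncio.shield(task), timeout=0.2)
        except asyncio.TimeoutError:
            # Serve this request with the stale copy; the background
            # fetch updates the cache for subsequent requests.
            task.add_done_callback(lambda t: setattr(self, "cached", t.result()))
        return self.cached
```

The `asyncio.shield()` call is what lets the request return early without cancelling the in-flight fetch.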

I’m trying this out for Project Pelican and it seems to be working great—which means we can iterate on the metadata (including the description_html block displayed on the Datasette homepage) without needing to constantly re-deploy the 700+ MB SQLite database file to Cloud Run.

sqlite-transform --multi

sqlite-transform is my command-line tool for running transformations against columns of data in a SQLite database: parsing dates, splitting comma-separated lists into JSON arrays, or applying arbitrary transformations with custom Python code.

The latest release adds support for splitting one input column into multiple output columns. This is best explained with an example: consider a database table with a location column that contains latitude,longitude points which you would like to split out into separate latitude and longitude columns.

sqlite-transform lambda demo.db places location \
  --code 'return {
    "latitude": float(value.split(",")[0]),
    "longitude": float(value.split(",")[1]),
  }' --multi --drop

The new --multi option means the code passed to --code should return a Python dictionary. The tool then adds a new column to the specified table for each key in that dictionary and populates it with the corresponding values.

The new --drop option tells the tool to drop the original location column once the transformation is complete.
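The two-pass approach behind --multi can be sketched in plain Python and sqlite3 like this. This is my assumption about the general technique, not the tool's actual implementation, and the table and column names mirror the example above:

```python
import sqlite3


def transform(value):
    # Stand-in for the user-supplied --code: returns new column -> value
    latitude, longitude = value.split(",")
    return {"latitude": float(latitude), "longitude": float(longitude)}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, location TEXT)")
db.execute("INSERT INTO places (location) VALUES ('37.77,-122.42')")

# Pass 1: run the function against every value to learn the output columns
rows = db.execute("SELECT id, location FROM places").fetchall()
results = {row_id: transform(value) for row_id, value in rows}
new_columns = sorted({key for result in results.values() for key in result})

# Pass 2: add one column per dictionary key, then populate each row
for column in new_columns:
    db.execute(f"ALTER TABLE places ADD COLUMN {column}")
assignments = ", ".join(f"{column} = ?" for column in new_columns)
for row_id, result in results.items():
    db.execute(
        f"UPDATE places SET {assignments} WHERE id = ?",
        [result.get(column) for column in new_columns] + [row_id],
    )

print(db.execute("SELECT latitude, longitude FROM places").fetchone())
# (37.77, -122.42)
```

Handling --drop would then be a matter of removing the original location column afterwards (via ALTER TABLE ... DROP COLUMN on recent SQLite versions, or by rebuilding the table on older ones).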

Releases this week

TIL this week