Simon Willison’s Weblog

Subscribe

Datasette 1.0a14: The annotated release notes

5th August 2024

Released today: Datasette 1.0a14. This alpha includes significant contributions from Alex Garcia, including some backwards-incompatible changes in the run-up to the 1.0 release.

Metadata now lives in a database

The biggest change in the alpha concerns how Datasette’s metadata system works.

Datasette can record and serve metadata about the databases, tables and columns that it is serving. This includes things like the source of the data, the license it is made available under and descriptions of the tables and columns.

Historically this has been powered by a metadata.json file. Over time, this file grew to include all sorts of things that weren’t strictly metadata—things like plugin configuration. Cleaning this up is a major breaking change for Datasette 1.0, and Alex has been working on this across several alphas.

The latest alpha adds a new upgrade guide describing changes plugin authors will need to make to support the new metadata system.

The big change in 1.0a14 is that metadata now lives in Datasette’s hidden _internal SQLite database, in four new tables called metadata_instance, metadata_databases, metadata_resources and metadata_columns. The schema for these is now included in the documentation (updated using this Cog code), but rather than accessing those tables directly plugins are encouraged to use the new set_*_metadata() and get_*_metadata() methods on the Datasette class.

I plan to use these new tables to build a new performant, paginated homepage that shows all of the databases and tables that Datasette is serving, complete with their metadata—without needing to make potentially hundreds of calls to the now-removed get_metadata() plugin hook.

datasette-remote-metadata 0.2a0

When introducing new plugin internals like this it’s always good to accompany them with a plugin that exercises them. datasette-remote-metadata is a few years old now, and provides a mechanism for hosting the metadata for a Datasette instance at a separate URL. This means you can deploy a stateless Datasette instance with a large database and later update the attached metadata without having to re-deploy the whole thing.

I released a new alpha of that plugin which switches over to the new metadata mechanism. The core code ended up looking like this, imitating code Alex wrote for Datasette Core:

async def apply_metadata(datasette, metadata_dict):
    for key in metadata_dict or {}:
        if key == "databases":
            continue
        await datasette.set_instance_metadata(key, metadata_dict[key])
    # database-level
    for dbname, db in metadata_dict.get("databases", {}).items():
        for key, value in db.items():
            if key == "tables":
                continue
            await datasette.set_database_metadata(dbname, key, value)
        # table-level
        for tablename, table in db.get("tables", {}).items():
            for key, value in table.items():
                if key == "columns":
                    continue
                await datasette.set_resource_metadata(dbname, tablename, key, value)
            # column-level
            for columnname, column_description in table.get("columns", {}).items():
                await datasette.set_column_metadata(
                    dbname, tablename, columnname, "description", column_description
                )

SQLite isolation_level=“IMMEDIATE”

Sylvain Kerkour wrote about the benefits of IMMEDIATE transactions back in February. The key issue here is that SQLite defaults to starting transactions in DEFERRED mode, which can lead to SQLITE_BUSY errors if a transaction is upgraded to a write transaction mid-flight. Starting in IMMEDIATE mode for Datasette’s dedicated write connection should help avoid this.

Frustratingly I failed to replicate the underlying problem in my own tests, despite having anecdotally seen it happen in the past.

After spending more time than I had budgeted for on this, I decided to ship it as an alpha to get it properly exercised before the 1.0 stable release.

Updating the URLs

Here’s another change that was important to get out before 1.0.

Datasette’s URL design had a subtle blemish. The following page had two potential meanings:

  • /databasename—list all of the tables in the specified database
  • /databasename?sql=—execute an arbitrary SQL query against that database

This also meant that the JSON structure returned by /database.json v.s. /database.json?sql= was different.

Alex and I decided to fix that. Alex laid out the new design in issue #2360—there are quite a few other changes, but the big one is that we are splitting out the SQL query interface to a new URL: /databasename/-/query?sql=—or /databasename/-/query.json?sql= for the JSON API.

We’ve added redirects from the old URLs to the new ones, so existing links should continue to work.

Everything else

  • Fix for a bug where canned queries with named parameters could fail against SQLite 3.46. (#2353)

This reflects a bug fix that went out in Datasette 0.64.7.

There’s still more to be done making Datasette play well with caches, but this is a great, low-risk start.

  • Dropdown menus now use a z-index that should avoid them being hidden by plugins. (#2311)

A cosmetic bug that showed up on Datasette Cloud when using the datasette-cluster-map plugin.

  • Incorrect table and row names are no longer reflected back on the resulting 404 page. (#2359)

This was reported as a potential security issue. The table names were correctly escaped, so this wasn’t an XSS, but there was still potential for confusion if an attacker constructed a URL along the lines of /database-does-not-exist-visit-www.attacker.com-for-more-info. A similar fix went out in Datasette 0.64.8.

  • Improved documentation for async usage of the track_event(datasette, event) hook. (#2319)
  • Fixed some HTTPX deprecation warnings. (#2307)
  • Datasette now serves a <html lange="en"> attribute. Thanks, Charles Nepote. (#2348)
  • Datasette’s automated tests now run against the maximum and minimum supported versions of SQLite: 3.25 (from September 2018) and 3.46 (from May 2024). Thanks, Alex Garcia. (#2352)
  • Fixed an issue where clicking twice on the URL output by datasette --root produced a confusing error. (#2375)

Tricks to help construct the release notes

I still write the Datasette release notes entirely by hand (aside from a few words auto-completed by GitHub Copilot)—I find the process of writing them to be really useful as a way to construct a final review of everything before it goes out.

I used a couple of tricks to help this time. I always start my longer release notes with an issue. The GitHub diff view is useful for seeing what’s changed since the last release, but I took it a step further this time with the following shell command:

git log --pretty=format:"- %ad: %s %h" --date=short --reverse 1.0a13...81b68a14

This outputs a summary of each commit in the range, looking like this (truncated):

- 2024-03-12: Added two things I left out of the 1.0a13 release notes 8b6f155b
- 2024-03-15: Fix httpx warning about app=self.app, refs #2307 5af68377
- 2024-03-15: Fixed cookies= httpx warning, refs #2307 54f5604c
...

Crucially, the syntax of this output is in GitHub Flavored Markdown—and pasting it into an issue comment causes both the issue references and the commit hashes to be expanded into links that look like this:

2024-03-12: Added two things I left out of the 1.0a13 release notes 8b6f155 2024-03-15: Fix httpx warning about app=self.app, refs Fix httpx deprecation warnings #2307 5af6837 2024-03-15: Fixed cookies= httpx warning, refs Fix httpx deprecation warnings #2307 54f5604

It’s a neat way to get a quick review of what’s changed, and also means that those issues will automatically link back to the new issue where I’m constructing the release notes.

I wrote this up in a TIL here, along with another trick I used where I used LLM to get Claude 3.5 Sonnet to summarize my changes for me:

curl 'https://github.com/simonw/datasette/compare/1.0a13...2ad51baa3.diff' \
  | llm -m claude-3.5-sonnet --system \
  'generate a short summary of these changes, then a bullet point list of detailed release notes'