Simon Willison’s Weblog

Subscribe

Weeknotes: Shaving some beautiful yaks

1st December 2021

I’ve been mostly shaving yaks this week—two in particular: the Datasette table refactor and the next release of git-history. I also built and released my first Web Component!

A Web Component for embedding Datasette tables

A longer term goal that I have for Datasette is to figure out a good way of using it to build dashboards, tying together summaries and visualizations of the latest data from a bunch of different sources.

I’m excited about the potential of Web Components to help solve this problem.

My datasette-notebook project is a very early experiment in this direction: it’s a Datasette notebook that provides a Markdown wiki (persisted to SQLite) to which I plan to add the ability to embed tables and visualizations in wiki pages—forming a hybrid of a wiki, dashboarding system and Notion/Airtable-style database.

It does almost none of those things right now, which is why I’ve not really talked about it here.

Web Components offer a standards-based mechanism for creating custom HTML tags. Imagine being able to embed a Datasette table on a page by adding the following to your HTML:

<datasette-table
    url="https://global-power-plants.datasettes.com/global-power-plants/global-power-plants.json"
></datasette-table>

That’s exactly what datasette-table lets you do! Here’s a demo of it in action.

This is version 0.1.0—it works, but I’ve not even started to flesh it out.

I did learn a bunch of things building it though: it’s my first Web Component, my first time using Lit, my first time using Vite and the first JavaScript library I’ve ever packaged and published to npm.

Here’s a detailed TIL on Publishing a Web Component to npm encapsulating everything I’ve learned from this project so far.

This is also my first piece of yak shaving this week: I built this partly to make progress on datasette-notebook, but also because my big Datasette refactor involves finalizing the design of the JSON API for version 1.0. I realized that I don’t actually have a project that makes full use of that API, which has been hindering my attempts to redesign it. Having one or more Web Components that consume the API will be a fantastic way for me to eat my own dog food.

Link: rel=“alternate” for Datasette tables

Here’s an interesting problem that came up while I was working on the datasette-table component.

As designed right now, you need to figure out the JSON URL for a table and pass that to the component.

This is usually a case of adding .json to the path, while preserving any query string parameters—but there’s a nasty edge-case: if your SQLite table itself ends with the string .json (which could happen! Especially since Datasette promises to work with any existing SQLite database) the URL becomes this instead:

/mydb/table.json?_format=json

Telling users of my component that they need to first construct the JSON URL for their page isn’t the best experience: I’d much rather let people paste in the URL to the HTML version and derive the JSON from that.

This is made more complex by the fact that, thanks to --cors, the Web Component can be embedded on any page. And for datasette-notebook I’d like to provide a feature where any URLs to Datasette instances—no matter where they are hosted—are turned into embedded tables automatically.

To do this, I need an efficient way to tell that an arbitrary URL corresponds to a Datasette table.

My latest idea here is to use a combination of HTTP HEAD requests and a Link: rel="alternate" header—something like this:

~ % curl -I 'https://latest.datasette.io/fixtures/compound_three_primary_keys'
HTTP/1.1 200 OK
date: Sat, 27 Nov 2021 20:09:36 GMT
server: uvicorn
Link: https://latest.datasette.io/fixtures/compound_three_primary_keys.json; rel="alternate"; type="application/datasette+json"

This would allow a (hopefully fast) fetch() call from JavaScript to confirm that a URL is a Datasette table, and get back the JSON that should be fetched by the component in order to render it on the page.

I have a prototype of this in Datasette issue #1533. I think it’s a promising approach!

It’s also now part of the ever-growing table refactor. Adding custom headers to page responses is currently far harder than it should be.

sqlite-utils STRICT tables

SQLite 3.37.0 came out at the weekend with a long-awaited feature: STRICT tables, which enforce column types such that you get an error if you try to insert a string into an integer column.

(This has been a long-standing complaint about SQLite by people who love strong typing, and D. Richard Hipp finally shipped the change for them with some salty release notes saying it’s “for developers who prefer that kind of thing.”)

I started researching how to add support for this to my sqlite-utils Python library. You can follow my thinking in sqlite-utils issue #344—I’m planning to add a strict=True option to methods that create tables, but for the moment I’ve shipped new introspection properties for seeing if a table uses strict mode or not.

git-history update

My other big yak this week has been work on git-history. I’m determined to get it into a stable state such that I can write it up, produce a tutorial and maybe produce a video demonstration as well—but I keep on finding things I want to change about how it works.

The big challenge is how to most effectively represent the history of a bunch of different items over time in a relational database schema.

I started with a item table that presents just the most recent version of each item, and an item_version table with a row for every subsequent version.

That table got pretty big, with vast amounts of duplicated data in it.

So I’ve been working on an optimization where columns are only included in an item_version row if they have changed since the previous version.

The problem there is what to do about null—does null mean "this column didn’t change" or does it mean "this column was set from some other value back to null"?

After a few different attempts I’ve decided to solve this with a many-to-many table, so for any row in the item_version table you can see which columns were explicitly changed by that version.

This is all working pretty nicely now, but still needs documentation, and tests, and then a solid write-up and tutorial and demos and a video... hopefully tomorrow!

One of my design decisions for this tool has been to use an underscore prefix for “reserved columns”, such that non-reserved columns can be safely used by the arbitrary data that is being tracked by the tool.

Having columns with names like _id and _item has highlighted several bugs with Datasette’s handling of these column names, since Datasette itself tries to use things like ?_search= for special query string parameters. I released Datasette 0.59.4 with some relevant fixes.

A beautiful yak

A very beautiful yak

As a consumate yak shaver this beautiful yak that showed up on Reddit a few weeks ago has me absolutely delighted. I’ve not been able to determine the photography credit.

Releases this week

TIL this week