Weeknotes: Shaving some beautiful yaks
1st December 2021
A Web Component for embedding Datasette tables
A longer term goal that I have for Datasette is to figure out a good way of using it to build dashboards, tying together summaries and visualizations of the latest data from a bunch of different sources.
I’m excited about the potential of Web Components to help solve this problem.
My datasette-notebook project is a very early experiment in this direction: it’s a Datasette notebook that provides a Markdown wiki (persisted to SQLite) to which I plan to add the ability to embed tables and visualizations in wiki pages—forming a hybrid of a wiki, dashboarding system and Notion/Airtable-style database.
It does almost none of those things right now, which is why I’ve not really talked about it here.
Web Components offer a standards-based mechanism for creating custom HTML tags. Imagine being able to embed a Datasette table on a page by adding the following to your HTML:
<datasette-table url="https://global-power-plants.datasettes.com/global-power-plants/global-power-plants.json" ></datasette-table>
This is version 0.1.0—it works, but I’ve not even started to flesh it out.
Here’s a detailed TIL on Publishing a Web Component to npm encapsulating everything I’ve learned from this project so far.
This is also my first piece of yak shaving this week: I built this partly to make progress on
datasette-notebook, but also because my big Datasette refactor involves finalizing the design of the JSON API for version 1.0. I realized that I don’t actually have a project that makes full use of that API, which has been hindering my attempts to redesign it. Having one or more Web Components that consume the API will be a fantastic way for me to eat my own dog food.
Link: rel=“alternate” for Datasette tables
Here’s an interesting problem that came up while I was working on the
As designed right now, you need to figure out the JSON URL for a table and pass that to the component.
This is usually a case of adding
.json to the path, while preserving any query string parameters—but there’s a nasty edge-case: if your SQLite table itself ends with the string
.json (which could happen! Especially since Datasette promises to work with any existing SQLite database) the URL becomes this instead:
Telling users of my component that they need to first construct the JSON URL for their page isn’t the best experience: I’d much rather let people paste in the URL to the HTML version and derive the JSON from that.
This is made more complex by the fact that, thanks to
--cors, the Web Component can be embedded on any page. And for
datasette-notebook I’d like to provide a feature where any URLs to Datasette instances—no matter where they are hosted—are turned into embedded tables automatically.
To do this, I need an efficient way to tell that an arbitrary URL corresponds to a Datasette table.
My latest idea here is to use a combination of HTTP
HEAD requests and a
Link: rel="alternate" header—something like this:
~ % curl -I 'https://latest.datasette.io/fixtures/compound_three_primary_keys' HTTP/1.1 200 OK date: Sat, 27 Nov 2021 20:09:36 GMT server: uvicorn Link: https://latest.datasette.io/fixtures/compound_three_primary_keys.json; rel="alternate"; type="application/datasette+json"
This would allow a (hopefully fast)
I have a prototype of this in Datasette issue #1533. I think it’s a promising approach!
It’s also now part of the ever-growing table refactor. Adding custom headers to page responses is currently far harder than it should be.
sqlite-utils STRICT tables
(This has been a long-standing complaint about SQLite by people who love strong typing, and D. Richard Hipp finally shipped the change for them with some salty release notes saying it’s “for developers who prefer that kind of thing.”)
I started researching how to add support for this to my sqlite-utils Python library. You can follow my thinking in sqlite-utils issue #344—I’m planning to add a
strict=True option to methods that create tables, but for the moment I’ve shipped new introspection properties for seeing if a table uses strict mode or not.
My other big yak this week has been work on git-history. I’m determined to get it into a stable state such that I can write it up, produce a tutorial and maybe produce a video demonstration as well—but I keep on finding things I want to change about how it works.
The big challenge is how to most effectively represent the history of a bunch of different items over time in a relational database schema.
I started with a
item table that presents just the most recent version of each item, and an
item_version table with a row for every subsequent version.
That table got pretty big, with vast amounts of duplicated data in it.
So I’ve been working on an optimization where columns are only included in an
item_version row if they have changed since the previous version.
The problem there is what to do about
null mean "this column didn’t change" or does it mean "this column was set from some other value back to
After a few different attempts I’ve decided to solve this with a many-to-many table, so for any row in the
item_version table you can see which columns were explicitly changed by that version.
This is all working pretty nicely now, but still needs documentation, and tests, and then a solid write-up and tutorial and demos and a video... hopefully tomorrow!
One of my design decisions for this tool has been to use an underscore prefix for “reserved columns”, such that non-reserved columns can be safely used by the arbitrary data that is being tracked by the tool.
Having columns with names like
_item has highlighted several bugs with Datasette’s handling of these column names, since Datasette itself tries to use things like
?_search= for special query string parameters. I released Datasette 0.59.4 with some relevant fixes.
A beautiful yak
As a consumate yak shaver this beautiful yak that showed up on Reddit a few weeks ago has me absolutely delighted. I’ve not been able to determine the photography credit.
Releases this week
s3-credentials: 0.7—(7 releases total)—2021-11-30
A tool for creating credentials for accessing S3 buckets
datasette: 0.59.4—(102 releases total)—2021-11-30
An open source multi-tool for exploring and publishing data
A Web Component for embedding a Datasette table on a page
TIL this week
More recent articles
- Weeknotes: the Datasette Cloud API, a podcast appearance and more - 1st October 2023
- Things I've learned about building CLI tools in Python - 30th September 2023
- Talking Large Language Models with Rooftop Ruby - 29th September 2023
- Weeknotes: Embeddings, more embeddings and Datasette Cloud - 17th September 2023
- Build an image search engine with llm-clip, chat with models with llm chat - 12th September 2023
- LLM now provides tools for working with embeddings - 4th September 2023
- Datasette 1.0a4 and 1.0a5, plus weeknotes - 30th August 2023
- Making Large Language Models work for you - 27th August 2023
- Datasette Cloud, Datasette 1.0a3, llm-mlc and more - 16th August 2023
- How I make annotated presentations - 6th August 2023