Weeknotes: datasette-enrichments, datasette-comments, sqlite-chronicle
8th December 2023
I’ve mainly been working on Datasette Enrichments and continuing to explore the possibilities enabled by sqlite-chronicle.
Enrichments
This is the biggest new Datasette feature to arrive in quite a while, and it’s entirely implemented as a plugin.
I described these in detail in Datasette Enrichments: a new plugin framework for augmenting your data (with an accompanying YouTube video demo). The short version: you can now install plugins that can “enrich” data by running transformations (or data fetches) against selected rows—geocoding addresses, or executing a GPT prompt, or applying a regular expression.
The datasette-enrichments plugin provides the mechanism for running these enrichments. Other plugins can then depend on it and define all manner of interesting options for enriching and transforming data.
I’ve built four of these so far, and I wrote some extensive documentation to help people build more. I’m excited to see how people use and build further on this initial foundation.
Datasette Comments
Alex Garcia released the first version of datasette-comments as part of our continuing collaboration to build out Datasette Cloud.
He wrote about that on the Datasette Cloud blog: Annotate and explore your data with datasette-comments.
This is another capability I’ve been looking forward to for years: the plugin lets you leave comments on individual rows within a Datasette instance, in order to collaborate with others on finding stories in data.
sqlite-chronicle and datasette-chronicle
I first wrote about sqlite-chronicle in weeknotes back in September. This week, inspired by my work on embeddings, I spent a bit more time on it and shipped a 0.2 release.
sqlite-chronicle
is a Python library that implements a SQL pattern where a table can have a _chronicle_tablename
companion table created, which is then updated using triggers against the main table.
The chronicle table has a shadow row for every row in the main table, duplicating its primary keys and then storing millisecond timestamp columns for added_ms
and updated_ms
, an integer version
column and a deleted
boolean indicator.
The goal is to record when a row was last inserted or updated, with an atomically incrementing version
ID representing the version of the entire table.
This can then enable all sorts of interesting potential use-cases:
- Identify which rows have been updated or inserted since a previously recorded version
- Synchronize a table with another table, only updating/inserting/deleting rows that have changed since last time
- Run scheduled tasks that only consider rows that have changed in some way
The relevance to enrichments is that I’d like to implement a form of “persistent” enrichment—an enrichment which is configured to run repeatedly against new or updated rows, geocoding new addresses for example.
To do that, I need a mechanism to identify which rows have already been enriched and which need to be enriched again. sqlite-chronicle
is my current plan to provide that mechanism.
It’s still pretty experimental. I recently found that INSERT OR REPLACE INTO
queries don’t behave how I would expect them to, see issue #7.
I also started a new plugin to accompany the feature: datasette-chronicle, which adds two features to Datasette:
- “enable/disable chronicle tracking” table actions for users with the correct permissions, which can be used in the Datasette UI to turn chronicle tracking on and off for a specific table
- For tables that have chronicle enabled, a
?_since=VERSION
querystring parameter which can be used to filter the table to only rows that have changed since the specified version
I’m running the plugin against the documents table on demos.datasette.cloud
—see _chronicle_documents there for the result. That table is populated via GitHub scheduled actions and the Datasette API, as described in Getting started with the Datasette Cloud API—it’s also where I first spotted the INSERT OR REPLACE INTO
issue I described earlier.
Newsroom Robots
I recorded an episode of the Newsroom Robots AI in journalism podcast with Nikita Roy a couple of weeks ago.
She split our conversation into two episodes:
- Simon Willison (Part One): Breaking Down OpenAI’s New Features & Security Risks of Large Language Models—which I ended up using as the basis for two blog entries:
- Simon Willison (Part Two): How Datasette Helps With Investigative Reporting which has the best audio description of Datasette I’ve managed to produce so far.
sqlite-utils 3.36
Quoting the release notes.
- Support for creating tables in SQLite STRICT mode. Thanks, Taj Khattra. (#344)
- CLI commands
create-table
,insert
andupsert
all now accept a--strict
option.- Python methods that can create a table—
table.create()
andinsert/upsert/insert_all/upsert_all
all now accept an optionalstrict=True
parameter.- The
transform
command andtable.transform()
method preserve strict mode when transforming a table.- The
sqlite-utils create-table
command now acceptsstr
,int
andbytes
as aliases fortext
,integer
andblob
respectively. (#606)
Taj Khattra’s contribution of the --strict
and strict=True
options is a beautiful example of my ideal pull request: a clean implementation, comprehensive tests and thoughtful updates to the documentation all bundled together in one go.
Releases
-
sqlite-utils 3.36—2023-12-08
Python CLI utility and library for manipulating SQLite databases -
datasette-leaflet-geojson 0.8.1—2023-12-07
Datasette plugin that replaces any GeoJSON column values with a Leaflet map. -
datasette-chronicle 0.2—2023-12-06
Enable sqlite-chronicle against tables in Datasette -
datasette-enrichments-jinja 0.1—2023-12-06
Datasette enrichment for evaluating templates in a Jinja sandbox -
sqlite-chronicle 0.2.1—2023-12-06
Use triggers to track when rows in a SQLite table were updated or deleted -
datasette-enrichments-gpt 0.3—2023-12-01
Datasette enrichment for analyzing row data using OpenAI’s GPT models -
datasette-statistics 0.2.1—2023-11-30
SQL statistics functions for Datasette -
datasette-enrichments-opencage 0.1—2023-11-30
Geocoding and reverse geocoding using OpenCage -
datasette-enrichments-re2 0.1—2023-11-30
Enrich data using regular expressions powered by re2 -
datasette-enrichments 0.2—2023-11-29
Tools for running enrichments against data stored in Datasette -
datasette-pretty-json 0.3—2023-11-28
Datasette plugin that pretty-prints any column values that are valid JSON objects or arrays
TILs
More recent articles
- Notes from Bing Chat—Our First Encounter With Manipulative AI - 19th November 2024
- Project: Civic Band - scraping and searching PDF meeting minutes from hundreds of municipalities - 16th November 2024
- Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - 12th November 2024