Simon Willison’s Weblog

Subscribe

Weeknotes: sqlite-utils 3.0 alpha, Git scraping in the zeitgeist

7th November 2020

Natalie and I decided to escape San Francisco for election week, and have been holed up in Fort Bragg on the Northern California coast. I’ve mostly been on vacation, but I did find time to make some significant changes to sqlite-utils. Plus notes on an exciting Git scraping project.

Better search in the sqlite-utils 3.0 alpha

I practice semantic versioning with sqlite-utils, which means it only gets a major version bump if I break backwards compatibility in some way.

My goal is to avoid breaking backwards compatibility as much as possible, and I was proud to have made it all the way to version 2.23 representing 23 new feature releases since the 2.0 release without breaking any documented features!

Sadly this run has come to an end: I realized that the table.search() method was poorly designed, and I also needed to grab back the -c command-line option (a shortcut for --csv output) to be used for another purpose.

The chances that either of these changes will break anyone are pretty small, but semantic versioning dictates a major version bump so here we are.

I shipped a 3.0 alpha today, which should hopefully become a stable release very shortly (milestone here).

The big new feature is sqlite-utils search—a command-line tool for executing searches against a full-text search enabled table:

$ sqlite-utils search 24ways-fts4.db articles maps -c title
[{"rowid": 163, "title": "Get To Grips with Slippy Maps", "rank": -10.028754920576421},
 {"rowid": 220, "title": "Finding Your Way with Static Maps", "rank": -9.952534352591737},
 {"rowid": 27, "title": "Putting Design on the Map", "rank": -5.667327088267961},
 {"rowid": 168, "title": "Unobtrusively Mapping Microformats with jQuery", "rank": -4.662224207228984},

Here’s full documentation for the new command.

Notably, this command works against both FTS4 and FTS5 tables in SQLite—despite FTS4 not shipping with a built-in ranking function. I’m using my sqlite-fts4 package for this, which I described back in January 2019 in Exploring search relevance algorithms with SQLite.

Git scraping to predict the election

It’s not quite over yet but the end is in sight, and one of the best tools to track the late arriving vote counts is this Election 2020 results site built by Alex Gaynor and a growing cohort of contributors.

The site is a beautiful example of Git scraping in action, and I’m thrilled that it links to my article in the README!

Take a look at the repo to see how it works. Short version: this GitHub Action workflow grabs the latest snapshot of this undocumented New York Times JSON API once every five minutes and commits it to the repository. It then runs this Python script which iterates through the Git history and generates an HTML summary showing the different batches of new votes that were reported and their impact on the overall race.

The resulting report is published to GitHub pages—resulting in a site that can handle a great deal of traffic and is updated entirely by code running in scheduled actions.

Screenshot of the generated report

This is a perfect use-case for Git scraping: it takes a JSON endpoint that represents the current state of the world and turns it into a sequence of historic snapshots, then uses those snapshots to build a unique and useful new source of information to help people understand what’s going on.

Releases this week

This is Weeknotes: sqlite-utils 3.0 alpha, Git scraping in the zeitgeist by Simon Willison, posted on 7th November 2020.

Next: Personal Data Warehouses: Reclaiming Your Data

Previous: Datasette 0.51 (plus weeknotes)