Weeknotes: Datasette, sqlite-utils, Datasette Desktop
20th July 2022
A flurry of releases this week, including a new Datasette alpha and a fixed Datasette Desktop.
Work on Datasette Cloud continues—the last 10% of the work needed for the beta launch is trending towards taking 90% of the time. It’s been driving all sorts of fixes to the wider Datasette ecosystem, which is nice.
I ran into a bug which would have been easier to investigate using Sentry. The datasette-sentry plugin wasn’t working right, and it turned out I needed a new handle_exception() plugin hook to fix it. This was the impetus I needed to push out a new Datasette alpha—I like to get new hooks into an alpha as quickly as possible so I can depend on that Datasette version from alpha releases of plugins.
Here are some other highlights from the alpha’s release notes:
A neat thing about Pluggy is that you can add new arguments to existing plugin hooks without breaking plugins that already use them.
- --nolock option for ignoring file locks when opening read-only databases. (#1744)
Since the very start of the project Datasette has suggested trying the following command to start exploring your Google Chrome history, which is stored using SQLite:
datasette ~/Library/Application\ Support/Google/Chrome/Default/History
I’m not sure when this changed, but I tried running the command recently and got the following error:
sqlite3.OperationalError: database is locked
Since Datasette opens databases in read-only mode I didn’t see why a lock like this should be respected. It turns out SQLite can be told to ignore locks like so:
sqlite3.connect("file:places.sqlite?mode=ro&nolock=1", uri=True)
So I added a --nolock option to Datasette which does exactly that:
datasette ~/Library/Application\ Support/Google/Chrome/Default/History --nolock
- Datasette now has a Discord community.
Inspired by 6 More Things I Learned Building Snowpack to 20,000 Stars (Part 2) by Fred K. Schott I finally set up a chat community for Datasette, using Discord.
- Database file downloads now implement conditional GET using ETags. (#1739)
This is a change I made to support Datasette Lite—I noticed that the WASM version of Datasette was downloading a fresh database every time, so I added ETag support to encourage browsers to avoid a duplicate download and use a cached copy of the database file instead, provided it hasn’t changed.
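For anyone unfamiliar with the mechanism, conditional GET with ETags can be sketched roughly like this. It is a simplified illustration and not Datasette's actual implementation: make_etag() and respond() are hypothetical names, and hashing the full file contents is an assumption made to keep the example short.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive a quoted ETag value from the file's bytes (illustrative)."""
    return '"{}"'.format(hashlib.sha256(content).hexdigest())


def respond(content: bytes, request_headers: dict):
    """Return (status, headers, body) for a download request."""
    etag = make_etag(content)
    if request_headers.get("if-none-match") == etag:
        # The client's cached copy is still current: send no body.
        return 304, {"etag": etag}, b""
    return 200, {"etag": etag}, content


database_bytes = b"pretend this is a SQLite file"

# First request: no cached copy, so the full file is returned.
status, headers, body = respond(database_bytes, {})

# Repeat request: the client sends back the ETag it was given.
repeat_status, _, repeat_body = respond(
    database_bytes, {"if-none-match": headers["etag"]}
)

# The file has changed, so the stale ETag no longer matches.
changed_status, _, _ = respond(
    database_bytes + b"!", {"if-none-match": headers["etag"]}
)
```

The first response carries the ETag, the repeat request gets a 304 with an empty body, and a changed file results in a fresh 200.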
I finally shipped that in Datasette Desktop 0.2.2, but I wanted to reduce the chances of this happening again as much as possible. Datasette Desktop’s Electron tests used the spectron test harness, but that’s marked as deprecated.
I’m a big fan of Playwright and I was optimistic to see that it has support for testing Electron apps. I figured out how to use that with Datasette Desktop and run the tests in GitHub Actions: I wrote up what I learned in a TIL, Testing Electron apps with Playwright and GitHub Actions.
Annotated release notes:
davidleejy suggested the table.duplicate() method and contributed an implementation. This was the impetus for pushing out a fresh release.
I added the CLI equivalent,
- sqlite_utils.utils.rows_from_file() is now a documented API. It can be used to read a sequence of dictionaries from a file-like object containing CSV, TSV, JSON or newline-delimited JSON. It can be passed an explicit format or can attempt to detect the format automatically. (#443)
- sqlite_utils.utils.TypeTracker is now a documented API for detecting the likely column types for a sequence of string rows, see Detecting column types using TypeTracker. (#445)
- sqlite_utils.utils.chunks() is now a documented API for splitting an iterator into chunks. (#451)
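To illustrate the idea behind the last of these, here is a minimal standard-library sketch of splitting an iterator into chunks. It is not the sqlite-utils implementation, and the real function may yield a different kind of chunk object; this version yields plain lists.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of up to `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        # islice consumes at most `size` items from the shared iterator.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


batches = list(chunks(range(7), 3))
# batches is [[0, 1, 2], [3, 4, 5], [6]]
```

Chunking like this is handy for inserting a large stream of rows in batches without loading the whole thing into memory.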
I have a policy that any time I need to use an undocumented method from sqlite-utils in some other project I file an issue to add that to the documented API surface.
I had used TypeTracker in datasette-socrata.
This was inspired by my TIL Ignoring errors in a section of a Bash script—a trick I had to figure out because one of my scripts needed to add columns and enable FTS but only if those changes had not been previously applied.
In looking into that I spotted inconsistencies in the design of the sqlite-utils commands, so I fixed those as much as I could while still maintaining backwards compatibility with the 3.x releases.
Releases this week
s3-ocr: 0.5—(5 releases total)—2022-07-19
Tools for running OCR against files stored in S3
datasette-graphql: 2.1.1—(36 releases total)—2022-07-18
Datasette plugin providing an automatic GraphQL API for your SQLite databases
datasette-sentry: 0.2a0—(3 releases total)—2022-07-18
Datasette plugin for configuring Sentry
datasette: 0.62a1—(112 releases total)—2022-07-18
An open source multi-tool for exploring and publishing data
sqlite-utils: 3.28—(102 releases total)—2022-07-15
Python CLI utility and library for manipulating SQLite databases
datasette-publish-vercel: 0.14—(21 releases total)—2022-07-13
Datasette plugin for publishing data using Vercel
datasette-app: 0.2.2—(4 releases total)—2022-07-13
The Datasette macOS application
datasette-app-support: 0.11.6—(19 releases total)—2022-07-12
Part of https://github.com/simonw/datasette-app
TIL this week