Weeknotes: Datasette Writes
26th February 2020
As discussed previously, the biggest hole in Datasette’s feature set at the moment involves writing to the database.
Datasette was born as a hack to abuse serverless, stateless hosting by bundling a static, immutable database as part of the deployment. The key idea was that for some use-cases—such as data journalism—you don’t need to be able to continually update your data. It’s just the facts that support the story you are trying to tell.
I also believed the conventional wisdom that SQLite is fine for reads but shouldn’t be trusted to handle web application writes. I no longer believe this to be the case: SQLite is great at handling writes, as millions of iPhone and Android apps will attest.
Meanwhile, the biggest blocker to people trying out Datasette is that they would need to convert their data to SQLite somehow in order to use it. I’ve been building a family of CLI tools for this, but that requires users to both be familiar with the command-line and to install software on their computers.
So: Datasette needs to grow web-based tools for loading data into the database.
Datasette’s plugin system is the ideal space for experimenting with ways of doing this, without needing to try out crazy new features on Datasette’s own core.
There’s just one big problem: SQLite may be great at fast, reliable writes but it still doesn’t like concurrent writes: it’s important to only ever have one connection writing to a SQLite database at a time.
I’ve been mulling over the best way to handle this for the best part of a year... and then a couple of days ago I had a breakthrough: with a dedicated write thread for a database file, I could use a Python queue to ensure only one write could access the database at a time.
There’s prior art for this: SQLite wizard Charles Leifer released code plus a beautiful explanation of how to queue writes to SQLite back in 2017. I’m not sure why I didn’t settle on his approach sooner.
So... Datasette 0.37, released this evening, has a new capability exposed to plugins: they can now request that an operation (either a SQL statement or a full custom Python function) be queued up to execute inside a thread that posesses an exclusive write connection to a SQLite database.
I’ve documented how plugins can use this in the new plugin internals documentation: execute_write() and execute_write_fn().
So far there’s only one public plugin that takes advantage of this: datasette-upload-csvs, which previously used a dirty hack but has now been upgraded to use the new execute_write_fn()
method.
I’m really excited about the potential plugins this unlocks though. I experimented with a logging plugin and a plugin for deleting tables while I was building the hooks (full implementations of those are posted as comments in the pull request).
Other use-cases I’m interested to explore include:
- Plugins that import data from other APIs or services. Imagine web UI interfaces to some of my Dogsheep tools for example.
- Plugins that periodically update data—pulling the latest CSV updates from government open data portals (like San Francisco’s trees).
- Tools for enhancing tables with additional data derived from their values—geocoding or reverse geocoding columns, resolving identifiers and so on.
- Now that plugins have a tool for maintaining their own state, plugins could use SQLite tables to track things like which saved searches have been executed.
- A plugin that lets you attach annotations to rows and columns in other tables, storing those annotations in its own SQLite database.
More recent articles
- llm cmd undo last git commit - a new plugin for LLM - 26th March 2024
- Building and testing C extensions for SQLite with ChatGPT Code Interpreter - 23rd March 2024
- Claude and ChatGPT for ad-hoc sidequests - 22nd March 2024
- Weeknotes: the aftermath of NICAR - 16th March 2024
- The GPT-4 barrier has finally been broken - 8th March 2024
- Prompt injection and jailbreaking are not the same thing - 5th March 2024
- Interesting ideas in Observable Framework - 3rd March 2024
- Weeknotes: Getting ready for NICAR - 27th February 2024
- The killer app of Gemini Pro 1.5 is video - 21st February 2024
- Weeknotes: a Datasette release, an LLM release and a bunch of new plugins - 9th February 2024