Datasette 0.54: The annotated release notes
Datasette 0.54 is out today, with a number of foundational new features. As I’ve done for previous releases, here are the annotated release notes with some additional commentary.
The _internal database
As part of ongoing work to help Datasette handle much larger numbers of connected databases and tables (see Datasette Library), Datasette now maintains an in-memory SQLite database with details of all of the attached databases, tables, columns, indexes and foreign keys. (#1150)
This will support future improvements such as a searchable, paginated homepage of all available tables.
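To make the idea concrete, here is a minimal sketch of how a catalog like this could be assembled using only Python's sqlite3 module and SQLite's introspection PRAGMAs. The build_catalog function and the catalog table layout are assumptions made for illustration; they are not Datasette's implementation or the real _internal schema.

```python
import sqlite3


def build_catalog(databases):
    """Build an in-memory catalog describing each named SQLite connection.

    `databases` maps a database name to an open sqlite3 connection.
    The catalog schema below is hypothetical, not Datasette's own.
    """
    catalog = sqlite3.connect(":memory:")
    catalog.executescript(
        """
        CREATE TABLE tables (database_name TEXT, table_name TEXT);
        CREATE TABLE columns (
            database_name TEXT, table_name TEXT, name TEXT, type TEXT, is_pk INTEGER
        );
        CREATE TABLE indexes (
            database_name TEXT, table_name TEXT, name TEXT, is_unique INTEGER
        );
        CREATE TABLE foreign_keys (
            database_name TEXT, table_name TEXT,
            from_column TEXT, other_table TEXT, to_column TEXT
        );
        """
    )
    for db_name, conn in databases.items():
        table_names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        for table in table_names:
            catalog.execute("INSERT INTO tables VALUES (?, ?)", (db_name, table))
            # Quote the identifier so unusual table names are handled safely
            quoted = '"' + table.replace('"', '""') + '"'
            # table_info rows: cid, name, type, notnull, dflt_value, pk
            for col in conn.execute(f"PRAGMA table_info({quoted})"):
                catalog.execute(
                    "INSERT INTO columns VALUES (?, ?, ?, ?, ?)",
                    (db_name, table, col[1], col[2], int(col[5] > 0)),
                )
            # index_list rows: seq, name, unique, origin, partial
            for idx in conn.execute(f"PRAGMA index_list({quoted})"):
                catalog.execute(
                    "INSERT INTO indexes VALUES (?, ?, ?, ?)",
                    (db_name, table, idx[1], idx[2]),
                )
            # foreign_key_list rows: id, seq, table, from, to, on_update, ...
            for fk in conn.execute(f"PRAGMA foreign_key_list({quoted})"):
                catalog.execute(
                    "INSERT INTO foreign_keys VALUES (?, ?, ?, ?, ?)",
                    (db_name, table, fk[3], fk[2], fk[4]),
                )
    catalog.commit()
    return catalog
```

Once everything lives in ordinary tables, a searchable or paginated listing becomes a plain SQL query against the catalog rather than a loop over every attached database.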
The Datasette Library issue has been open for nearly two years now. It’s a need I identified at the NICAR 2019 data journalism conference, where it became apparent that many newsrooms are sat on an enormous pile of data that they have collected but without any central place to keep it all.
Large tech companies usually have some kind of data warehouse or data lake which is meant to address this problem. These are complicated, extremely expensive to run and the ability to add and maintain data in them is often reserved for a high technology priesthood.
I see Datasette as targeting the “small data” end of this equation, where I define small data as anything that will fit on my phone. My iPhone has half a terabyte of storage now!
How many more people and organizations would run their own data warehouse if it could sit on their laptop or run on a $5/month VPS?
You can explore an example of this database by signing in as root to the latest.datasette.io demo instance and then navigating to latest.datasette.io/_internal.
Every commit to Datasette deploys a new demo instance to latest.datasette.io (here’s the GitHub Actions workflow). This is useful for all sorts of things: I can use it to test on additional devices, remind myself how different pages of Datasette work and, in this case, link to it directly from the documentation.
Plugins can use these tables to introspect attached data in an efficient way. Plugin authors should note that this is not yet considered a stable interface, so any plugins that use this may need to make changes prior to Datasette 1.0 if the _internal table schemas change.
In the ongoing press towards Datasette 1.0 I’ve decided to start shipping features like this earlier, but with prominent “this is not yet stable” notices to warn people that the details might change. I think this is a better way to preview new features than having them live in a branch or alpha release.
Once I hit 1.0 I’m going to only ship features that are genuinely stable, to avoid having to make any major version number bumps for as long as possible.
Named in-memory database support
As part of the work building the _internal database, Datasette now supports named in-memory databases that can be shared across multiple connections. This allows plugins to create in-memory databases which will persist data for the lifetime of the Datasette server process. (#1151)
The memory_name= parameter to the Database class can be used to create named, shared in-memory databases.
This exposes a relatively obscure feature of SQLite: it allows in-memory databases to operate with a shared cache, which means multiple database connections can see the same in-memory data.
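The underlying SQLite behaviour can be seen with nothing but Python's standard library. This is a minimal sketch of the shared-cache feature itself, not of Datasette's Database class; the database name shared_demo is an arbitrary choice, and it assumes the SQLite build in use has shared-cache support enabled.

```python
import sqlite3

# A named in-memory database. mode=memory keeps it off disk and
# cache=shared lets other connections in the same process open it by name.
URI = "file:shared_demo?mode=memory&cache=shared"

writer = sqlite3.connect(URI, uri=True)
reader = sqlite3.connect(URI, uri=True)

writer.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
writer.execute("INSERT INTO notes (body) VALUES (?)", ("hello from the writer",))
writer.commit()

# The second connection sees the table and the committed row
rows = reader.execute("SELECT body FROM notes").fetchall()

# By contrast, a plain ":memory:" connection gets its own private database
private = sqlite3.connect(":memory:")
private_tables = private.execute("SELECT name FROM sqlite_master").fetchall()
```

The data survives only while at least one connection to the named database stays open, which is why it lasts for the lifetime of a long-running server process but no longer.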
I can think of so many fun plugin use-cases for this!
- A plugin that accepts a URL to a CSV file, then pulls that data into an in-memory table and presents a standard Datasette table interface for interacting with it
- An API rate-limiting plugin that uses an in-memory database as a persistent store
- Any plugin that might need to cache something can now do so with its own in-memory database
I haven’t used this capability in a plugin myself yet but I’m excited to try it out with something soon.
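As a rough illustration of the rate-limiting idea from the list above, here is a sketch built on a named, shared in-memory database using only the standard library. The RateLimiter class, its table layout and its parameters are all hypothetical; a real plugin would go through Datasette's own database API instead of opening sqlite3 connections directly.

```python
import sqlite3
import time


class RateLimiter:
    """Sliding-window rate limiter backed by a shared in-memory database."""

    def __init__(self, name="rate_limits", limit=3, window_seconds=60.0):
        self.uri = f"file:{name}?mode=memory&cache=shared"
        self.limit = limit
        self.window_seconds = window_seconds
        # Keep one connection open so the in-memory database is not discarded
        self._keepalive = sqlite3.connect(self.uri, uri=True)
        self._keepalive.execute(
            "CREATE TABLE IF NOT EXISTS hits (client TEXT, ts REAL)"
        )
        self._keepalive.commit()

    def allow(self, client, now=None):
        """Record a hit and return True if the client is under its limit."""
        now = time.time() if now is None else now
        # A fresh connection each time, to show the data really is shared
        conn = sqlite3.connect(self.uri, uri=True)
        try:
            with conn:  # commits on success, rolls back on error
                # Drop hits that have aged out of the window
                conn.execute(
                    "DELETE FROM hits WHERE ts <= ?",
                    (now - self.window_seconds,),
                )
                count = conn.execute(
                    "SELECT count(*) FROM hits WHERE client = ?", (client,)
                ).fetchone()[0]
                if count >= self.limit:
                    return False
                conn.execute("INSERT INTO hits VALUES (?, ?)", (client, now))
                return True
        finally:
            conn.close()
```

The optional now argument exists only to make the behaviour easy to test with fixed timestamps. The counts are lost when the process exits, which is usually acceptable for rate limiting.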
Datasette now has the ability to output <script type="module"> in places where you may wish to take advantage of modules.
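As a rough sketch of what emitting such a tag involves, the helper below renders a script element and adds type="module" on request. The script_tag function and its module flag are hypothetical names chosen for illustration; this is not Datasette's template code.

```python
from html import escape


def script_tag(url, module=False):
    """Render a <script> tag, optionally marked as a JavaScript module."""
    type_attr = ' type="module"' if module else ""
    # Escape the URL so characters such as & and " are safe in the attribute
    return f'<script{type_attr} src="{escape(url, quote=True)}"></script>'
```

Calling script_tag("/static/app.js", module=True) produces a tag the browser loads as a module, so the file can use the import and export keywords.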
I now have three plugins that load three different versions of Leaflet: datasette-cluster-map, datasette-leaflet-geojson and the new datasette-leaflet-freedraw.
datasette-leaflet so that they can share the same library version.
Code formatting with Black and Prettier
I love opinionated code formatting tools. After adopting Black it started to dawn on me quite how much of my programming brain cycles had been spent thinking about indentation: sweating over the details (even in Python) of when to linebreak, how to format complex function definitions and so on.
Switching to Black gave me all of that thinking time back to spend on other things. It even gave me a more productive way to write unit tests.
As the positive evidence from using Black continued to build, I realized that the benefits of never having to think about formatting again outweighed any tiny disagreements I might have about aesthetics. So I’m all on board with Prettier now!
The rest of the release is mostly small bug fixes.
There are some documentation improvements too. I added a new section to the page about writing tests for plugins on testing outbound HTTP calls with pytest-httpx—any time I find myself having to solve the same problem in more than one plugin is a hint that it should probably be covered by the documentation.
As always, I welcome conversations about the new release (and Datasette in general) on the Datasette discussions forum. I’m also continuing to run Datasette Office Hours every Friday, so if you’re interested in a 1-1 conversation about the project please grab a slot!