Simon Willison’s Weblog

Datasette 0.54: The annotated release notes

25th January 2021

Datasette 0.54 is out today, with a number of foundational new features. As I’ve done for previous releases, here are the annotated release notes with some additional commentary.

The _internal database

As part of ongoing work to help Datasette handle much larger numbers of connected databases and tables (see Datasette Library) Datasette now maintains an in-memory SQLite database with details of all of the attached databases, tables, columns, indexes and foreign keys. (#1150)

This will support future improvements such as a searchable, paginated homepage of all available tables.
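
To make the idea concrete, here is a minimal sketch of how a catalog like this could be assembled with Python's standard sqlite3 module. It is not Datasette's implementation: the build_catalog() helper, the catalog table names and their columns are all hypothetical, chosen only for illustration, and indexes are left out to keep it short.

```python
import sqlite3


def build_catalog(databases):
    """Copy schema details from each named connection into one in-memory catalog.

    Hypothetical helper for illustration; the table layout is an assumption.
    """
    catalog = sqlite3.connect(":memory:")
    catalog.executescript(
        """
        CREATE TABLE tables (database_name TEXT, table_name TEXT, sql TEXT);
        CREATE TABLE columns (
            database_name TEXT, table_name TEXT, name TEXT, type TEXT, is_pk INTEGER
        );
        CREATE TABLE foreign_keys (
            database_name TEXT, table_name TEXT,
            column_name TEXT, other_table TEXT, other_column TEXT
        );
        """
    )
    for db_name, conn in databases.items():
        tables = conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        for table_name, sql in tables:
            catalog.execute(
                "INSERT INTO tables VALUES (?, ?, ?)", (db_name, table_name, sql)
            )
            # Quote the identifier so unusual table names are handled safely
            quoted = '"' + table_name.replace('"', '""') + '"'
            # table_info rows: cid, name, type, notnull, dflt_value, pk
            for row in conn.execute(f"PRAGMA table_info({quoted})").fetchall():
                catalog.execute(
                    "INSERT INTO columns VALUES (?, ?, ?, ?, ?)",
                    (db_name, table_name, row[1], row[2], row[5]),
                )
            # foreign_key_list rows: id, seq, table, from, to, ...
            for row in conn.execute(f"PRAGMA foreign_key_list({quoted})").fetchall():
                catalog.execute(
                    "INSERT INTO foreign_keys VALUES (?, ?, ?, ?, ?)",
                    (db_name, table_name, row[3], row[2], row[4]),
                )
    return catalog


# A tiny example database to introspect
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES authors(id)
    );
    """
)
catalog = build_catalog({"demo": demo})
table_names = [
    r[0] for r in catalog.execute("SELECT table_name FROM tables ORDER BY table_name")
]
fks = catalog.execute("SELECT * FROM foreign_keys").fetchall()
```

Once the schema details live in ordinary tables, a searchable and paginated listing becomes a plain SQL query against the catalog rather than a loop over every attached database.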

The Datasette Library issue has been open for nearly two years now. It’s a need I identified at the NICAR 2019 data journalism conference, where it became apparent that many newsrooms are sat on an enormous pile of data that they have collected, with no central place to keep it all.

Large tech companies usually have some kind of data warehouse or data lake which is meant to address this problem. These are complicated, extremely expensive to run and the ability to add and maintain data in them is often reserved for a high technology priesthood.

I see Datasette as targeting the “small data” end of this equation, where I define small data as anything that will fit on my phone. My iPhone has half a terabyte of storage now!

How many more people and organizations would run their own data warehouse if it could sit on their laptop or run on a $5/month VPS?

You can explore an example of this database by signing in as root to the latest.datasette.io demo instance and then navigating to latest.datasette.io/_internal.

Every commit to Datasette deploys a new demo instance to latest.datasette.io (here’s the GitHub Actions workflow). This is useful for all sorts of things: I can use it to test on additional devices, remind myself how different pages of Datasette work and, in this case, link to it directly from the documentation.

Plugins can use these tables to introspect attached data in an efficient way. Plugin authors should note that this is not yet considered a stable interface, so any plugins that use this may need to make changes prior to Datasette 1.0 if the _internal table schemas change.

In the ongoing press towards Datasette 1.0 I’ve decided to start shipping features like this earlier, but with prominent “this is not yet stable” notices to warn people that the details might change. I think this is a better way to preview new features than having them live in a branch or alpha release.

Once I hit 1.0 I’m going to only ship features that are genuinely stable, to avoid having to make any major version number bumps for as long as possible.

Named in-memory database support

As part of the work building the _internal database, Datasette now supports named in-memory databases that can be shared across multiple connections. This allows plugins to create in-memory databases which will persist data for the lifetime of the Datasette server process. (#1151)

The new memory_name= parameter to the Database class can be used to create named, shared in-memory databases.

This exposes a relatively obscure feature of SQLite: it allows in-memory databases to operate with a shared cache, which means multiple database connections can see the same in-memory data.
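
The underlying SQLite behaviour can be seen with nothing but Python's standard sqlite3 module. This is a minimal sketch rather than Datasette's own code, and the database name demo_shared and the hits table are arbitrary examples: two separate connections open the same named in-memory database through a URI with cache=shared.

```python
import sqlite3

# "demo_shared" is an arbitrary name; mode=memory plus cache=shared lets
# separate connections in the same process share one in-memory database
URI = "file:demo_shared?mode=memory&cache=shared"

writer = sqlite3.connect(URI, uri=True)
reader = sqlite3.connect(URI, uri=True)

writer.execute("CREATE TABLE hits (path TEXT, count INTEGER)")
writer.execute("INSERT INTO hits VALUES ('/', 1)")
writer.commit()

# The second connection sees the committed row
rows = reader.execute("SELECT path, count FROM hits").fetchall()

# By contrast, a plain ":memory:" connection is private and starts empty
private = sqlite3.connect(":memory:")
private_tables = private.execute("SELECT name FROM sqlite_master").fetchall()
```

The shared database only exists while at least one connection to it stays open, so anything relying on this needs to hold a connection for as long as the data should survive.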

I can think of so many fun plugin use-cases for this!

  • A plugin that accepts a URL to a CSV file, then pulls that data into an in-memory table and presents a standard Datasette table interface for interacting with it
  • An API rate-limiting plugin that uses an in-memory database as a persistent store
  • Any plugin that might need to cache something can now do so with its own in-memory database

I haven’t used this capability in a plugin myself yet but I’m excited to try it out with something soon.

JavaScript module support

JavaScript modules were introduced in ECMAScript 2015 and provide native browser support for the import and export keywords.

To use modules, JavaScript needs to be included in <script> tags with a type="module" attribute.

Datasette now has the ability to output <script type="module"> in places where you may wish to take advantage of modules. The extra_js_urls option described in Custom CSS and JavaScript can now be used with modules, and module support is also available for the extra_body_script() plugin hook. (#1186, #1187)
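
As a rough illustration of what this means for the generated HTML, here is a minimal sketch in plain Python. It is not Datasette's template code: the render_script_tags() helper is hypothetical, and the dictionary shape with "url" and "module" keys is an assumption about the configuration format that should be checked against the Custom CSS and JavaScript documentation.

```python
from html import escape


def render_script_tags(extra_js_urls):
    """Render <script> tags for a list of URLs or {"url": ..., "module": ...} dicts.

    Hypothetical helper; the dict keys are an assumed configuration shape.
    """
    tags = []
    for item in extra_js_urls:
        if isinstance(item, str):
            # A bare string is treated as a classic (non-module) script
            item = {"url": item}
        type_attr = ' type="module"' if item.get("module") else ""
        src = escape(item["url"], quote=True)
        tags.append(f'<script{type_attr} src="{src}"></script>')
    return "\n".join(tags)


html = render_script_tags(
    [
        "https://example.com/classic.js",
        {"url": "https://example.com/app.js", "module": True},
    ]
)
```

Only the entry flagged as a module gets the type="module" attribute, so existing configurations that list plain URLs would keep producing classic script tags.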

This feature was inspired by discussion on this issue about a JavaScript plugin system for Datasette.

I’ve been getting excited about JavaScript modules recently, partly thanks to Snowpack and Skypack. I enjoyed maintainer Fred K. Schott’s GitHub OCTO speaker series talk last week.

The key feature of modules that is useful for Datasette is that they solve the “double imports” problem. Datasette plugins often need to load additional JavaScript libraries, such as Leaflet or React. Using modules means those plugins can, without any knowledge of each other, avoid accidentally loading the same code twice.

datasette-leaflet-freedraw is the first example of a Datasette plugin that takes advantage of the new support for JavaScript modules. See Drawing shapes on a map to query a SpatiaLite database for more on this plugin.

I now have three plugins that load three different versions of Leaflet: datasette-cluster-map, datasette-leaflet-geojson and the new datasette-leaflet-freedraw.

I’ve started work on a new plugin called datasette-leaflet which will make Leaflet available as a JavaScript module. Those other three plugins can then all depend on datasette-leaflet so that they can share the same library version.

Code formatting with Black and Prettier

Datasette adopted Black for opinionated Python code formatting in June 2019. Datasette now also embraces Prettier for JavaScript formatting, which like Black is enforced by tests in continuous integration. Instructions for using these two tools can be found in the new section on Code formatting in the contributors documentation. (#1167)

I love opinionated code formatting tools. After adopting Black it started to dawn on me quite how many of my programming brain cycles had been spent thinking about indentation: sweating over the details (even in Python) of when to linebreak, how to format complex function definitions and so on.

Switching to Black gave me all of that thinking time back to spend on other things. It even gave me a more productive way to write unit tests.

I’ve been wanting to do the same thing for JavaScript for years, but there were tiny aspects of the formatting style used by Prettier that I didn’t like—specifically the way it formats chained method calls with newlines.

As the positive evidence from using Black continued to build, I realized that the benefits of never having to think about formatting again outweighed any tiny disagreements I might have about aesthetics. So I’m all on board with Prettier now!

Everything else

The rest of the release is mostly small bug fixes.

There are some documentation improvements too. I added a new section to the page about writing tests for plugins on testing outbound HTTP calls with pytest-httpx. Any time I find myself having to solve the same problem in more than one plugin, it’s a hint that the solution should probably be covered by the documentation.

As always, I welcome conversations about the new release (and Datasette in general) on the Datasette discussions forum. I’m also continuing to run Datasette Office Hours every Friday, so if you’re interested in a 1-1 conversation about the project please grab a slot!