Simon Willison’s Weblog

Subscribe

Datasette 0.28—and why master should always be releasable

19th May 2019

It’s been quite a while since the last substantial release of Datasette. Datasette 0.27 came out all the way back in January.

This isn’t because development has slowed down. In fact, the project has had 131 commits since then, covering a bewildering array of new functionality and with some significant contributions from developers who aren’t me—Russ Garrett and Romain Primet deserve special recognition here.

The problem has been one of discipline. I’m a big fan of the idea of keeping master shippable at all times in my professional work, but I hadn’t quite adopted this policy for my open-source side projects. A couple of months ago I found myself in a situation where I had two major refactorings (of faceting and of Datasette’s treatment of immutable files) going on in master at the same time, and untangling them turned out to take way longer than I had expected.

So I’ve updated Datasette’s contribution guidelines to specify that master should always be releasable, almost entirely as a reminder to myself.

All of that said, I’m finally back out of the weeds and I’m excited to announce today’s release of Datasette 0.28. It features a salmagundi of new features! I’m replicating the release notes below.

Supporting databases that change

From the beginning of the project, Datasette has been designed with read-only databases in mind. If a database is guaranteed not to change it opens up all kinds of interesting opportunities—from taking advantage of SQLite immutable mode and HTTP caching to bundling static copies of the database directly in a Docker container. The interesting ideas in Datasette explores this idea in detail.

As my goals for the project have developed, I realized that read-only databases are no longer the right default. SQLite actually supports concurrent access very well provided only one thread attempts to write to a database at a time, and I keep encountering sensible use-cases for running Datasette on top of a database that is processing inserts and updates.

So, as-of version 0.28 Datasette no longer assumes that a database file will not change. It is now safe to point Datasette at a SQLite database which is being updated by another process.

Making this change was a lot of work—see tracking tickets #418, #419 and #420. It required new thinking around how Datasette should calculate table counts (an expensive operation against a large, changing database) and also meant reconsidering the “content hash” URLs Datasette has used in the past to optimize the performance of HTTP caches.

Datasette can still run against immutable files and gains numerous performance benefits from doing so, but this is no longer the default behaviour. Take a look at the new Performance and caching documentation section for details on how to make the most of Datasette against data that you know will be staying read-only and immutable.

Faceting improvements, and faceting plugins

Datasette Facets provide an intuitive way to quickly summarize and interact with data. Previously the only supported faceting technique was column faceting, but 0.28 introduces two powerful new capibilities: facet-by-JSON-array and the ability to define further facet types using plugins.

Facet by array (#359) is only available if your SQLite installation provides the json1 extension. Datasette will automatically detect columns that contain JSON arrays of values and offer a faceting interface against those columns—useful for modelling things like tags without needing to break them out into a new table. See Facet by JSON array for more.

The new register_facet_classes() plugin hook (#445) can be used to register additional custom facet classes. Each facet class should provide two methods: suggest() which suggests facet selections that might be appropriate for a provided SQL query, and facet_results() which executes a facet operation and returns results. Datasette’s own faceting implementations have been refactored to use the same API as these plugins.

datasette publish cloudrun

Google Cloud Run is a brand new serverless hosting platform from Google, which allows you to build a Docker container which will run only when HTTP traffic is recieved and will shut down (and hence cost you nothing) the rest of the time. It’s similar to Zeit’s Now v1 Docker hosting platform which sadly is no longer accepting signups from new users.

The new datasette publish cloudrun command was contributed by Romain Primet (#434) and publishes selected databases to a new Datasette instance running on Google Cloud Run.

See Publishing to Google Cloud Run for full documentation.

register_output_renderer plugins

Russ Garrett implemented a new Datasette plugin hook called register_output_renderer (#441) which allows plugins to create additional output renderers in addition to Datasette’s default .json and .csv.

Russ’s in-development datasette-geo plugin includes an example of this hook being used to output .geojson automatically converted from SpatiaLite.

Medium changes

  • Datasette now conforms to the Black coding style (#449)—and has a unit test to enforce this in the future
  • New Special table arguments:
    • ?columnname__in=value1,value2,value3 filter for executing SQL IN queries against a table, see Table arguments (#433)
    • ?columnname__date=yyyy-mm-dd filter which returns rows where the spoecified datetime column falls on the specified date (583b22a)
    • ?tags__arraycontains=tag filter which acts against a JSON array contained in a column (78e45ea)
    • ?_where=sql-fragment filter for the table view (#429)
    • ?_fts_table=mytable and ?_fts_pk=mycolumn querystring options can be used to specify which FTS table to use for a search query—see Configuring full-text search for a table or view (#428)
  • You can now pass the same table filter multiple times—for example, ?content__not=world&content__not=hello will return all rows where the content column is neither hello or world (#288)
  • You can now specify about and about_url metadata (in addition to source and license) linking to further information about a project—see Source, license and about
  • New ?_trace=1 parameter now adds debug information showing every SQL query that was executed while constructing the page (#435)
  • datasette inspect now just calculates table counts, and does not introspect other database metadata (#462)
  • Removed /-/inspect page entirely—this will be replaced by something similar in the future, see #465
  • Datasette can now run against an in-memory SQLite database. You can do this by starting it without passing any files or by using the new --memory option to datasette serve. This can be useful for experimenting with SQLite queries that do not access any data, such as SELECT 1+1 or SELECT sqlite_version().

Small changes

  • We now show the size of the database file next to the download link (#172)
  • New /-/databases introspection page shows currently connected databases (#470)
  • Binary data is no longer displayed on the table and row pages (#442—thanks, Russ Garrett)
  • New show/hide SQL links on custom query pages (#415)
  • The extra_body_script plugin hook now accepts an optional view_name argument (#443—thanks, Russ Garrett)
  • Bumped Jinja2 dependency to 2.10.1 (#426)
  • All table filters are now documented, and documentation is enforced via unit tests (2c19a27)
  • New project guideline: master should stay shippable at all times! (31f36e1)
  • Fixed a bug where sqlite_timelimit() occasionally failed to clean up after itself (bac4e01)
  • We no longer load additional plugins when executing pytest (#438)
  • Homepage now links to database views if there are less than five tables in a database (#373)
  • The --cors option is now respected by error pages (#453)
  • datasette publish heroku now uses the --include-vcs-ignore option, which means it works under Travis CI (#407)
  • datasette publish heroku now publishes using Python 3.6.8 (666c374)
  • Renamed datasette publish now to datasette publish nowv1 (#472)
  • datasette publish nowv1 now accepts multiple --alias parameters (09ef305)
  • Removed the datasette skeleton command (#476)
  • The documentation on how to build the documentation now recommends sphinx-autobuild

This is Datasette 0.28—and why master should always be releasable by Simon Willison, posted on 19th May 2019.

Next: Porting Datasette to ASGI, and Turtles all the way down

Previous: Running Datasette on Glitch