Datasette 0.28—and why master should always be releasable
It’s been quite a while since the last substantial release of Datasette. Datasette 0.27 came out all the way back in January.
This isn’t because development has slowed down. In fact, the project has had 131 commits since then, covering a bewildering array of new functionality and with some significant contributions from developers who aren’t me—Russ Garrett and Romain Primet deserve special recognition here.
The problem has been one of discipline. I’m a big fan of the idea of keeping master shippable at all times in my professional work, but I hadn’t quite adopted this policy for my open-source side projects. A couple of months ago I found myself in a situation where I had two major refactorings (of faceting and of Datasette’s treatment of immutable files) going on in master at the same time, and untangling them turned out to take way longer than I had expected.
So I’ve updated Datasette’s contribution guidelines to specify that master should always be releasable, almost entirely as a reminder to myself.
Supporting databases that change
From the beginning of the project, Datasette has been designed with read-only databases in mind. If a database is guaranteed not to change, it opens up all kinds of interesting opportunities—from taking advantage of SQLite immutable mode and HTTP caching to bundling static copies of the database directly in a Docker container. My post The interesting ideas in Datasette explores these ideas in detail.
As my goals for the project have developed, I realized that read-only databases are no longer the right default. SQLite actually supports concurrent access very well provided only one thread attempts to write to a database at a time, and I keep encountering sensible use-cases for running Datasette on top of a database that is processing inserts and updates.
So, as of version 0.28, Datasette no longer assumes that a database file will not change. It is now safe to point Datasette at a SQLite database which is being updated by another process.
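To illustrate the SQLite behaviour this relies on, here is a minimal stdlib-only sketch (the file path and table here are made up for illustration): one connection stands in for the writing process, a second stands in for Datasette's read connection, and each fresh read sees the latest committed inserts.

```python
import os
import sqlite3
import tempfile

# Hypothetical on-disk database standing in for one served by Datasette.
path = os.path.join(tempfile.mkdtemp(), "data.db")

writer = sqlite3.connect(path)
writer.execute("PRAGMA journal_mode=WAL")  # in WAL mode the writer doesn't block readers
writer.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")
writer.commit()

reader = sqlite3.connect(path)  # stands in for Datasette's read connection

writer.execute("INSERT INTO logs (msg) VALUES ('first')")
writer.commit()
print(reader.execute("SELECT count(*) FROM logs").fetchone()[0])  # 1

writer.execute("INSERT INTO logs (msg) VALUES ('second')")
writer.commit()
print(reader.execute("SELECT count(*) FROM logs").fetchone()[0])  # 2
```

The one rule to respect is the one mentioned above: only a single connection should be writing at any given moment.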
Making this change was a lot of work—see tracking tickets #418, #419 and #420. It required new thinking around how Datasette should calculate table counts (an expensive operation against a large, changing database) and also meant reconsidering the “content hash” URLs Datasette has used in the past to optimize the performance of HTTP caches.
Datasette can still run against immutable files and gains numerous performance benefits from doing so, but this is no longer the default behaviour. Take a look at the new Performance and caching documentation section for details on how to make the most of Datasette against data that you know will be staying read-only and immutable.
Faceting improvements, and faceting plugins
Datasette Facets provide an intuitive way to quickly summarize and interact with data. Previously the only supported faceting technique was column faceting, but 0.28 introduces two powerful new capabilities: facet-by-JSON-array and the ability to define further facet types using plugins.
Facet by array (#359) is only available if your SQLite installation provides the `json1` extension. Datasette will automatically detect columns that contain JSON arrays of values and offer a faceting interface against those columns—useful for modelling things like tags without needing to break them out into a new table. See Facet by JSON array for more.
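The underlying trick comes from `json1`'s `json_each()` table-valued function, which unnests a JSON array into rows. Here is a rough sketch of the kind of counting query involved—the exact SQL Datasette generates may differ, and this requires an SQLite build with `json1` available:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, tags TEXT)")
conn.executemany(
    "INSERT INTO posts (tags) VALUES (?)",
    [('["python", "sqlite"]',), ('["python"]',), ('["datasette"]',)],
)

# Unnest each JSON array with json_each() and count occurrences per tag --
# roughly the shape of query that array faceting relies on.
rows = conn.execute(
    """
    SELECT j.value AS tag, count(*) AS n
    FROM posts, json_each(posts.tags) AS j
    GROUP BY j.value
    ORDER BY n DESC, tag
    """
).fetchall()
print(rows)  # [('python', 2), ('datasette', 1), ('sqlite', 1)]
```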
The new `register_facet_classes()` plugin hook (#445) can be used to register additional custom facet classes. Each facet class should provide two methods: `suggest()`, which suggests facet selections that might be appropriate for a provided SQL query, and `facet_results()`, which executes a facet operation and returns results. Datasette’s own faceting implementations have been refactored to use the same API as these plugins.
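To give a feel for the shape of such a plugin, here is a standalone, entirely hypothetical sketch. It deliberately does not import Datasette—the real base class, method signatures, and registration mechanics are in the plugin hook documentation—but it mirrors the two responsibilities described above:

```python
import sqlite3

# Illustrative sketch only: the real hook is register_facet_classes(), and
# the actual base class and signatures live in Datasette itself. This
# standalone class just demonstrates the two responsibilities a facet has.

class FirstLetterFacet:
    """Hypothetical facet that groups a text column by its first letter."""

    def __init__(self, conn, table, column):
        self.conn, self.table, self.column = conn, table, column

    def suggest(self):
        # Suggest this facet only if the column actually holds non-empty text
        count = self.conn.execute(
            f"SELECT count(*) FROM {self.table} WHERE length({self.column}) > 0"
        ).fetchone()[0]
        return [{"column": self.column, "type": "first_letter"}] if count else []

    def facet_results(self):
        # Execute the facet operation: row counts per first letter
        return self.conn.execute(
            f"SELECT upper(substr({self.column}, 1, 1)) AS letter, count(*) "
            f"FROM {self.table} GROUP BY letter ORDER BY count(*) DESC, letter"
        ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)", [("alice",), ("bob",), ("anna",)])
facet = FirstLetterFacet(conn, "people", "name")
print(facet.suggest())        # the facet reports itself as applicable
print(facet.facet_results())  # [('A', 2), ('B', 1)]
```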
datasette publish cloudrun
Google Cloud Run is a brand new serverless hosting platform from Google which allows you to build a Docker container that will run only when HTTP traffic is received, then shut down (and hence cost you nothing) the rest of the time. It’s similar to Zeit’s Now v1 Docker hosting platform, which sadly is no longer accepting signups from new users.
The new `datasette publish cloudrun` command was contributed by Romain Primet (#434) and publishes selected databases to a new Datasette instance running on Google Cloud Run.
See Publishing to Google Cloud Run for full documentation.
Russ Garrett implemented a new Datasette plugin hook called `register_output_renderer` (#441) which allows plugins to create additional output renderers in addition to Datasette’s default JSON and CSV formats.
- Datasette now conforms to the Black coding style (#449)—and has a unit test to enforce this in the future
- New special table arguments:
  - `?columnname__in=value1,value2,value3` filter for executing SQL IN queries against a table, see Table arguments (#433)
  - `?columnname__date=yyyy-mm-dd` filter which returns rows where the specified datetime column falls on the specified date (583b22a)
  - `?tags__arraycontains=tag` filter which acts against a JSON array contained in a column (78e45ea)
  - `?_where=sql-fragment` filter for the table view (#429)
  - `?_fts_pk=mycolumn` querystring option can be used to specify which FTS table to use for a search query—see Configuring full-text search for a table or view (#428)
- You can now pass the same table filter multiple times—for example, `?content__not=world&content__not=hello` will return all rows where the content column is neither "hello" nor "world"
- You can now specify `about_url` metadata (in addition to `license`) linking to further information about a project—see Source, license and about
- `?_trace=1` parameter now adds debug information showing every SQL query that was executed while constructing the page (#435)
- `datasette inspect` now just calculates table counts, and does not introspect other database metadata (#462)
- Removed the `/-/inspect` page entirely—this will be replaced by something similar in the future, see #465
- Datasette can now run against an in-memory SQLite database. You can do this by starting it without passing any files or by using the new `--memory` option to `datasette serve`. This can be useful for experimenting with SQLite queries that do not access any data, such as `SELECT 1+1`
- We now show the size of the database file next to the download link (#172)
- New `/-/databases` introspection page shows currently connected databases (#470)
- Binary data is no longer displayed on the table and row pages (#442—thanks, Russ Garrett)
- New show/hide SQL links on custom query pages (#415)
- The `extra_body_script` plugin hook now accepts an optional `view_name` argument (#443—thanks, Russ Garrett)
- Bumped Jinja2 dependency to 2.10.1 (#426)
- All table filters are now documented, and documentation is enforced via unit tests (2c19a27)
- New project guideline: master should stay shippable at all times! (31f36e1)
- Fixed a bug where `sqlite_timelimit()` occasionally failed to clean up after itself (bac4e01)
- We no longer load additional plugins when executing pytest (#438)
- Homepage now links to database views if there are fewer than five tables in a database (#373)
- `--cors` option is now respected by error pages (#453)
- `datasette publish heroku` now uses the `--include-vcs-ignore` option, which means it works under Travis CI (#407)
- `datasette publish heroku` now publishes using Python 3.6.8 (666c374)
- Renamed `datasette publish now` to `datasette publish nowv1` (#472)
- `datasette publish nowv1` now accepts multiple `--alias` parameters
- Removed the `datasette skeleton` command (#476)
- The documentation on how to build the documentation now recommends sphinx-autobuild
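To make the new `__in` table argument above concrete: it corresponds to a SQL IN clause with one bound parameter per value. A rough stdlib sketch of that mapping—the helper function here is made up for illustration and is not Datasette's actual implementation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO content (body) VALUES (?)",
    [("hello",), ("world",), ("goodbye",)],
)

# ?body__in=hello,world roughly translates to the SQL built below; values
# are bound as parameters rather than interpolated into the query string.
def in_filter(conn, table, column, csv_values):
    values = csv_values.split(",")
    placeholders = ", ".join("?" for _ in values)
    sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
    return conn.execute(sql, values).fetchall()

print(in_filter(conn, "content", "body", "hello,world"))
# [(1, 'hello'), (2, 'world')]
```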