Simon Willison’s Weblog

Datasette 0.58: The annotated release notes

I released Datasette 0.58 last night, with new plugin hooks, Unix domain socket support, a major faceting performance fix and a few other improvements. Here are the annotated release notes.

Faceting performance improvement

Faceting remains my favourite feature in Datasette: it turns out a simple group by / count against a column is one of the most productive ways I know of to start understanding new data.

Yesterday I stumbled across a tiny tweak (details in this issue) that gave me a 10x performance boost on facet queries! Short version: given the following example query:

select
  country_long,
  count(*)
from (
  select * from [global-power-plants]
  order by rowid
)
where
  country_long is not null
group by
  country_long
order by
  count(*) desc

Removing the unnecessary order by rowid from that inner query knocked the time down from 53ms to 7.2ms (and makes even more of a difference on larger tables).
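Here is a minimal sketch for trying this comparison yourself using Python's sqlite3 module. The plants table and its generated rows are hypothetical stand-ins for global-power-plants, and I've added country_long as a tie-breaker in the outer order by so both queries return rows in a predictable order. Timings will vary by machine, and a SQLite build that already includes the optimization may show little or no difference.

```python
import sqlite3
import time

# Hypothetical stand-in for the global-power-plants table
conn = sqlite3.connect(":memory:")
conn.execute("create table plants (country_long text)")
countries = ["France", "Brazil", "India", "Kenya", None]
conn.executemany(
    "insert into plants (country_long) values (?)",
    [(countries[i % len(countries)],) for i in range(200_000)],
)

# Same shape as the facet query above; {inner} is swapped between runs.
# The extra "country_long" sort key is only there to break ties.
FACET_SQL = """
select country_long, count(*)
from ({inner})
where country_long is not null
group by country_long
order by count(*) desc, country_long
"""


def run(inner):
    """Run the facet query around the given inner query, returning (rows, ms)."""
    start = time.perf_counter()
    rows = conn.execute(FACET_SQL.format(inner=inner)).fetchall()
    return rows, (time.perf_counter() - start) * 1000


with_order, ms_with = run("select * from plants order by rowid")
without_order, ms_without = run("select * from plants")
print(f"with order by rowid: {ms_with:.1f}ms, without: {ms_without:.1f}ms")
```

Both variants return identical facet counts, which is why the inner order by can be dropped safely.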

I was surprised SQLite didn’t perform that optimization automatically—so I started a thread on the SQLite forum and SQLite author D. Richard Hipp figured out a patch! It’s not yet certain that it will land in a SQLite release but I’m excited to have found an issue interesting enough to be worth looking into. (UPDATE: it landed on trunk).

The get_metadata() plugin hook

New plugin hook: get_metadata(datasette, key, database, table), for returning custom metadata for an instance, database or table. Thanks, Brandon Roberts! (#1384)

Brandon Roberts contributed this hook as part of work he’s been doing with Newsday nextLI—always exciting to see Datasette used by another news organization. Brandon has a live demo of the plugins he has been building: datasette-live-config, datasette-live-permissions, datasette-csv-importer and datasette-surveys. He also has a 6 minute demo video explaining the project so far.

The new hook allows plugins to customize the metadata displayed for different databases and tables within the Datasette interface.

There is one catch at the moment: the plugin hook doesn’t yet allow for async calls (including await db.execute(sql)) because Datasette’s own internals currently treat access to metadata as a sync rather than async feature.

There are workarounds for this. Brandon’s datasette-live-config plugin opens an additional, synchronous connection to the DB which is completely fine for fast queries. Another option would be to keep metadata in an in-memory Python dictionary which is updated by SQL queries that run in an async background task.
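The in-memory dictionary approach might look roughly like this. It's a sketch rather than anything Brandon's plugins actually do: _metadata_cache and set_table_description() are hypothetical names, and the background task that would call the setter is left out. The try/except around the import is only there so the example runs without Datasette installed.

```python
try:
    from datasette import hookimpl
except ImportError:
    # Fallback so this sketch runs without Datasette installed
    def hookimpl(func):
        return func


# Hypothetical module-level cache, shaped like Datasette's metadata structure
_metadata_cache = {"databases": {}}


def set_table_description(database, table, description):
    """Update the cache; an async background task could call this
    after running its own SQL queries."""
    tables = _metadata_cache["databases"].setdefault(database, {}).setdefault(
        "tables", {}
    )
    tables.setdefault(table, {})["description"] = description


@hookimpl
def get_metadata(datasette, key, database, table):
    # The hook itself stays synchronous: it only reads the dictionary
    return _metadata_cache
```

The hook returns the whole metadata dictionary rather than just the entry for the requested table, so the synchronous code path never needs to touch the database.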

In the longer run though I’d like to redesign Datasette’s internals to support asynchronous metadata access—ideally before Datasette 1.0.

The skip_csrf() plugin hook

New plugin hook: skip_csrf(datasette, scope), for opting out of CSRF protection based on the incoming request. (#1377)

I wanted to write a plugin that supported an HTTP POST to a Datasette form that wasn’t protected by Datasette’s CSRF protection. This proved surprisingly difficult! I ended up shipping asgi-csrf 0.9 with a new mechanism for custom opting-out of CSRF protection based on the ASGI scope, then exposing that mechanism in a new plugin hook in Datasette.
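A plugin using the new hook could be as small as this sketch. The /-/receive-webhook path is a hypothetical example, and the import fallback is only there so the code runs without Datasette installed.

```python
try:
    from datasette import hookimpl
except ImportError:
    # Fallback so this sketch runs without Datasette installed
    def hookimpl(func):
        return func


# Hypothetical path that should accept POSTs without a CSRF token
OPEN_POST_PATH = "/-/receive-webhook"


@hookimpl
def skip_csrf(datasette, scope):
    # Returning True opts this request out of CSRF protection
    return scope.get("type") == "http" and scope.get("path") == OPEN_POST_PATH
```

Matching on an exact path keeps the opt-out as narrow as possible, so every other form in the instance stays protected.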

CSRF is such a frustrating security issue to write code against, because in modern browsers the SameSite cookie attribute more-or-less solves the problem for you... but that attribute only has 90% global usage according to caniuse.com—not quite enough for me to forget about it entirely.

There also remains one obscure edge-case in which SameSite won’t help you: the definition of “same site” includes other subdomains of your domain (provided it’s not on the Public Suffix List). This means that for SameSite CSRF protection to work you need to be confident that no subdomains of your domain will suffer an XSS, and in my experience it’s common for subdomains to be pointed at third-party applications that may not have the same stringent XSS protection as your main code.

So I continue to care about CSRF protection in Datasette.

Unix domain socket support

New datasette --uds /tmp/datasette.sock option for binding Datasette to a Unix domain socket, see proxy documentation. (#1388)

I wrote about this in my weeknotes—this is a great way to run Datasette if you have it behind a proxy such as Apache or nginx and don’t want to have the Datasette server listening on a high port.
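If you want to check that the socket is responding before you configure the proxy, one option is a small client built on the Python standard library. This is just a sketch: UnixSocketHTTPConnection and fetch_over_uds() are names I've made up for illustration, not part of Datasette.

```python
import http.client
import socket


class UnixSocketHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5):
        # The host is only used for the Host: header
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch_over_uds(socket_path, path="/"):
    """Send a GET request over the socket, returning (status, body bytes)."""
    conn = UnixSocketHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

With Datasette started using --uds /tmp/datasette.sock, calling fetch_over_uds("/tmp/datasette.sock", "/-/versions.json") should return a 200 status and a JSON body.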

"searchmode": "raw" in table metadata

"searchmode": "raw" table metadata option for defaulting a table to executing SQLite full-text search syntax without first escaping it, see Advanced SQLite search queries. (#1389)

SQLite’s built in full-text search feature includes support for advanced operators: you can use operators like AND, OR and NEAR and you can add column specifiers like name:Simon to restrict searches to individual columns.

This is something of a two-edged sword: I’ve found innocent looking queries that raise errors due to unexpected interactions with the query language.

In issue 651 I switched to escaping all queries by default to prevent these errors from happening, with a ?_searchmode=raw query string option for opting back into SQLite’s unescaped search syntax.

I’ve since had a few requests for a mechanism to enable this by default—hence the new "searchmode": "raw" option in table metadata.
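As a sketch of where the option lives, here is the metadata built as a Python dictionary and written out as JSON. The mydatabase and mytable names are placeholders for your own database and table.

```python
import json

# "mydatabase" and "mytable" are placeholder names
metadata = {
    "databases": {
        "mydatabase": {
            "tables": {
                "mytable": {"searchmode": "raw"}
            }
        }
    }
}

metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

Saving that output as metadata.json and starting Datasette with -m metadata.json should make searches against that table use raw mode by default.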

Link plugin hooks now take a request

The menu_links(), table_actions() and database_actions() plugin hooks all gained a new optional request argument providing access to the current request. (#1371)

I have a plugin which needs to add links to different places depending on the subdomain that the Datasette instance is running on. Adding request to these plugin hooks proved to be the easiest way to achieve this.

This is a really nice thing about how Pluggy (the plugin library used by Datasette) works: adding new named parameters to hooks can be done without breaking backwards compatibility with existing plugins.
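Here is a sketch of both sides of that. The first hook implementation uses the new request argument to vary its links by hostname, while the second leaves request out of its signature and keeps working as before. The staging. prefix and the /-/staging-notes link are hypothetical, and the import fallback is only there so the code runs without Datasette installed.

```python
try:
    from datasette import hookimpl
except ImportError:
    # Fallback so this sketch runs without Datasette installed
    def hookimpl(func):
        return func


@hookimpl
def menu_links(datasette, actor, request):
    # request can be None, so fall back to an empty host
    host = request.host if request is not None else ""
    if host.startswith("staging."):
        # Hypothetical link shown only on the staging subdomain
        return [{"href": "/-/staging-notes", "label": "Staging notes"}]
    return []


@hookimpl
def table_actions(datasette, actor, database, table):
    # No request parameter here: Pluggy only passes the arguments
    # that an implementation names in its signature
    return []
```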

And the rest

  • Improved documentation for Running Datasette behind a proxy to recommend using ProxyPreserveHost On with Apache. (#1387)
  • POST requests to endpoints that do not support that HTTP verb now return a 405 error.
  • db.path can now be provided as a pathlib.Path object, useful when writing unit tests for plugins. Thanks, Chris Amico. (#1365)

This is Datasette 0.58: The annotated release notes by Simon Willison, posted on 16th July 2021.

