Datasette 0.58: The annotated release notes
Faceting performance improvement
Facets remains my favourite feature in Datasette: it turns out a simple group by / count against a column is one of the most productive ways I know of to start understanding new data.
select country_long, count(*) from ( select * from [global-power-plants] order by rowid ) where country_long is not null group by country_long order by count(*) desc
Removing the unnecessary
order by rowid from that inner query knocked the time down from 53ms to 7.2ms (and makes even more of a difference on larger tables).
I was surprised SQLite didn’t perform that optimization automatically—so I started a thread on the SQLite forum and SQLite author D. Richard Hipp figured out a patch! It’s not yet certain that it will land in a SQLite release but I’m excited to have found an issue interesting enough to be worth looking into. (UPDATE: it landed on trunk).
The get_metadata() plugin hook
Brandon Roberts contributed this hook as part of work he’s been doing with Newsday nextLI—always exciting to see Datasette used by another news organization. Brandon has a live demo of the plugins he has been building: datasette-live-config, datasette-live-permissions, datasette-csv-importer and datasette-surveys. He also has a 6 minute demo video explaining the project so far.
The new hook allows plugins to customize the metadata displayed for different databases and tables within the Datasette interface.
There is one catch at the moment: the plugin doesn’t yet allow for async calls (including
await db.execute(sql)) because Datasette’s own internals currently treat access to metadata as a sync rather than async feature.
There are workarounds for this. Brandon’s
datasette-live-config plugin opens an additional, synchronous connection to the DB which is completely fine for fast queries. Another option would be to keep metadata in an in-memory Python dictionary which is updated by SQL queries that run in an async background task.
In the longer run though I’d like to redesign Datasette’s internals to support asynchronous metadata access—ideally before Datasette 1.0.
The skip_csrf() plugin hook
I wanted to write a plugin that supported an HTTP POST to a Datasette form that wasn’t protected by Datasette’s CSRF protection. This proved surprisingly difficult! I ended up shipping asgi-csrf 0.9 with a new mechanism for custom opting-out of CSRF protection based on the ASGI scope, then exposing that mechanism in a new plugin hook in Datasette.
CSRF is such a frustrating security issue to write code against, because in modern browsers the SameSite cookie attribute more-or-less solves the problem for you... but that attribute only has 90% global usage according to caniuse.com—not quite enough for me to forget about it entirely.
There also remains one obscure edge-case in which SameSite won’t help you: the definition of “same site” includes other subdomains of your domain (provided it’s not on the Public Suffix List). This means that for SameSite CSRF protection to work you need to be confident that no subdomains of your domain will suffer an XSS—and in my experience its common for subdomains to be pointed at third-party applications that may not have the same stringent XSS protection as your main code.
So I continue to care about CSRF protection in Datasette.
Unix domain socket support
I wrote about this in my weeknotes—this is a great way to run Datasette if you have it behind a proxy such as Apache or nginx and don’t want to have the Datasette server listening on a high port.
“searchmode”: “raw” in table metadata
SQLite’s built in full-text search feature includes support for advanced operators: you can use operators like AND, OR and NEAR and you can add column specifiers like
name:Simon to restrict searches to individual columns.
This is something of a two-edged sword: I’ve found innocent looking queries that raise errors due to unexpected interactions with the query language.
In issue 651 I switched to escaping all queries by default to prevent these errors from happening, with a
?_searchmode=raw query string option for opting back into the default functionality.
I’ve since had a few requests for a mechanism to enable this by default—hence the new
"searchmode": "raw" option in table metadata.
Link plugin hooks now take a request
I have a plugin which needs to add links to different places depending on the subdomain that the Datasette instance is running on. Adding
request to these plugin hooks proved to be the easiest way to achieve this.
This is a really nice thing about how Pluggy (the plugin library used by Datasette) works: adding new named parameters to hooks can be done without breaking backwards compatibility with existing plugins.
And the rest
- Improved documentation for Running Datasette behind a proxy to recommend using
ProxyPreservehost Onwith Apache. (#1387)
POSTrequests to endpoints that do not support that HTTP verb now return a 405 error.
db.pathcan now be provided as a
pathlib.Pathobject, useful when writing unit tests for plugins. Thanks, Chris Amico. (#1365)