Simon Willison’s Weblog

Subscribe

Datasette 0.51 (plus weeknotes)

1st November 2020

I shipped Datasette 0.51 today, with a new visual design, plugin hooks for adding navigation options, better handling of binary data, URL building utility methods and better support for running Datasette behind a proxy. It’s a lot of stuff! Here are the annotated release notes.

New visual design

Datasette is no longer white and grey with blue and purple links! Natalie Downe has been working on a visual refresh, the first iteration of which is included in this release. (#1056)

It’s about time Datasette grew beyond its clearly-designed-by-a-mostly-backend-engineer roots. Natalie has been helping me start adding some visual polish: we’ve started with an update to the colour scheme and will be continuing to iterate on the visual design as the project evolves towards the 1.0 release.

The new design makes the navigation bar much more obvious, which is important for this release since the new navigation menu (tucked away behind a three-bar icon) is a key new feature.

A screenshot of the new Datasette visual design, also showcasing two new menu icons

A number of existing Datasette plugins add new pages to the Datasette interface, providig tools for things like uploading CSVs, editing table schemas or configuring full-text search.

Plugins like this can now link to themselves from other parts of Datasette interface. The menu_links(datasette, actor) hook (#1064) lets plugins add links to Datasette’s new top-right application menu, and the table_actions(datasette, actor, database, table) hook (#1066) adds links to a new “table actions” menu on the table page.

This feature has been a long time coming. I’ve been writing an increasing number of plugins that add new pages to Datasette, and so far the main way of using them has been to memorise and type in their URLs!

The new navigation menu (which only displays if it has something in it) provides a global location to add new links. I’ve already released several plugin updates that take advantage of this.

The new “table actions” menu imitates Datasette’s existing column header menu icon—it’s a cog. Clicking it opens a menu of actions relating to the current table.

Want to see a demo?

The demo at latest.datasette.io now includes some example plugins. To see the new table actions menu first sign into that demo as root and then visit the facetable table to see the new cog icon menu at the top of the page.

Here’s an animated GIF demo showing the new menus in action.

Using the nav menu to upload CSVs, then the table actions menu to edit the table schema and rename latitude and longitude columns so that the Central Park Squirrel Census is displayed on a map!

Binary data

SQLite tables can contain binary data in BLOB columns. Datasette now provides links for users to download this data directly from Datasette, and uses those links to make binary data available from CSV exports. See Binary data for more details. (#1036 and #1034).

I spent a ton of time on this over the past few weeks. The initial impetus was a realization that Datasette CSV exports included ugly Python b'\x15\x1c\x02\xc7\xad\x05\xfe' strings, which felt like the worst possible way to display binary in a CSV file, out of universally bad options.

Datasette’s main interface punted on binary entirely—it would show a <Binary data: 7 bytes> label which didn’t help much either.

The only way to get at binary data stored in a Datasette instance was to request the JSON version and then manually decode the Base-64 value within it!

This is now fixed: binary columns can be downloaded directly to your computer, using a new .blob output renderer. The approach is described on this new page in the documentation.

Security was a major consideration when building this feature. Allowing the download of arbitrary byte payloads from a web server is dangerous business: it can easily result in XSS holes where HTML with dangerous <script> content can end up hosted on the primary domain.

After some research, I decided to serve up binary content for download using the following headings:

content-type: application/binary
x-content-type-options: nosniff
content-disposition: attachment; filename="data-f30889.blob"

application/binary is a safer Content-Type option than the more common application/octet-stream, according to Michal Zalewski’s renowned web application security book The Tangled Web (quoted here)

x-content-type-options: nosniff disables the XSS-tastic content sniffing feature in older versions of Internet Explorer, where IE would helpfully guess that you intended to serve HTML based on the first few bytes of the response.

The content-disposition: attachment header causes the browser to show a “download this file” dialog, using the suggested filename.

If you know of a reason that this isn’t secure enough, please let me know!

URL building

The new datasette.urls family of methods can be used to generate URLs to key pages within the Datasette interface, both within custom templates and Datasette plugins. See Building URLs within plugins for more details. (#904)

Datasette’s base_url configuration setting was the forcing factor around this piece of work.

It allows you to configure Datasette to serve content starting at a path other than /—for example:

datasette --config base_url:/path-to-datasette/

This will serve all Datasette pages at locations starting with /path-to-datasette/.

Why would you want to do this? It’s useful if you are proxying traffic to Datasette from within the URL hierarchy of an existing website.

The feature didn’t work properly, and enough people care about it that I had a steady stream of bug reports. For 0.51 I gathered them all into a single giant tracking issue and worked through them all one by one.

It quickly became apparent that the key challenge was building URLs within Datasette—not just within HTML template pages, but also for things like HTTP redirects.

Datasette itself needed to generate URLs that took the base_url setting into account, but so do Datasette plugins. So I built a new datasette.urls collection of helper methods and made them part of the documented internals API for plugins. The Building URLs within plugins documentation shows how these should be used.

I also added documentation on Running Datasette behind a proxy with example configs (tested on my laptop) for both nginx and Apache.

The datasette.client mechanism from Datasette 0.50 allows plugins to make calls to Datasette’s internal JSON API without the overhead of an HTTP request. This is another place where plugins need to be able to construct valid URLs to internal Datasette pages.

I added this example to the documentation showing how the two features can work together:

table_json = (
    await datasette.client.get(
        datasette.urls.table("fixtures", "facetable", format="json")
    )
).json()

One final weird detail on this: Datasette now has various methods that automatically add the base_url prefix to a URL. I got worried about what would happen if these were applied more than once (as above, where datasette.urls.table() applies the prefix so does datasette.client.get()).

I fixed this using the same trick that Django and Jinja use to avoid appliying auto-escaping twice to content that will be displayed in HTML: the datasette.urls methods actually return a PrefixedUrlString object which is a subclass of str that knows that the prefix has been applied! Code for that lives here.

Smaller changes

A few highlights from the “smaller changes” in Datasette 0.51:

  • Wide tables shown within Datasette now scroll horizontally (#998). This is achieved using a new <div class="table-wrapper"> element which may impact the implementation of some plugins (for example this change to datasette-cluster-map).

I think this is a big improvement: if your database table is too wide, it now scrolls horizontally on the page (rather than blowing the entire page out to a wider width). You can see that in action on the global-power-plants demo.

If you are signed in as root the new navigation menu links to a whole plethora of previously-undiscoverable Datasette debugging tools. This new permission controls the display of those items.

  • Link: HTTP header pagination. (#1014)

Inspired by GitHub and WordPress, which both use the HTTP Link header in this way. It’s an optional extra though: Datasette will always offer in-JSON pagination information.

  • Edit SQL button on canned queries, (#1019)

Suggested by Jacob Fenton in this issue. The implementation had quite a few edge cases since there are certain categories of canned query that can’t be executed as custom SQL by the user. See the issue comments for details and a demo.

  • --load-extension=spatialite shortcut. (#1028)

Inspired by a similar feature in sqlite-utils.

  • datasette -o option now opens the most relevant page. (#976)

This is a fun little feature. If your Datasette only loads a single database, and that database only has a single table (common if you’ve just run a single CSV import) then running this will open your browser directly to that table page:

datasette data.db -o
  • datasette --cors option now enables access to /database.db downloads. (#1057)

This was inspired by Mike Bostock’s Observable Notebook that uses the Emscripten-compiled JavaScript version of SQLite to run queries against SQLite database files.

It turned out you couldn’t use that notebook against SQLite files hosted in Datasette because they weren’t covered by Datasette’s CORS option. Now they are!

Recommendations for plugin authors, inspired by a question from David Kane on Twitter. David has been building datasette-reconcile, a Datasette plugin that offers a reconciliation API endpoint that can be used with OpenRefine. What a brilliant idea!

datasette-edit-templates (almost)

Inspired by a conversation with Jesse Vincent, I also spent some time experimenting with the idea of a plugin that can load and edit templates from the database—which would turn a personal Datasette into a really fun interface hacking environment. I nearly got this working, and even shipped a preview of a load_template() plugin hook in the Datasette 0.51a2 alpha... before crashing into a road block when I realized that it also needed to work with Jinja’s {% extends %} and {% include %} template tags and loaders for those don’t currenty support async functions.

In exploring this I also realized that my load_template() plugin hook wasn’t actually necessary—if I’m going to solve this problem with Jinja loaders I can do so using the existing prepare_jinja2_environment(env) hook.

My not-yet-functional prototype for this is caled datasette-edit-templates. I’m pretty confident I can get it working against the old plugin hook with a little more work.

Other weeknotes

Most of my time this week was spent on Datasette 0.51—but I did find a little bit of time for other projects.

I finished recording my talk for PyCon Argentina. It will air on November 20th.

sqlite-utils 2.23 is out, with a .m2m() bug fix from Adam Wolf and the new ability to display progress bars when importing TSV and CSV files.

Releases this week

Several of these are updates to take advantage of the new navigation plugin hooks introduced in Datasette 0.51.

TIL this week