Simon Willison’s Weblog

Subscribe

Weeknotes: Datasette column actions, plus three new plugins

2nd October 2020

A renewed emphasis on building out Datasette Cloud has produced three new plugins this week: datasette-dateutil, datasette-import-table and datasette-edit-schema, plus a major improvement to Datasette’s default interface for browsing tables.

Datasette column actions

If you take a look at the latest.datasette.io demo (continuously deployed with the latest Datasette main branch) you’ll see a new UI element: cog icons on each table column, which reveal a drop-down menu.

The on_earth column has an expanded menu with Sort ascending, Sort descending and Facet by this menu items.

I’m calling these “column actions”, and they’re an improvement I’ve been planning for a long time.

So far these solve two problems:

  • Sorting a column in descending order previously required two full page load clicks—one to sort ascending, and another to reverse that order. You can now sort descending directly from the menu.
  • Datasette lets you facet by any column by appending ?_facet=column-name to the URL, but this is not a very discoverable feature! It suggests facets for columns that have less than 30 unique values, but sometimes you want to facet by something else—US states for example. You can now facet any column using the cog menu, with the exception of single unique primary key columns since faceting by those is guaranteed to only return one result per value.

Additionally, if a column has visible blank values on the page a “Show not-blank rows” option will be displayed, for quickly seeing rows that have something interesting in that column.

I’m planning to make these available for extension by plugins—I’m sure there are all sorts of other potentially useful per-column actions.

One concern I had in building these was accessibility: I was particularly worried that adding the markup for a dropdown menu to a <th> element would interfere with screenreaders when they were using those headers to describe the contents of a cell.

In the end I solved this by creating the menu as a separate absolutely positioned element, which appears in the correct place when the cog icon is clicked.

This still needs more work—I need to learn how to use ARIA to make the menus fully accesible—but I’m optimistic in the approach so far. If you use a screen reader I would deeply value your feedback in the column action accessibility issue.

If you want to try out column actions they’re available in the latest alpha release, Datasette 0.50a0.

datasette-dateutil

I’m working with students in the Stanford Big Local Journalism course this term, and it’s really helping me figure out the gaps in Datasette and my forthcoming Datasette Cloud hosted offering.

I want Datasette to be as useful to journalists as possible. This means handling messy data from all manner of sources (many of them government).

One missing feature has been flexible date handling. I’ve been succesfully using the excellent python-dateutil library for this for fifteen years now, so I decided to expose its functionality within Datasette as the datasette-dateutil plugin.

The plugin adds several new custom SQL functions. Here’s an example query:

select
  dateutil_parse("10 october 2020 3pm"),
  dateutil_parse_fuzzy("This is due 10 september"),
  dateutil_parse("1/2/2020"),
  dateutil_parse("2020-03-04"),
  dateutil_parse_dayfirst("2020-03-04"),
  dateutil_easter(2020);

Try that query here. It outputs the following:

2020-10-10T15:00:00
2020-09-10T00:00:00
2020-01-02T00:00:00
2020-03-04T00:00:00
2020-04-03T00:00:00
2020-04-12

I’ve also exposed the rrule module, which implements recurring date rules as described in the iCalendar specification. The dateutil_rrule() and dateutil_rrule_date() functions return JSON arrays of dates matching the rule, for example:

select
  dateutil_rrule('FREQ=HOURLY;COUNT=5'),
  dateutil_rrule_date(
    'FREQ=DAILY;COUNT=3',
    '1st jan 2020'
  );

Try it—here are the results:

[
    "2020-10-02T23:12:55",
    "2020-10-03T00:12:55",
    "2020-10-03T01:12:55",
    "2020-10-03T02:12:55",
    "2020-10-03T03:12:55"
]

[
    "2020-01-01",
    "2020-01-02",
    "2020-01-03"
]

These are SQLite strings containing JSON arrays—but you can use SQLite’s json_each() function to expand them out into tables that can then be joined against other tables! The README goes into more details on how you can do that.

This is particularly useful for solving problems like generating the number-of-items per day for a given time period while still returning a 0 for dates that do not have any corresponding items.

datasette-edit-schema

I previewed this last week, when it was called datasette-edit-table. I realized that datasette-edit-schema is a better name as it doesn’t imply that you can edit the rows within the table—just the schema of the table itself.

Release 0.3 is the first non-alpha release, and provides the following capabilities:

  • Rename a table
  • Add new columns to a table
  • Rename columns in a table
  • Modify the type of columns in a table
  • Re-order the columns in a table
  • Delete a table

This is particularly useful should you import a messy CSV file (using datasette-upload-csvs) and then need to rename some columns, change their types (so that you can sort by integers and floats instead of everything being text), then re-order them to make more sense.

It also means you can rename LAT and LON columns to latitude and longitude so that datasette-cluster-map knows that it can render them on a map.

datasette-import-table

If you’re going to modify and clean data, it can be helpful to start with an existing table. datasette-import-table lets you provide the URL to an existing Datasette table hosted anywhere on the internet—it will then create a local copy of that table by paginating through the JSON API for that table and importing all of the rows.

TIL this week

Releases this week