Simon Willison’s Weblog

datasette-search-all: a new plugin for searching multiple Datasette tables at once

I just released a new plugin for Datasette, and it’s pretty fun. datasette-search-all is a plugin written mostly in JavaScript that executes the same search query against every searchable table in every database connected to your Datasette instance.

You can try it out on my FARA (Foreign Agents Registration Act) search site, fara.datasettes.com—see Deploying a data API using GitHub Actions and Cloud Run for background on that project.

Here’s a search for manafort across all four FARA tables (derived from CSVs originally pulled from the Justice Department bulk data site).

Running a search for manafort

I’ve been planning to build cross-table search for Datasette for quite a while now. It’s a potentially very difficult problem: searching a single table is easy, but the moment you attempt to search multiple tables you run into a number of challenges:

  • Different tables have different columns. How do you present those in a single search interface?
  • Full-text search relevance scores make sense within a single table (due to the statistics they rely on, see Exploring search relevance algorithms with SQLite) but cannot be compared across multiple tables.

I have an idea for how I can address these, but it involves creating a single separate full-text index table that incorporates text from many different tables, along with a complex set of indexing mechanisms (maybe driven by triggers) for keeping it up to date.

But maybe I was overthinking this?

While I stewed on the ideal way to solve this problem, projects like my FARA site were stuck without cross-table search. Then this morning I realized that there was another way: I could build pretty much the simplest thing that could possibly work (always a good plan in my experience).

Here’s how the new plugin works: it scans through every table attached to Datasette looking for tables that are configured for full-text search. Then it presents a UI which can excute searches against ALL of those tables, and present the top five results from each one.

The scanning-for-searchable-tables happens in Python, but the actual searching is all in client-side JavaScript. The searches run in parallel, which means the user sees results from the fastest (smallest) tables first, then the larger, slower tables drop in at the bottom.

It’s stupidly simple, but I really like the result. It’s also a neat demonstration of running parallel SQL queries from JavaScript, a technique which I’m keen to apply to all sorts of other interesting problems.

JavaScript style

The JavaScript I wrote for this project is unconventional for 2020: it’s a block of inline script on the page, using no libraries or frameworks, but taking advantage of modern niceties like backtick template literals and fetch().

The code is messy, short and extremely easy to change in the future. It doesn’t require running a build tool. I’m pretty happy with it.

Adding a search form to the homepage

The other thing the plugin does is add a search box to the Datasette homepage (as seen on the FARA site)—but only if the attached databases contain at least one FTS-configured searchable table.

There are two parts to the implementation here. The first is a extra_template_vars() plugin hook which injects a searchable_tables variable into the hompage context—code here.

The second is a custom index.html template which ships with the plugin. When Datasette renders a template it looks first in the local --template-dirs folder (if that option was used), secondly in all of the installed plugins and finally in the Datasette set of default templates.

The new index.html template starts with {% extends "default:index.html" %}, which means it extends the default template that shipped with Datasette. It then redefines the description_source_license block from that template to conditionally show the search form.

I’m not at all happy with abusing description_source_license in this way—it just happens to be a block located at the top of that page. As I write more plugins that customize the Datasette UI in some way I continually run into this problem: plugins need to add markup to pages in specific points, but they also need to do so in a way that won’t over-ride what other plugins are up to.

I’m beginning to formulate an idea for how Datasette can better support this, but until that’s ready I’ll be stuck with hacks like the one used here.

Using this with datasette-configure-fts and datasette-upload-csvs

The datasette-configure-fts plugin provides a simple UI for configuring search for different tables, by selecting which columns should be searchable.

Combining this with datasette-search-all is really powerful. It means you can dump a bunch of CSVs into Datasette (maybe using datasette-upload-csvs), select some columns and then run searches across all of those different data sources in one place.

Not bad for 93 lines of JavaScript and a bit of Python glue!

This is datasette-search-all: a new plugin for searching multiple Datasette tables at once by Simon Willison, posted on 9th March 2020.

Tagged , , , , ,

Next: Weeknotes: COVID-19 numbers in Datasette

Previous: Weeknotes: datasette-ics, datasette-upload-csvs, datasette-configure-fts, asgi-csrf