Simon Willison’s Weblog

Weeknotes: Datasette 0.63.3, datasette-ripgrep

We’re back in the UK to see family over Christmas (our first trip back since 2019). Here are a few notes from the past couple of weeks.

Datasette 0.63.3

In addition to the Datasette 1.0a2 alpha (described in detail here) I also published a small bug fix release for the 0.63.x stable branch. Quoting the release notes in full:

  • Fixed a bug where datasette --root, when running in Docker, would only output the URL to sign in as root when the server shut down, not when it started up. (#1958)
  • You no longer need to ensure await datasette.invoke_startup() has been called in order for Datasette to start correctly serving requests—this is now handled automatically the first time the server receives a request. This fixes a bug experienced when Datasette is served directly by an ASGI application server such as Uvicorn or Gunicorn. It also fixes a bug with the datasette-gunicorn plugin. (#1955)

That second fix ended up taking longer than expected.

The root of the problem was that back in Datasette 0.63 I introduced the need to call await datasette.invoke_startup() as part of Datasette’s setup process, mainly to trigger plugins that might need to run their own async setup code.

This turned out to break a bunch of things I hadn’t expected. Most notably, it affected anyone who wanted to run Datasette using an ASGI server such as Gunicorn or Uvicorn.

It broke my own datasette-gunicorn plugin too.

The core problem was that the Datasette() class constructor can be called synchronously, but needed a subsequent await ... call to run those async def setup methods.

I realized that a neater way to handle this would be to introduce a mechanism such that the first time anyone attempted to run an HTTP request through Datasette—an operation that always involved an await—the invoke_startup() method would be called automatically.
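Here’s a minimal sketch of that lazy-startup pattern. It is not Datasette’s actual code: the LazyStartupApp class, its handle_request() method and the startup_calls counter are hypothetical names invented for illustration, and the asyncio.sleep(0) call stands in for whatever async setup plugins might run.

```python
import asyncio


class LazyStartupApp:
    """Hypothetical stand-in for an app with async setup (not Datasette's code)."""

    def __init__(self):
        # The constructor is synchronous, so it cannot await any setup code.
        self._started = False
        self._lock = None
        self.startup_calls = 0  # illustration only: counts real startup runs

    async def invoke_startup(self):
        # Cheap fast path once startup has completed.
        if self._started:
            return
        # Create the lock lazily so it belongs to the running event loop.
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            # Re-check: another request may have finished startup while we waited.
            if self._started:
                return
            await asyncio.sleep(0)  # placeholder for async plugin setup
            self.startup_calls += 1
            self._started = True

    async def handle_request(self, path):
        # Every request passes through an await, so startup can happen here.
        await self.invoke_startup()
        return f"200 OK {path}"
```

The lock matters because two requests can arrive before startup has finished. Without it, both could run the setup code.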

I got that working, but in doing so I ran into a longer-running set of problems.

Datasette has around 1,200 tests at this point, and parts of the test suite date back to the start of the project and no longer reflect my preferred way of writing tests.

I’ve started running into “too many open files” errors running the test suite on macOS, and have so far not quite tracked down the best way to keep open file handles under control.

Test failures were hampering my efforts to fix the issue, so I used this as the impetus to refactor a large chunk of the test suite.

Several hundred of Datasette’s tests now share a single in-memory fixtures database—previously, these tests were using a fixtures.db database file created in a temporary directory.
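As a rough illustration of the shared in-memory approach, here’s a sketch using Python’s sqlite3 module and SQLite’s named in-memory databases with a shared cache. The fixtures_sketch name, the facts table and the get_fixtures_connection() helper are all made up for this example, and Datasette’s real test fixtures are set up differently.

```python
import sqlite3

# Hypothetical database name. mode=memory plus cache=shared lets several
# connections in the same process see one in-memory database.
MEMORY_URI = "file:fixtures_sketch?mode=memory&cache=shared"

# Held open for the life of the process: a shared in-memory database
# disappears once its last connection is closed.
_keepalive = None


def get_fixtures_connection():
    """Return a new connection to the shared database, populating it once."""
    global _keepalive
    if _keepalive is None:
        _keepalive = sqlite3.connect(MEMORY_URI, uri=True)
        _keepalive.execute(
            "create table facts (id integer primary key, name text)"
        )
        _keepalive.executemany(
            "insert into facts (name) values (?)", [("one",), ("two",)]
        )
        _keepalive.commit()
    return sqlite3.connect(MEMORY_URI, uri=True)
```

Each test can then call get_fixtures_connection() and read the same rows without creating a file on disk, which should also mean fewer open file handles.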

There’s still more test refactoring that I want to do, described in this issue, but I’m happy with the progress I’ve made so far.

datasette-ripgrep, cosmetic upgrade

I built datasette-ripgrep a couple of years ago. It’s a Datasette plugin that provides a UI for running ripgrep code search queries and linking to the results. It’s very handy for finding uses of APIs that I might want to deprecate.

In using it to investigate Datasette’s error output I spotted that the results would be more readable if they included a gap between non-consecutive line numbers, so I shipped an update with that improvement.

Screenshot of some code search results - matching lines are highlighted in yellow and there are small gaps between non-consecutive groups of line numbers
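The grouping logic behind that kind of display can be sketched in a few lines of Python. This is not the plugin’s actual implementation, and group_consecutive() is a hypothetical helper. It splits matched line numbers into runs of consecutive lines so a template could render a gap between each run.

```python
def group_consecutive(line_numbers):
    """Split line numbers into runs of consecutive values."""
    groups = []
    for n in sorted(set(line_numbers)):
        if groups and n == groups[-1][-1] + 1:
            # Directly follows the previous line: extend the current run.
            groups[-1].append(n)
        else:
            # A jump in line numbers starts a new run (and a visual gap).
            groups.append([n])
    return groups


print(group_consecutive([10, 11, 12, 40, 41, 97]))
# → [[10, 11, 12], [40, 41], [97]]
```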

This is Weeknotes: Datasette 0.63.3, datasette-ripgrep by Simon Willison, posted on 20th December 2022.
