Weeknotes: Datasette 0.39 and many other projects

25th March 2020

This week’s theme: well, I’m not going anywhere. So I have a ton of progress to report on various projects.

Datasette 0.39

This evening I shipped Datasette 0.39. The two big features are a mechanism for setting the default sort order for tables and a new base_url configuration setting.

You can see the new default sort order in action on my Covid-19 project—the daily reports now default to sort by day descending so the most recent figures show up first. Here’s the metadata that makes it happen, and here’s the new documentation.
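A default sort order is declared per table in metadata.json. Below is a minimal sketch that writes such a file from Python. The database `covid`, table `daily_reports` and column `day` are hypothetical placeholders. The `sort_desc` key (with `sort` for ascending order) follows Datasette's table metadata options.

```python
import json

# Hypothetical names: swap in your own database, table and column.
metadata = {
    "databases": {
        "covid": {
            "tables": {
                "daily_reports": {
                    # Default to sorting by "day", newest first, unless the URL asks for another sort.
                    "sort_desc": "day"
                }
            }
        }
    }
}

with open("metadata.json", "w") as fp:
    json.dump(metadata, fp, indent=4)
```

The file would then be passed to Datasette with something like `datasette covid.db -m metadata.json`.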

I had to do some extra work on that project this morning when the underlying data changed its CSV column headings without warning.

The base_url feature has been an open issue since January 2019. It lets you run Datasette behind a proxy on a different URL prefix (/tools/datasette/, for example). The trigger for finally getting this solved was a Twitter conversation about running Datasette on Binder in coordination with a Jupyter notebook.

Tony Hirst did some work on this last year, but was stumped by the lack of a base_url equivalent. Terry Jones shared an implementation in December. I finally found the inspiration to pull it all together, and ended up with a working fork of Tony’s project which does indeed launch Datasette on Binder. You can try launching your own here.
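This sketch illustrates the two halves of the base_url problem. It assumes the proxy strips the prefix before forwarding, so the app receives /covid but must still emit links that start with /tools/datasette/. The helper names are hypothetical and are not Datasette's internals. The setting itself is passed as a configuration option, something like `datasette data.db --config base_url:/tools/datasette/` in this release.

```python
def normalize_base_url(base_url):
    """Ensure exactly one leading and one trailing slash: "tools/datasette" -> "/tools/datasette/"."""
    inner = base_url.strip("/")
    return f"/{inner}/" if inner else "/"


def strip_prefix(path, base_url):
    """Proxy side (assumed): /tools/datasette/covid -> /covid before forwarding to the app."""
    base = normalize_base_url(base_url)
    if not path.startswith(base):
        raise ValueError(f"{path!r} is outside {base!r}")
    return "/" + path[len(base):]


def add_prefix(path, base_url):
    """App side: every generated link must carry the prefix again: /covid -> /tools/datasette/covid."""
    base = normalize_base_url(base_url)
    return base + path.lstrip("/")


incoming = strip_prefix("/tools/datasette/covid", "/tools/datasette/")
print(incoming, add_prefix(incoming, "/tools/datasette/"))
```

If the app only handled the first half, pages would load but every link in them would point outside the proxied prefix.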

github-to-sqlite

I’ve not done much work on my Dogsheep family of tools in a while. That changed this week: in particular, I shipped a 1.0 of github-to-sqlite.

As you might expect, it’s a tool for importing GitHub data into a SQLite database. Today it can handle repositories, releases, release assets, commits, issues and issue comments. You can see a live demo built from Dogsheep organization data at github-to-sqlite.dogsheep.net (deployed by this GitHub action).

I built this tool primarily to help me better keep track of all of my projects. Pulling the issues into a single database means I can run queries against all open issues across all of my repositories, and importing commits and releases is handy for when I want to write my weeknotes and need to figure out what I’ve worked on lately.
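Below is a sketch of the kind of cross-repository query this enables, using Python's built-in sqlite3. The two tables are a simplified stand-in for what github-to-sqlite creates. The assumption is an `issues` table with a `state` column and a `repo` foreign key to `repos`; the real schema has many more columns. The rows are made-up sample data.

```python
import sqlite3

# Simplified, assumed schema with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repos (id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE issues (id INTEGER PRIMARY KEY, repo INTEGER REFERENCES repos(id), title TEXT, state TEXT);
INSERT INTO repos VALUES (1, 'example/repo-a'), (2, 'example/repo-b');
INSERT INTO issues VALUES
    (10, 1, 'Example closed issue', 'closed'),
    (11, 1, 'Example open issue', 'open'),
    (12, 2, 'Another open issue', 'open'),
    (13, 2, 'Yet another open issue', 'open');
""")

# Count open issues per repository across every imported repo, busiest first.
rows = conn.execute("""
    SELECT repos.full_name, COUNT(*) AS open_issues
    FROM issues JOIN repos ON issues.repo = repos.id
    WHERE issues.state = 'open'
    GROUP BY repos.full_name
    ORDER BY open_issues DESC, repos.full_name
""").fetchall()

for full_name, open_issues in rows:
    print(full_name, open_issues)
```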

datasette-render-markdown

GitHub issues use Markdown. To correctly display them it’s useful to be able to render that Markdown. I built datasette-render-markdown back in November, but this week I made some substantial upgrades: you can now configure which columns should be rendered, and it includes support for Markdown extensions including GitHub-Flavored Markdown.

You can see it in action on the github-to-sqlite demo.

I also upgraded datasette-render-timestamps with the same explicit column configuration pattern.
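Here is a sketch of what explicit column configuration looks like as table-level plugin settings in metadata. The database, table and column names are hypothetical. The `columns` key follows the pattern described above, but check each plugin's README for the exact options it supports (for example, which Markdown extensions to enable). The `configured_columns` helper is likewise hypothetical and only shows how such a lookup reads the nested structure.

```python
import json

# Hypothetical names; the "columns" key is assumed from the plugins' explicit-column pattern.
metadata = {
    "databases": {
        "github": {
            "tables": {
                "issues": {
                    "plugins": {
                        "datasette-render-markdown": {"columns": ["body"]},
                    }
                },
                "events": {
                    "plugins": {
                        "datasette-render-timestamps": {"columns": ["created"]},
                    }
                },
            }
        }
    }
}


def configured_columns(metadata, plugin, database, table):
    """Hypothetical helper: the columns a plugin is told to render for one table."""
    table_meta = metadata["databases"].get(database, {}).get("tables", {}).get(table, {})
    return table_meta.get("plugins", {}).get(plugin, {}).get("columns", [])


print(json.dumps(metadata, indent=2))
```

Listing columns explicitly means a plugin no longer has to guess which text or integer columns hold Markdown or timestamps.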

datasette-publish-fly

Fly is a relatively new hosting provider which lets you host applications bundled as Docker containers in load-balanced data centers geographically close to your users.

It has a couple of characteristics that make it a really good fit for Datasette.

Firstly, the pricing model: Fly will currently host a tiny (128MB of RAM) container for $2.67/month—and they give you $10/month of free service credit, enough for 3 containers.

It turns out Datasette runs just fine in 128MB of RAM, so that’s three always-on Datasette containers! (Unlike Heroku and Cloud Run, Fly keeps your containers running rather than scaling them to zero).
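Here is the arithmetic behind "three containers", at the prices quoted above. It is worked in cents to avoid floating-point rounding.

```python
free_credit_cents = 1000     # $10/month of free service credit
tiny_container_cents = 267   # $2.67/month for a 128MB container

containers = free_credit_cents // tiny_container_cents
leftover_cents = free_credit_cents - containers * tiny_container_cents
print(containers, leftover_cents)  # 3 199
```

Three containers cost $8.01, leaving $1.99 of credit, which is not enough for a fourth.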

Secondly, you deploy to it by shipping it a Dockerfile, which makes it really easy to build datasette publish support for it.
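Because the deployment unit is a Dockerfile, a publish command mostly has to generate one and hand it over. Below is a simplified sketch of that generation step. It is not the template datasette-publish-fly actually produces. The base image, port 8080 and file name are assumptions.

```python
def make_dockerfile(db_filename, port=8080):
    """Return a minimal, hypothetical Dockerfile that serves one SQLite file with Datasette."""
    return "\n".join([
        "FROM python:3.8",
        "WORKDIR /app",
        f"COPY {db_filename} /app/{db_filename}",
        "RUN pip install datasette",
        f"EXPOSE {port}",
        # -i opens the database in immutable mode, since it never changes inside the container.
        f'CMD ["datasette", "serve", "--host", "0.0.0.0", "--port", "{port}", "-i", "{db_filename}"]',
    ]) + "\n"


print(make_dockerfile("github.db"))
```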

I added the publish_subcommand plugin hook to Datasette all the way back in 0.25 in September 2018, but I’ve never actually built anything with it. That’s now changed: datasette-publish-fly uses the hook to add a datasette publish fly command for publishing databases directly to your Fly account.
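In Datasette the publish_subcommand hook receives the Click `publish` command group, and the plugin attaches its own command to it. The sketch below is a standard-library analogy of that pattern using argparse. It is not Datasette's API. The host builds its `publish` group and passes it to every registered hook, and each hook adds a subcommand. The `fly` options shown are illustrative only.

```python
import argparse


def fly_plugin_publish_subcommand(publish_subparsers):
    """Stand-in plugin: like a publish_subcommand implementation, it attaches its own subcommand."""
    fly = publish_subparsers.add_parser("fly", help="Publish to Fly (illustrative only)")
    fly.add_argument("files", nargs="+")
    fly.add_argument("--app", required=True)
    fly.set_defaults(handler=lambda args: f"would deploy {args.files} as {args.app}")


def build_cli(hooks):
    parser = argparse.ArgumentParser(prog="datasette")
    commands = parser.add_subparsers(dest="command")
    publish = commands.add_parser("publish")
    publish_subparsers = publish.add_subparsers(dest="target")
    # The host calls every registered hook with its publish group.
    for hook in hooks:
        hook(publish_subparsers)
    return parser


parser = build_cli([fly_plugin_publish_subcommand])
args = parser.parse_args(["publish", "fly", "data.db", "--app", "my-app"])
print(args.handler(args))
```

The host never needs to know which publishing targets exist; installing a plugin is enough to add one.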

hacker-news-to-sqlite

It turns out I created my Hacker News account in 2007, and I’ve posted 2,167 comments and submitted 131 stories since then. Since my personal Dogsheep project is about pulling my data from multiple sources into a single place it made sense to build a tool for importing from Hacker News.

hacker-news-to-sqlite uses the official Hacker News API to import every comment and story posted by a specific user. It can also use one or more item IDs to suck the entire discussion tree around those items.
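The official API serves each item as JSON at https://hacker-news.firebaseio.com/v0/item/<id>.json, with child comment IDs listed in `kids`. Below is a sketch of walking a discussion tree that way. It is not hacker-news-to-sqlite's code, and it only follows descendants. The fetch function is injectable so the demo runs offline against fake items.

```python
import json
import urllib.request

API_BASE = "https://hacker-news.firebaseio.com/v0"


def fetch_item(item_id):
    """Fetch one item (story, comment, ...) from the official Hacker News API."""
    with urllib.request.urlopen(f"{API_BASE}/item/{item_id}.json") as response:
        return json.load(response)


def walk_discussion(root_id, fetch=fetch_item):
    """Collect an item and all of its descendants by following each item's "kids" IDs."""
    items, seen, stack = [], set(), [root_id]
    while stack:
        item_id = stack.pop()
        if item_id in seen:
            continue
        seen.add(item_id)
        item = fetch(item_id)
        if item is None:  # the API returns null for unknown IDs
            continue
        items.append(item)
        stack.extend(item.get("kids", []))
    return items


# Offline demo: a dict's .get stands in for the network fetcher.
fake_items = {
    1: {"id": 1, "type": "story", "kids": [2, 3]},
    2: {"id": 2, "type": "comment", "parent": 1, "kids": [4]},
    3: {"id": 3, "type": "comment", "parent": 1},
    4: {"id": 4, "type": "comment", "parent": 2},
}
items = walk_discussion(1, fetch=fake_items.get)
print(sorted(item["id"] for item in items))  # [1, 2, 3, 4]
```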

The README includes detailed documentation on how to best browse your data using Datasette once you have imported it.

Other projects

sqlite-utils gained some improvements to the way it suggests types for existing columns.
twitter-to-sqlite now offers --sql and --attach for more of its subcommands.
datasette-show-errors is a new plugin which exposes 500 errors as tracebacks, like Django does with DEBUG=True. It’s built on top of Starlette’s ServerErrorMiddleware.
I upgraded inaturalist-to-sqlite to work with sqlite-utils 2.x.
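The idea behind datasette-show-errors can be sketched without Starlette as a plain ASGI middleware. It catches an unhandled exception and, if no response has started yet, sends a 500 with the traceback as plain text. This is a standard-library illustration only, not the plugin's code or Starlette's ServerErrorMiddleware, which renders a richer debug page. Like Django's DEBUG=True, it should never be exposed in production.

```python
import asyncio
import traceback


class ShowErrorsMiddleware:
    """Wrap an ASGI app; on an unhandled exception, respond with a 500 containing the traceback."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        response_started = False

        async def tracking_send(message):
            nonlocal response_started
            if message["type"] == "http.response.start":
                response_started = True
            await send(message)

        try:
            await self.app(scope, receive, tracking_send)
        except Exception:
            if response_started:
                raise  # too late to replace the response; let the server handle it
            body = traceback.format_exc().encode("utf-8")
            await send({
                "type": "http.response.start",
                "status": 500,
                "headers": [(b"content-type", b"text/plain; charset=utf-8")],
            })
            await send({"type": "http.response.body", "body": body})


async def broken_app(scope, receive, send):
    raise ValueError("boom")


async def demo():
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await ShowErrorsMiddleware(broken_app)({"type": "http", "path": "/"}, receive, send)
    return sent


messages = asyncio.run(demo())
print(messages[0]["status"])  # 500
```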

Posted 25th March 2020 at 5:33 am

Simon Willison’s Weblog