Weeknotes: Datasette 0.39 and many other projects
25th March 2020
This week’s theme: Well, I’m not going anywhere. So a ton of progress to report on various projects.
This evening I shipped Datasette 0.39. The two big features are a mechanism for setting the default sort order for tables and a new base_url configuration setting.
You can see the new default sort order in action on my Covid-19 project—the daily reports now default to sort by day descending so the most recent figures show up first. Here’s the metadata that makes it happen, and here’s the new documentation.
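Here’s a minimal sketch of what that kind of metadata looks like, generated from Python. The database name, table name and column (covid, daily_reports, day) are made up for illustration and are not copied from the real project; sort_desc is the table-level metadata key for a descending default sort, and sort is the ascending equivalent.

```python
import json


def table_sort_metadata(database, table, column, descending=True):
    """Build Datasette metadata that sets a default sort for one table."""
    # "sort_desc" sorts descending by default, "sort" sorts ascending
    key = "sort_desc" if descending else "sort"
    return {"databases": {database: {"tables": {table: {key: column}}}}}


# Hypothetical names, for illustration only
metadata = table_sort_metadata("covid", "daily_reports", "day")

# This JSON could be saved as metadata.json and passed to Datasette
metadata_json = json.dumps(metadata, indent=4)
print(metadata_json)
```

Users can still re-sort the table by clicking a column header; the metadata only changes what they see first.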
I had to do some extra work on that project this morning when the underlying data changed its CSV column headings without warning.
The base_url feature has been an open issue since January 2019. It lets you run Datasette behind a proxy on a different URL prefix, /tools/datasette/ for example. The trigger for finally getting this solved was a Twitter conversation about running Datasette on Binder in coordination with a Jupyter notebook.
Tony Hirst did some work on this last year, but was stumped by the lack of a base_url equivalent. Terry Jones shared an implementation in December. I finally found the inspiration to pull it all together, and ended up with a working fork of Tony’s project which does indeed launch Datasette on Binder: try launching your own here.
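To make the idea of a URL prefix concrete, here’s a tiny sketch of what serving an application under a prefix implies for the links it generates. The prefixed_path helper is hypothetical and is not Datasette’s implementation; the requirement that the prefix starts and ends with a slash is an assumption of this sketch.

```python
def prefixed_path(base_url, path):
    """Join a prefix such as "/tools/datasette/" with an application path."""
    # Assumption for this sketch: the prefix begins and ends with "/"
    if not (base_url.startswith("/") and base_url.endswith("/")):
        raise ValueError("base_url must start and end with /")
    # Drop the leading slash from the path so the join has no "//"
    return base_url + path.lstrip("/")


# A link to a table page, as it would need to appear behind the proxy
print(prefixed_path("/tools/datasette/", "/fixtures/facetable"))
```

Every link, redirect and static asset URL the application emits has to go through this kind of rewriting, which is why the proxy alone can’t solve the problem.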
github-to-sqlite is, as you might expect, a tool for importing GitHub data into a SQLite database. Today it can handle repositories, releases, release assets, commits, issues and issue comments. You can see a live demo built from Dogsheep organization data at github-to-sqlite.dogsheep.net (deployed by this GitHub action).
I built this tool primarily to help me better keep track of all of my projects. Pulling the issues into a single database means I can run queries against all open issues across all of my repositories, and importing commits and releases is handy for when I want to write my weeknotes and need to figure out what I’ve worked on lately.
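As a sketch of the kind of cross-repository query this enables, the example below builds a small in-memory database and lists open issues across every repository, most recently updated first. The schema is a simplified assumption (a repos table plus an issues table with repo, title, state and updated_at columns), and the rows are invented sample data rather than real issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, assumed schema with invented sample rows
conn.executescript("""
CREATE TABLE repos (id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE issues (
    id INTEGER PRIMARY KEY,
    repo INTEGER REFERENCES repos(id),
    title TEXT,
    state TEXT,
    updated_at TEXT
);
INSERT INTO repos VALUES (1, 'example/repo-a'), (2, 'example/repo-b');
INSERT INTO issues VALUES
    (10, 1, 'Closed issue', 'closed', '2020-03-20T10:00:00Z'),
    (11, 1, 'Older open issue', 'open', '2020-03-23T09:00:00Z'),
    (12, 2, 'Newer open issue', 'open', '2020-03-24T18:00:00Z');
""")

# All open issues across every repository, newest activity first
open_issues = conn.execute("""
    SELECT repos.full_name, issues.title
    FROM issues
    JOIN repos ON issues.repo = repos.id
    WHERE issues.state = 'open'
    ORDER BY issues.updated_at DESC
""").fetchall()

for full_name, title in open_issues:
    print(full_name, "-", title)
```

The same SQL could be pasted into Datasette’s query interface against a real imported database, provided the column names match.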
GitHub issues use Markdown. To correctly display them it’s useful to be able to render that Markdown. I built datasette-render-markdown back in November, but this week I made some substantial upgrades: you can now configure which columns should be rendered, and it includes support for Markdown extensions including GitHub-Flavored Markdown.
You can see it in action on the github-to-sqlite demo.
I also upgraded datasette-render-timestamps with the same explicit column configuration pattern.
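Explicit column configuration for datasette-render-markdown lives in the plugins section of the table metadata. The sketch below generates that structure from Python; the database name (github), table name (issues) and column (body) are illustrative assumptions, and the plugin README is the place to check the exact options, including how to enable Markdown extensions.

```python
import json


def render_markdown_metadata(database, table, columns):
    """Build metadata listing the columns of one table to render as Markdown."""
    plugin_config = {"datasette-render-markdown": {"columns": list(columns)}}
    return {
        "databases": {
            database: {"tables": {table: {"plugins": plugin_config}}}
        }
    }


# Hypothetical database, table and column names
markdown_metadata = render_markdown_metadata("github", "issues", ["body"])
print(json.dumps(markdown_metadata, indent=4))
```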
Fly is a relatively new hosting provider which lets you host applications bundled as Docker containers in load-balanced data centers geographically close to your users.
It has a couple of characteristics that make it a really good fit for Datasette.
Firstly, the pricing model: Fly will currently host a tiny (128MB of RAM) container for $2.67/month—and they give you $10/month of free service credit, enough for 3 containers.
It turns out Datasette runs just fine in 128MB of RAM, so that’s three always-on Datasette containers! (Unlike Heroku and Cloud Run, Fly keeps your containers running rather than scaling them to zero).
Secondly, you deploy to Fly by shipping it a Dockerfile, which makes building datasette publish support for it really easy.
I added the publish_subcommand plugin hook to Datasette all the way back in 0.25 in September 2018, but I’ve never actually built anything with it. That’s now changed: datasette-publish-fly uses the hook to add a datasette publish fly command for publishing databases directly to your Fly account.
It turns out I created my Hacker News account in 2007, and I’ve posted 2,167 comments and submitted 131 stories since then. Since my personal Dogsheep project is about pulling my data from multiple sources into a single place it made sense to build a tool for importing from Hacker News.
hacker-news-to-sqlite uses the official Hacker News API to import every comment and story posted by a specific user. It can also take one or more item IDs and pull in the entire discussion tree around those items.
The README includes detailed documentation on how to best browse your data using Datasette once you have imported it.
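The two import modes described above map onto two endpoints of the official Hacker News API: user/<username>.json, which lists the IDs a user has submitted, and item/<id>.json, which returns an item along with the IDs of its child comments in kids. The sketch below shows one way to walk them. It is not the tool’s actual implementation, and the function names (fetch_json, walk_tree, user_items) are made up for this example.

```python
import json
from urllib.request import urlopen

API_ROOT = "https://hacker-news.firebaseio.com/v0"


def fetch_json(path):
    """Fetch one resource from the Hacker News API, e.g. "item/123"."""
    with urlopen("{}/{}.json".format(API_ROOT, path)) as response:
        return json.load(response)


def walk_tree(item_id, fetch=fetch_json):
    """Yield an item followed by all of its descendants, depth first."""
    item = fetch("item/{}".format(item_id))
    if item is None:  # the API returns null for IDs it does not know
        return
    yield item
    # "kids" holds the IDs of direct replies; leaf items omit it
    for kid_id in item.get("kids", []):
        yield from walk_tree(kid_id, fetch)


def user_items(username, fetch=fetch_json):
    """Yield every story and comment submitted by one user."""
    user = fetch("user/{}".format(username)) or {}
    for item_id in user.get("submitted", []):
        item = fetch("item/{}".format(item_id))
        if item is not None:
            yield item
```

Passing the fetch function in as an argument keeps the traversal logic testable without network access. A real importer would also want rate limiting and a way to resume, since each item costs one HTTP request.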
- sqlite-utils gained some improvements to the way it suggests types for existing columns.
- twitter-to-sqlite now offers --attach for more of its subcommands.
- datasette-show-errors is a new plugin which exposes 500 errors as tracebacks, like Django does with DEBUG=True. It’s built on top of Starlette’s ServerErrorMiddleware.
- I upgraded inaturalist-to-sqlite to work with