Simon Willison’s Weblog

Weeknotes: Hacking on 23 different projects

16th April 2020

I wrote a lot of code this week: 184 commits over 23 repositories! I’ve also started to come around to Zeit Now v2, having found workarounds for some of my biggest problems with it.

Better Datasette on Zeit Now v2

Last week I bemoaned the loss of Zeit Now v1 and documented my initial explorations of Zeit Now v2 with respect to Datasette.

My favourite thing about Now v1 was that it ran from Dockerfiles, which gave me complete control over the versions of everything in my deployment environment.

Now v2 runs on AWS Lambda, which means you are mostly stuck with what Zeit’s flavour of Lambda gives you. This currently means Python 3.6 (not too terrible—Datasette fully supports it) and a positively ancient SQLite: 3.7.17, released in May 2013.

Lambda runs on Amazon Linux. Charles Leifer maintains a package called pysqlite3 which bundles the latest version of SQLite3 as a standalone Python package, and includes a pysqlite3-binary package precompiled for Linux. Could it work on Amazon Linux...?

It turns out it does! A one-line change (not including tests) to my datasette-publish-now tool means it now deploys Datasette on Now v2 with SQLite 3.31.1—the latest release, from January this year, with window functions and all kinds of other goodness.
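The trick here is that pysqlite3 is a drop-in replacement for the standard library sqlite3 module, so code can prefer it when it’s installed and fall back otherwise. Here’s a minimal sketch of that pattern, plus a window function query that the bundled 3.31.1 supports but 3.7.17 does not (the table and data are made up for illustration):

```python
# A minimal sketch: prefer pysqlite3's bundled modern SQLite when the
# pysqlite3-binary package is installed, otherwise fall back to the
# standard library module.
try:
    import pysqlite3 as sqlite3
except ImportError:
    import sqlite3

print(sqlite3.sqlite_version)  # e.g. "3.31.1" with pysqlite3-binary installed

# Window functions need SQLite 3.25+, so this query fails on 3.7.17.
# The table and data here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (player TEXT, points INTEGER);
    INSERT INTO scores VALUES ('a', 10), ('b', 20), ('c', 15);
""")
rows = conn.execute("""
    SELECT player, points,
           rank() OVER (ORDER BY points DESC) AS position
    FROM scores
""").fetchall()
print(rows)  # each player paired with its rank over points
```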

This means that Now v2 is back to being a really solid option for hosting Datasette instances. You get scale-to-zero, crazily low prices and really fast cold-boot times. It can only take databases up to around 50MB—if you need more space than that you’re better off with Cloud Run—but it’s a great option for smaller data.

I released a few versions of datasette-publish-now as a result of this research. I plan to release the first non-alpha version at the same time as Datasette 0.40.

Various projects ported to Now v2 or Cloud Run

I had over 100 projects running on Now v1 that needed updating or deleting in time for that platform’s shutdown in August. I’ve been porting some of them very quickly using datasette-publish-now, but a few have been more work. Some highlights from this week:

big-local-datasette

I’ve been collaborating with the Big Local team at Stanford on a number of projects related to the Covid-19 situation. It’s not quite open to the public yet but I’ve been building a Datasette instance which shares data from the “open projects” maintained by that team.

The implementation fits a common pattern for me: a scheduled GitHub Action fetches project data from a GraphQL API, seeks out CSV files that have changed (using HTTP HEAD requests to check their ETags), loads those CSVs into SQLite tables and publishes the resulting database using datasette publish cloudrun.
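The ETag check is the part that saves re-downloading files that haven’t changed. Here’s a hedged sketch of that step using requests; the URL handling and the etags.json bookkeeping file are illustrative assumptions, not the actual implementation:

```python
# Illustrative sketch: only re-download a CSV if its ETag differs from the
# one recorded on the previous run. The etags.json file is a made-up
# bookkeeping mechanism for this example.
import json
import pathlib
import requests

ETAG_FILE = pathlib.Path("etags.json")
etags = json.loads(ETAG_FILE.read_text()) if ETAG_FILE.exists() else {}

def fetch_if_changed(url):
    head = requests.head(url, allow_redirects=True)
    etag = head.headers.get("ETag")
    if etag and etags.get(url) == etag:
        return None  # unchanged since last run, skip the download
    response = requests.get(url)
    etags[url] = response.headers.get("ETag")
    ETAG_FILE.write_text(json.dumps(etags, indent=2))
    return response.text
```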

There’s one interesting new twist: I’m fetching the existing database files on every run using my new datasette-clone tool (written for this project), applying changes to them and then only publishing if the resulting MD5 sums have changed since last time.

It seems to work well, and I’m excited about this technique as a way of incrementally updating existing databases using stateless code running in a GitHub Action.
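For anyone curious what the “only publish if changed” check looks like, here’s an illustrative sketch: hash the SQLite files before and after the update step and compare the digests. The file layout and names are assumptions for the example, not the project’s actual code:

```python
# Illustrative "skip the deploy if nothing changed" check using MD5 digests
# of the SQLite database files.
import hashlib
import pathlib

def md5_digest(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

before = {p.name: md5_digest(p) for p in pathlib.Path(".").glob("*.db")}
# ... run the update step that may modify the databases ...
after = {p.name: md5_digest(p) for p in pathlib.Path(".").glob("*.db")}

if before != after:
    print("Databases changed: run datasette publish cloudrun here")
else:
    print("No changes: skip the deploy")
```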

Datasette Cloud

I continue to work on the invite-only alpha of my SaaS Datasette platform, Datasette Cloud. This week I ported the CI and deployment scripts from GitLab to GitHub Actions, mainly to try and reduce the variety of CI systems I’m working with (I now have projects live on three: Travis, Circle CI and GitHub Actions).

I’ve also been figuring out ways of supporting API tokens for making requests to authentication-protected Datasette instances. I shipped small releases of datasette-auth-github and datasette-auth-existing-cookies to support this.
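As a rough illustration of what a token-authenticated request might look like (the header scheme and URL below are assumptions for the example; the plugins’ actual mechanisms may differ):

```python
# Hypothetical example of calling a protected Datasette instance with an
# API token in the Authorization header. The URL and Bearer scheme are
# assumptions, not the plugins' documented behaviour.
import requests

response = requests.get(
    "https://example-instance.datasette.cloud/data/table.json",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
)
response.raise_for_status()
print(response.json())
```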

In tinkering with Datasette Cloud I also shipped an upgrade to datasette-mask-columns, which now displays a visible REDACTED label on redacted columns in the table view.

Miscellaneous