Weeknotes: SpatiaLite 5, Datasette on Azure, more CDC vaccination history
28th March 2021
This week I got SpatiaLite 5 working in the Datasette Docker image, improved the CDC vaccination history git scraper and figured out how to run Datasette on Azure. We also closed on a new home!
SpatiaLite 5 for Datasette
SpatiaLite 5 came out earlier this year with a bunch of exciting improvements, most notably an implementation of KNN (K-nearest neighbours)—a way to efficiently answer the question “what are the 10 closest rows to this latitude/longitude point”.
I love building “X near me” websites so I expect I’ll be using this a lot in the future.
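To make the “10 closest rows” question concrete, here is a minimal brute-force sketch in plain Python. It is not SpatiaLite’s KNN implementation: it computes the haversine distance to every row and keeps the smallest, which is fine for small tables but does not use a spatial index. The row shape (dicts with `latitude` and `longitude` keys), the function names and the sample places are assumptions made for illustration.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def k_nearest(rows, latitude, longitude, k=10):
    """Return the k rows closest to (latitude, longitude), nearest first.

    Assumes each row is a dict with "latitude" and "longitude" keys
    (a hypothetical shape chosen for this sketch).
    """
    return heapq.nsmallest(
        k,
        rows,
        key=lambda row: haversine_km(
            latitude, longitude, row["latitude"], row["longitude"]
        ),
    )


# Illustrative sample data, not taken from any real table.
places = [
    {"name": "Half Moon Bay", "latitude": 37.4636, "longitude": -122.4286},
    {"name": "San Francisco", "latitude": 37.7749, "longitude": -122.4194},
    {"name": "Los Angeles", "latitude": 34.0522, "longitude": -118.2437},
]
nearest = k_nearest(places, 37.5, -122.47, k=2)
```

A scan like this touches every row on every query, which is exactly the cost that an indexed KNN implementation is meant to avoid on larger tables.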
I spent a bunch of time this week figuring out how best to install it into a Docker container for use with Datasette. I finally cracked it in issue 1249 and the Dockerfile in the Datasette repository now builds with the SpatiaLite 5.0 extension, using a pattern I figured out for installing Debian unstable packages into a Debian stable base container.
When Datasette 0.56 is released the official Datasette Docker image will bundle SpatiaLite 5.0.
CDC vaccination history in Datasette
I’m tracking the CDC’s per-state vaccination numbers in my cdc-vaccination-history repository, as described in my Git scraping lightning talk.
Scraping data into a git repository to track changes to it over time is easy. What’s harder is extracting that data back out of the commit history in order to analyze and visualize it later.
To demonstrate how this can work I added a build_database.py script to that repository which iterates through the git history and uses it to build a SQLite database containing daily state reports. I also added steps to the GitHub Actions workflow to publish that SQLite database using Datasette and Vercel.
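The general shape of that approach can be sketched with only the standard library. This is not the actual build_database.py: the table name, the column names and the assumption that each version of the scraped file is a JSON list of objects with `state` and `doses_administered` keys are all hypothetical, chosen to keep the example short.

```python
import json
import sqlite3
import subprocess


def iter_file_versions(repo_path, filepath):
    """Yield (commit_hash, commit_date, file_text) for every commit that
    touched filepath, oldest first. Requires the git command-line tool."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--reverse",
         "--format=%H %cI", "--", filepath],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in log.splitlines():
        commit_hash, commit_date = line.split(" ", 1)
        # "git show <hash>:<path>" prints the file as it was at that commit.
        # This raises if the file was deleted in that commit; a fuller
        # version would need to handle that case.
        text = subprocess.run(
            ["git", "-C", repo_path, "show", f"{commit_hash}:{filepath}"],
            capture_output=True, text=True, check=True,
        ).stdout
        yield commit_hash, commit_date, text


def build_database(db_path, versions):
    """Write one row per (commit, state) into a daily_reports table.

    Assumes each version is a JSON list of objects with "state" and
    "doses_administered" keys (a hypothetical shape for this sketch).
    """
    db = sqlite3.connect(db_path)
    db.execute(
        "create table if not exists daily_reports ("
        "commit_hash text, commit_date text, state text, "
        "doses_administered integer, "
        "primary key (commit_hash, state))"
    )
    for commit_hash, commit_date, text in versions:
        for record in json.loads(text):
            db.execute(
                "insert or replace into daily_reports values (?, ?, ?, ?)",
                (commit_hash, commit_date,
                 record["state"], record["doses_administered"]),
            )
    db.commit()
    return db
```

Under those assumptions, something like `build_database("cdc.db", iter_file_versions(".", "vaccinations.json"))` would populate the database, where `vaccinations.json` is a placeholder filename. Keeping the git traversal separate from the database writing makes the second half easy to exercise with hand-written versions.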
I installed the datasette-vega visualization plugin there too. Here’s a chart showing the number of doses administered over time in California.
This morning I started capturing the CDC’s per-county data too, but I’ve not yet written code to load that into Datasette. [UPDATE: that table is now available: cdc/daily_reports_counties]
Datasette on Azure
I’m keen to make Datasette easy to deploy in as many places as possible. I already have mechanisms for publishing to Heroku, Cloud Run, Vercel and Fly.io—today I worked out the recipe needed for Azure Functions.
I haven’t bundled it into a datasette-publish-azure plugin yet but that’s the next step. In the meantime the azure-functions-datasette repo has a working example with instructions on how to deploy it.
Thanks go to Anthony Shaw for building out the ASGI wrapper needed to run ASGI applications like Datasette on Azure Functions.
iam-to-sqlite
I spend way too much time whinging about IAM on Twitter. I’m certain that properly learning IAM will unlock the entire world of AWS, but I have so far been unable to overcome my discomfort with it long enough to actually figure it out.
After yet another unproductive whinge this week I guilted myself into putting in some effort, and it’s already started to pay off: I figured out how to dump out all existing IAM data (users, groups, roles and policies) as JSON using the aws iam get-account-authorization-details command, and got so excited about it that I built iam-to-sqlite as a wrapper around that command that writes the results into SQLite so I can browse them using Datasette!
I’m increasingly realizing that the key to me understanding how pretty much any service works is to pull their JSON into a SQLite database so I can explore it as relational tables.
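As a rough illustration of the JSON-to-tables idea, here is a sketch that loads the output of that command into SQLite using only the standard library `sqlite3` module. It is not how iam-to-sqlite itself is implemented. The top-level keys (`UserDetailList`, `GroupDetailList`, `RoleDetailList`, `Policies`) are the ones the command returns; the table names and the choice to store nested values as JSON text are assumptions made for this example.

```python
import json
import sqlite3

# Top-level list keys in the command's JSON output, mapped to table names.
# The table names are a choice made for this sketch.
SECTIONS = {
    "UserDetailList": "users",
    "GroupDetailList": "groups",
    "RoleDetailList": "roles",
    "Policies": "policies",
}


def to_sql_value(value):
    """Keep scalars as they are; store nested lists and dicts as JSON text."""
    if value is None or isinstance(value, (str, int, float)):
        return value
    return json.dumps(value)


def load_authorization_details(db, details):
    """Create one table per non-empty section and insert its items as rows."""
    for key, table in SECTIONS.items():
        items = details.get(key, [])
        if not items:
            continue
        # Use the union of keys across all items as the column list.
        columns = sorted({column for item in items for column in item})
        quoted = ", ".join(f'"{column}"' for column in columns)
        db.execute(f'create table if not exists "{table}" ({quoted})')
        placeholders = ", ".join("?" for _ in columns)
        for item in items:
            db.execute(
                f'insert into "{table}" ({quoted}) values ({placeholders})',
                [to_sql_value(item.get(column)) for column in columns],
            )
    db.commit()
```

To try it, save the command’s output to a file (for example `aws iam get-account-authorization-details > iam.json`, where `iam.json` is a placeholder name), then call `load_authorization_details(sqlite3.connect("iam.db"), json.load(open("iam.json")))` and open the resulting file in Datasette.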
A useful trick for writing weeknotes
When writing weeknotes like these, it’s really useful to be able to see all of the commits from the past week across many different projects.
Today I realized you can use GitHub search for this. Run a search for author:simonw created:>2021-03-20 and filter to commits, ordered by “Recently committed”.
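The date in that query has to be updated every week, so a small helper can build it. This is a sketch: the function name is made up, and the `type=commits` URL parameter is an assumption about how GitHub’s search page selects the commits tab. The sort order is left to the search page itself.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def weekly_commit_search_url(author, today=None, days=7):
    """Build a GitHub search URL for an author's commits from the last `days` days."""
    today = today or date.today()
    since = today - timedelta(days=days)
    query = f"author:{author} created:>{since.isoformat()}"
    # type=commits is assumed to open the commits tab of the results.
    return "https://github.com/search?" + urlencode({"q": query, "type": "commits"})


url = weekly_commit_search_url("simonw", today=date(2021, 3, 27))
```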
Django pull request accepted!
I had a pull request accepted to Django this week! It was a documentation fix for the RawSQL query expression: I found a pattern for using it as part of a .filter(id__in=RawSQL(...)) query that wasn’t covered by the documentation.
And we found a new home
One other project this week: Natalie and I closed on a new home! We’re moving to El Granada, a tiny town just north of Half Moon Bay, on the coast 40 minutes south of San Francisco. We’ll be ten minutes from the ocean, with plenty of pinnipeds and pelicans. Exciting!
TIL this week
- Running gdb against a Python process in a running Docker container
- Tracing every executed Python statement
- Installing packages from Debian unstable in a Docker image based on stable
- Closest locations to a point
- Redirecting all paths on a Vercel instance
- Writing an Azure Function that serves all traffic to a subdomain
Releases this week
- datasette-publish-vercel: 0.9.3 (15 releases total), 2021-03-26
  Datasette plugin for publishing data using Vercel
- sqlite-transform: 0.5 (6 releases total), 2021-03-24
  Tool for running transformations on columns in a SQLite database
- django-sql-dashboard: 0.5a0 (12 releases total), 2021-03-24
  Django app for building dashboards using raw SQL queries
- iam-to-sqlite: 0.1, 2021-03-24
  Load Amazon IAM data into a SQLite database
- tableau-to-sqlite: 0.2.1 (4 releases total), 2021-03-22
  Fetch data from Tableau into a SQLite database
- c64: 0.1a0, 2021-03-21
  Experimental package of ASGI utilities extracted from Datasette