Weeknotes: git-history, bug magnets and s3-credentials --public

8th December 2021

I’ve stopped considering my projects “shipped” until I’ve written a proper blog entry about them, so yesterday I finally shipped git-history, coinciding with the release of version 0.6—a full 27 days after the first 0.1.

It took way more work than I was expecting to get to this point!

I wrote the first version of git-history in an afternoon, as a tool for a workshop I was presenting on Git scraping and Datasette.

Before promoting it more widely, I wanted to make some improvements to the schema. In particular, I wanted to record only the updated values in the item_version table—which otherwise could end up duplicating a full copy of each item in the database hundreds or even thousands of times.

Getting this right took a lot of work, and I kept on getting stumped by weird bugs and edge-cases. This bug in particular added a couple of days to the project.

The whole project turned out to be something of a bug magnet, partly because of a design decision I made concerning column names.

git-history creates tables with columns that correspond to the underlying data. Since it also needs its own columns for tracking things like commits and incremental versions, I decided to use underscore prefixes for reserved columns such as _item and _version

Datasette uses underscore prefixes for its own purposes—special table arguments such as ?_facet=column-name. It’s supposed to work with existing columns that use underscores by converting query string arguments like ?_item=3 into ?_item__exact=3—but git-history was the first of my projects to really exercise this, and I kept on finding bugs. Datasette 0.59.2 and 0.59.4 both have related bug fixes, and there’s a re-opened bug that I have yet to resolve.

Building the ca-fires demo also revealed a bug in datasette-cluster-map which I fixed in version 0.17.2.

s3-credentials --public

The git-history live demos are built and deployed by this GitHub Actions workflow. The workflow works by checking out three separate repos and running git-history against them. It takes advantage of that tool’s ability to add just new commits to an existing database to run faster, so it needs to persist database files in between runs.

Since these files can be several hundred MBs, I decided to persist them in an S3 bucket.

My s3-credentials tool provides the ability to create a new S3 bucket along with restricted read-write credentials just for that bucket, ideal for use in a GitHub Actions workflow.

I decided to make the bucket public such that anyone can download files from it, since there was no reason to keep it private. I’ve been wanting to add this ability to s3-credentials for a while now, so this was the impetus I needed to finally ship that feature.

It’s surprisingly hard to figure out how to make an S3 bucket public these days! It turned out the magic recipe was adding a JSON bucket policy document to the bucket granting s3:GetObject permission to principal *—here’s that policy in full.

I released s3-credentials 0.8 with a new --public option for creating public buckets—here are the release notes in full:

s3-credentials create my-bucket --public option for creating public buckets, which allow anyone with knowledge of a filename to download that file. This works by attaching this public bucket policy to the bucket after it is created. #42

s3-credentials put-object now sets the Content-Type header on the uploaded object. The type is detected based on the filename, or can be specified using the new --content-type option. #43

s3-credentials policy my-bucket --public-bucket outputs the public bucket policy that would be attached to a bucket of that name. #44

I wrote up this TIL which doubles as a mini-tutorial on using s3-credentials: Storing files in an S3 bucket between GitHub Actions runs.

datasette-hovercards

This was a quick experiment which turned into a prototype Datasette plugin. I really like how GitHub show hover card previews of links to issues in their interface:

Animation showing a hover card displayed when the mouse cursor touches a link to a GitHub Issue

I decided to see if I could build something similar for links within Datasette, specifically the links that show up when a column is a foreign key to another record.

Here’s what I’ve got so far:

Animation showing a hover card displayed in GitHub for a link to another record

There’s an interactive demo running on this table page.

It still needs a bunch of work—in particular I need to think harder about when the card is shown, where it displays relative to the mouse pointer, what causes it to be hidden again and how it should handle different page widths. Ideally I’d like to figure out a useful mobile / touch-screen variant, but I’m not sure how that could work.

The prototype plugin is called datasette-hovercards—I’d like to eventually merge this back into Datasette core once I’m happy with how it works.

Releases this week

git-history: 0.6.1—(9 releases total)—2021-12-08
Tools for analyzing Git history using SQLite
datasette-cluster-map: 0.17.2—(20 releases total)—2021-12-07
Datasette plugin that shows a map for any data with latitude/longitude columns
s3-credentials: 0.8—(8 releases total)—2021-12-07
A tool for creating credentials for accessing S3 buckets
asyncinject: 0.2a1—(3 releases total)—2021-12-03
Run async workflows using pytest-fixtures-style dependency injection
datasette-hovercards: 0.1a0—2021-12-02
Add preview hovercards to links in Datasette
github-to-sqlite: 2.8.3—(22 releases total)—2021-12-01
Save data from GitHub to a SQLite database

TIL this week

Posted 8th December 2021 at 9:34 pm · Follow me on Mastodon, Bluesky, Twitter or subscribe to my newsletter

Simon Willison’s Weblog