Weeknotes: git-history, bug magnets and s3-credentials --public
I’ve stopped considering my projects “shipped” until I’ve written a proper blog entry about them, so yesterday I finally shipped git-history, coinciding with the release of version 0.6—a full 27 days after the first 0.1.
It took way more work than I was expecting to get to this point!
I wrote the first version of git-history in an afternoon, as a tool for a workshop I was presenting on Git scraping and Datasette.
Before promoting it more widely, I wanted to make some improvements to the schema. In particular, I wanted to record only the updated values in the item_version table, which could otherwise end up duplicating a full copy of each item in the database hundreds or even thousands of times.
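Here is a minimal Python sketch of what storing only updated values means. It assumes each version of an item is a flat dict. The changed_fields() helper and the made-up records are hypothetical and are not git-history’s actual code or schema. The sketch also ignores keys that disappear between versions, which a real implementation would have to handle.

```python
from typing import Any, Optional


def changed_fields(previous: Optional[dict[str, Any]], current: dict[str, Any]) -> dict[str, Any]:
    """Return only the fields whose values differ from the previous version.

    The first version of an item (previous is None) is stored in full.
    Keys removed between versions are ignored by this sketch.
    """
    if previous is None:
        return dict(current)
    return {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }


# Made-up example records: three successive versions of the same item.
versions = [
    {"id": 1, "name": "Fire A", "acres": 100, "contained": False},
    {"id": 1, "name": "Fire A", "acres": 250, "contained": False},
    {"id": 1, "name": "Fire A", "acres": 250, "contained": True},
]

previous = None
for number, version in enumerate(versions, start=1):
    print(number, changed_fields(previous, version))
    previous = version
```

The trade-off is that reconstructing any given version means replaying the stored changes in order, rather than reading a single full copy.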
Getting this right took a lot of work, and I kept getting stumped by weird bugs and edge cases. One bug in particular added a couple of days to the project.
The whole project turned out to be something of a bug magnet, partly because of a design decision I made concerning column names.
git-history creates tables with columns that correspond to the underlying data. Since it also needs its own columns for tracking things like commits and incremental versions, I decided to use underscore prefixes for those reserved columns.
Datasette also uses underscore prefixes for its own purposes: special table arguments such as ?_facet=column-name. It’s supposed to work with existing columns whose names start with an underscore by converting the query string arguments that refer to them, but git-history was the first of my projects to really exercise this, and I kept on finding bugs. Datasette 0.59.2 and 0.59.4 both include related bug fixes, and there’s a re-opened bug that I have yet to resolve.
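A toy Python sketch shows why the convention gets awkward: every query string key has to be sorted into either a reserved argument or a column filter. This is not Datasette’s implementation. The RESERVED set is a hypothetical subset, _version is a hypothetical column name, and the rule that every other key becomes an equality filter is my own assumption. The explicit column__exact=value form it leans on is real Datasette filter syntax.

```python
from urllib.parse import parse_qsl

# Hypothetical subset of reserved arguments; Datasette's real list is longer.
RESERVED = {"_facet", "_sort", "_sort_desc", "_size", "_next"}


def split_args(query_string: str):
    """Split a query string into reserved arguments and (column, operator, value) filters."""
    special: list[tuple[str, str]] = []
    filters: list[tuple[str, str, str]] = []
    for key, value in parse_qsl(query_string):
        if key in RESERVED:
            special.append((key, value))
            continue
        column, sep, operator = key.rpartition("__")
        if sep and column:
            # Explicit operator, e.g. "_version__exact" -> column "_version", operator "exact"
            filters.append((column, operator, value))
        else:
            # Assumption: any other key is treated as a column equality filter.
            # This is where "_something" is ambiguous: a column, or a reserved argument?
            filters.append((key, "exact", value))
    return special, filters


print(split_args("_facet=county&_version__exact=3&name=Fire"))
```

The explicit __exact suffix removes the ambiguity for underscore-prefixed columns, at the cost of a less obvious URL.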
Building the ca-fires demo also revealed a bug in datasette-cluster-map which I fixed in version 0.17.2.
The git-history live demos are built and deployed by a GitHub Actions workflow, which checks out three separate repos and runs git-history against each of them. It takes advantage of the tool’s ability to add just the new commits to an existing database, which makes it run faster, so it needs to persist the database files between runs.
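The incremental behavior is what makes persisting the database worthwhile. As a minimal sketch of the idea, assuming commit hashes are listed oldest first (for example by git log --reverse --format=%H) and using a hypothetical seen_commits table rather than git-history’s real schema:

```python
import sqlite3


def ensure_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical bookkeeping table, not git-history's real schema.
    conn.execute("CREATE TABLE IF NOT EXISTS seen_commits (hash TEXT PRIMARY KEY)")


def unprocessed(conn: sqlite3.Connection, hashes: list[str]) -> list[str]:
    """Return hashes not yet recorded, preserving their original (oldest first) order."""
    seen = {row[0] for row in conn.execute("SELECT hash FROM seen_commits")}
    return [h for h in hashes if h not in seen]


def record(conn: sqlite3.Connection, hashes: list[str]) -> None:
    """Mark hashes as processed so the next run skips them."""
    conn.executemany(
        "INSERT OR IGNORE INTO seen_commits (hash) VALUES (?)",
        [(h,) for h in hashes],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
ensure_schema(conn)

first_run = unprocessed(conn, ["a1", "b2"])
record(conn, first_run)

# A later run sees one extra commit and only has to process that one.
second_run = unprocessed(conn, ["a1", "b2", "c3"])
print(first_run, second_run)
```

In the workflow, the same check would run against the database file restored from the previous run rather than an in-memory database; without that file, every run would start from the first commit again.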
Since these files can be several hundred MBs, I decided to persist them in an S3 bucket.
My s3-credentials tool provides the ability to create a new S3 bucket along with restricted read-write credentials just for that bucket, ideal for use in a GitHub Actions workflow.
I decided to make the bucket public so that anyone can download files from it, since there was no reason to keep it private. I’ve been wanting to add this ability to s3-credentials for a while now, so this was the impetus I needed to finally ship that feature.
It’s surprisingly hard to figure out how to make an S3 bucket public these days! It turned out the magic recipe was adding a JSON bucket policy document to the bucket granting s3:GetObject permission to the principal *.
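A policy of that shape can be generated with a few lines of Python. This sketch uses the standard AWS bucket policy format; the Sid value is an arbitrary label I chose, and running s3-credentials policy my-bucket --public-bucket is the way to see the exact document the tool attaches.

```python
import json


def public_bucket_policy(bucket: str) -> dict:
    """Build a bucket policy that lets anyone GET any object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAllGetObject",  # arbitrary label
                "Effect": "Allow",
                "Principal": "*",  # everyone, including anonymous requests
                "Action": ["s3:GetObject"],  # read-only: no listing, no writes
                "Resource": [f"arn:aws:s3:::{bucket}/*"],  # every object in the bucket
            }
        ],
    }


print(json.dumps(public_bucket_policy("my-bucket"), indent=4))
```

Because only s3:GetObject is granted, people can download a file if they know its key, but they cannot list the bucket. To attach a policy like this yourself, boto3’s S3 client has a put_bucket_policy(Bucket=..., Policy=...) method that takes the JSON as a string.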
I released s3-credentials 0.8 with a new --public option for creating public buckets. Here are the release notes in full:
- New s3-credentials create my-bucket --public option for creating public buckets, which allow anyone who knows a filename to download that file. This works by attaching a public bucket policy to the bucket after it is created. #42
- s3-credentials put-object now sets the Content-Type header on the uploaded object. The type is detected based on the filename, or can be specified explicitly using a new option.
- s3-credentials policy my-bucket --public-bucket outputs the public bucket policy that would be attached to a bucket of that name. #44
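Filename-based Content-Type detection is easy to sketch with Python’s standard mimetypes module. This illustrates the general technique rather than s3-credentials’ actual code; the content_type_for() name and the application/octet-stream fallback are my assumptions.

```python
import mimetypes
from typing import Optional


def content_type_for(filename: str, override: Optional[str] = None) -> str:
    """Pick a Content-Type: an explicit override wins, otherwise guess from the filename."""
    if override:
        return override
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to the generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"


for name in ["data.json", "notes.zzqq"]:
    print(name, content_type_for(name))
```

Letting an explicit value win over the guess matters for files whose extensions the mimetypes table doesn’t know about.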
I wrote up this TIL, which doubles as a mini-tutorial on using s3-credentials: Storing files in an S3 bucket between GitHub Actions runs.
Hovercards started as a quick experiment that turned into a prototype Datasette plugin. I really like how GitHub shows hover card previews of links to issues in its interface.
I decided to see if I could build something similar for links within Datasette, specifically the links that show up when a column is a foreign key to another record.
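A hovercard for a foreign key link needs the referenced record. Here is a sketch of that server-side half in Python with sqlite3: it follows the foreign key using SQLite’s PRAGMA foreign_key_list, fetches the referenced row and renders a small escaped HTML card. This is not how datasette-hovercards is implemented; the function names, the markup and the toy tables are hypothetical.

```python
import html
import sqlite3


def quote_ident(name: str) -> str:
    # Quote an SQLite identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def referenced_row(conn: sqlite3.Connection, table: str, column: str, value):
    """Follow the foreign key on table.column and return the referenced row as a dict, or None."""
    for fk in conn.execute(f"PRAGMA foreign_key_list({quote_ident(table)})"):
        # Pragma columns: id, seq, table, from, to, on_update, on_delete, match
        other_table, from_col, to_col = fk[2], fk[3], fk[4]
        if from_col != column:
            continue
        if to_col is None:
            # "REFERENCES other_table" with no column means its primary key.
            info = conn.execute(f"PRAGMA table_info({quote_ident(other_table)})")
            to_col = [r[1] for r in info if r[5] == 1][0]
        cursor = conn.execute(
            f"SELECT * FROM {quote_ident(other_table)} WHERE {quote_ident(to_col)} = ?",
            (value,),
        )
        row = cursor.fetchone()
        if row is None:
            return None
        names = [d[0] for d in cursor.description]
        return dict(zip(names, row))
    return None


def hovercard_html(fields: dict) -> str:
    """Render a record as a tiny definition list, escaping every key and value."""
    items = "".join(
        f"<dt>{html.escape(str(k))}</dt><dd>{html.escape(str(v))}</dd>"
        for k, v in fields.items()
    )
    return f'<div class="hovercard"><dl>{items}</dl></div>'


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE counties (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fires (id INTEGER PRIMARY KEY, title TEXT,
                    county_id INTEGER REFERENCES counties(id));
INSERT INTO counties VALUES (1, 'Example County');
INSERT INTO fires VALUES (10, 'Example fire', 1);
""")
county = referenced_row(conn, "fires", "county_id", 1)
print(hovercard_html(county))
```

In a plugin, the browser would request something like this on hover, which is where the open questions about when to show and hide the card come in.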
There’s an interactive demo of what I’ve got so far running on this table page.
It still needs a bunch of work—in particular I need to think harder about when the card is shown, where it displays relative to the mouse pointer, what causes it to be hidden again and how it should handle different page widths. Ideally I’d like to figure out a useful mobile / touch-screen variant, but I’m not sure how that could work.
The prototype plugin is called datasette-hovercards—I’d like to eventually merge this back into Datasette core once I’m happy with how it works.
Releases this week
git-history: 0.6.1—(9 releases total)—2021-12-08
Tools for analyzing Git history using SQLite
datasette-cluster-map: 0.17.2—(20 releases total)—2021-12-07
Datasette plugin that shows a map for any data with latitude/longitude columns
s3-credentials: 0.8—(8 releases total)—2021-12-07
A tool for creating credentials for accessing S3 buckets
asyncinject: 0.2a1—(3 releases total)—2021-12-03
Run async workflows using pytest-fixtures-style dependency injection
Add preview hovercards to links in Datasette
github-to-sqlite: 2.8.3—(22 releases total)—2021-12-01
Save data from GitHub to a SQLite database
TIL this week