Weeknotes: git-history, bug magnets and s3-credentials --public
8th December 2021
I’ve stopped considering my projects “shipped” until I’ve written a proper blog entry about them, so yesterday I finally shipped git-history, coinciding with the release of version 0.6—a full 27 days after the first 0.1.
It took way more work than I was expecting to get to this point!
I wrote the first version of git-history in an afternoon, as a tool for a workshop I was presenting on Git scraping and Datasette.
Before promoting it more widely, I wanted to make some improvements to the schema. In particular, I wanted to record only the updated values in the item_version table, which otherwise could end up duplicating a full copy of each item in the database hundreds or even thousands of times.
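The idea of storing only updated values can be sketched in a few lines of Python. This is an illustration only, not git-history's actual schema or code: the `item_version_sketch` table, its columns and the helper names are all hypothetical, and the sketch ignores keys that disappear between versions.

```python
import json
import sqlite3


def changed_columns(previous, current):
    """Return only the keys whose values differ from the previous snapshot."""
    return {k: v for k, v in current.items() if previous.get(k) != v}


def record_versions(db, item_id, snapshots):
    """Store the first snapshot in full, then only the values that changed."""
    previous = {}
    version = 0
    for snapshot in snapshots:
        delta = changed_columns(previous, snapshot)
        if delta:
            # Only write a row when something actually changed
            version += 1
            db.execute(
                "INSERT INTO item_version_sketch (item_id, version, changed) "
                "VALUES (?, ?, ?)",
                (item_id, version, json.dumps(delta, sort_keys=True)),
            )
        previous = snapshot


db = sqlite3.connect(":memory:")
# Hypothetical table: one row per version, holding just the changed values
db.execute(
    "CREATE TABLE item_version_sketch (item_id TEXT, version INTEGER, changed TEXT)"
)
snapshots = [
    {"name": "Station A", "status": "open", "capacity": 10},
    {"name": "Station A", "status": "closed", "capacity": 10},
    {"name": "Station A", "status": "closed", "capacity": 10},  # no change
    {"name": "Station A", "status": "open", "capacity": 12},
]
record_versions(db, "a1", snapshots)
rows = db.execute(
    "SELECT version, changed FROM item_version_sketch ORDER BY version"
).fetchall()
for row in rows:
    print(row)
```

Four snapshots produce three rows here, and only the first row carries the full item. That is the space saving described above.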
Getting this right took a lot of work, and I kept on getting stumped by weird bugs and edge-cases. One bug in particular added a couple of days to the project.
The whole project turned out to be something of a bug magnet, partly because of a design decision I made concerning column names.
git-history creates tables with columns that correspond to the underlying data. Since it also needs its own columns for tracking things like commits and incremental versions, I decided to use underscore prefixes for reserved columns such as
Datasette uses underscore prefixes for its own purposes: special table arguments such as ?_facet=column-name. It’s supposed to work with existing columns that use underscores by converting query string arguments like
git-history was the first of my projects to really exercise this, and I kept on finding bugs. Datasette 0.59.2 and 0.59.4 both have related bug fixes, and there’s a re-opened bug that I have yet to resolve.
The git-history live demos are built and deployed by this GitHub Actions workflow. The workflow checks out three separate repos and runs git-history against them. To run faster, it takes advantage of that tool’s ability to add just the new commits to an existing database, so it needs to persist database files in between runs.
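Why persisting the database makes later runs faster can be shown with a small sketch. This is not how git-history itself is implemented: the `seen_commits` table and the `process_new_commits` function are hypothetical names used to illustrate the general "skip what you have already processed" pattern.

```python
import sqlite3


def process_new_commits(db, commit_hashes):
    """Process only commits not already recorded; return the ones that were new."""
    db.execute("CREATE TABLE IF NOT EXISTS seen_commits (hash TEXT PRIMARY KEY)")
    seen = {row[0] for row in db.execute("SELECT hash FROM seen_commits")}
    newly_processed = []
    for commit_hash in commit_hashes:
        if commit_hash in seen:
            continue  # handled on a previous run, so skip the expensive work
        # ... the expensive per-commit work would happen here ...
        db.execute("INSERT INTO seen_commits (hash) VALUES (?)", (commit_hash,))
        seen.add(commit_hash)
        newly_processed.append(commit_hash)
    db.commit()
    return newly_processed


# An in-memory database stands in for the file persisted between runs
db = sqlite3.connect(":memory:")
first_run = process_new_commits(db, ["aaa111", "bbb222"])
second_run = process_new_commits(db, ["aaa111", "bbb222", "ccc333"])
print(first_run)
print(second_run)
```

The second call only does work for the one commit it has not seen before. The same saving applies to a workflow run that starts from the previous run's database file.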
Since these files can be several hundred MB, I decided to persist them in an S3 bucket.
My s3-credentials tool provides the ability to create a new S3 bucket along with restricted read-write credentials just for that bucket, ideal for use in a GitHub Actions workflow.
I decided to make the bucket public so that anyone can download files from it, since there was no reason to keep it private. I’ve been wanting to add this ability to s3-credentials for a while now, so this was the impetus I needed to finally ship that feature.
It’s surprisingly hard to figure out how to make an S3 bucket public these days! It turned out the magic recipe was adding a JSON bucket policy document to the bucket granting s3:GetObject permission to principal *. Here’s that policy in full.
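A bucket policy of this shape can be generated with a few lines of Python. Treat this as a sketch rather than the exact document s3-credentials produces: the `public_bucket_policy` helper and the `Sid` value are illustrative assumptions, while the `Principal`, `Action` and `Resource` fields follow the standard S3 bucket policy format.

```python
import json


def public_bucket_policy(bucket):
    """Build a policy letting anyone read objects in the named bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAllGetObject",  # illustrative label, any string works
                "Effect": "Allow",
                "Principal": "*",  # everyone, including anonymous requests
                "Action": ["s3:GetObject"],
                # The trailing /* targets the objects in the bucket
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


policy_json = json.dumps(public_bucket_policy("my-bucket"), indent=2)
print(policy_json)
```

The policy grants only `s3:GetObject`, so visitors can download a file if they know its name but cannot list the bucket's contents.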
I released s3-credentials 0.8 with a new --public option for creating public buckets. Here are the release notes in full:
- s3-credentials create my-bucket --public option for creating public buckets, which allow anyone with knowledge of a filename to download that file. This works by attaching this public bucket policy to the bucket after it is created. #42
- s3-credentials put-object now sets the Content-Type header on the uploaded object. The type is detected based on the filename, or can be specified using the new
- s3-credentials policy my-bucket --public-bucket outputs the public bucket policy that would be attached to a bucket of that name. #44
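Detecting a content type from a filename, as described in the release notes above, can be done with Python's standard `mimetypes` module. This sketch is not taken from the s3-credentials source: the `guess_content_type` function, its `override` parameter and the fallback value are assumptions made for illustration.

```python
import mimetypes


def guess_content_type(filename, override=None, default="application/octet-stream"):
    """Pick a Content-Type: an explicit override wins, else guess from the name."""
    if override:
        return override
    guessed, _encoding = mimetypes.guess_type(filename)
    # guess_type returns None when the extension is not recognised
    return guessed or default


print(guess_content_type("index.html"))
print(guess_content_type("README"))
print(guess_content_type("export.db", override="application/x-sqlite3"))
```

Getting the header right matters for a public bucket, because browsers use it to decide whether to display a file or download it.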
I wrote up this TIL which doubles as a mini-tutorial on using s3-credentials: Storing files in an S3 bucket between GitHub Actions runs.
datasette-hovercards was a quick experiment which turned into a prototype Datasette plugin. I really like how GitHub show hover card previews of links to issues in their interface.
I decided to see if I could build something similar for links within Datasette, specifically the links that show up when a column is a foreign key to another record.
Here’s what I’ve got so far: there’s an interactive demo running on this table page.
It still needs a bunch of work—in particular I need to think harder about when the card is shown, where it displays relative to the mouse pointer, what causes it to be hidden again and how it should handle different page widths. Ideally I’d like to figure out a useful mobile / touch-screen variant, but I’m not sure how that could work.
The prototype plugin is called datasette-hovercards—I’d like to eventually merge this back into Datasette core once I’m happy with how it works.
Releases this week
git-history: 0.6.1—(9 releases total)—2021-12-08
Tools for analyzing Git history using SQLite
datasette-cluster-map: 0.17.2—(20 releases total)—2021-12-07
Datasette plugin that shows a map for any data with latitude/longitude columns
s3-credentials: 0.8—(8 releases total)—2021-12-07
A tool for creating credentials for accessing S3 buckets
asyncinject: 0.2a1—(3 releases total)—2021-12-03
Run async workflows using pytest-fixtures-style dependency injection
datasette-hovercards
Add preview hovercards to links in Datasette
github-to-sqlite: 2.8.3—(22 releases total)—2021-12-01
Save data from GitHub to a SQLite database