Simon Willison’s Weblog

Weeknotes: Datasette 0.40, various projects, Dogsheep photos

22nd April 2020

A new release of Datasette, two new projects and progress towards a Dogsheep photos solution.

Datasette 0.40

I released Datasette 0.40 last night. Full release notes are here, but the highlight of this release is the ability to provide metadata in a metadata.yaml file as an alternative to metadata.json. This is particularly useful for embedding multi-line SQL queries: I’ve upgraded simonw/museums and simonw/til to take advantage of this, since they both use their metadata to define SQL queries that power their search pages and Atom feeds.
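As an illustration, here’s a minimal metadata.yaml defining a canned query with a multi-line SQL block (the database and query here are invented for the example, not taken from those repos):

    title: Example site
    databases:
      museums:
        queries:
          search:
            title: Search museums
            sql: |-
              select name, description
              from museums
              where name like '%' || :q || '%'
              order by name

The |- block scalar is what makes this so much nicer than JSON: the SQL can be written across multiple lines without any escaping.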

A JSK fellows directory and twitter-to-sqlite 0.21

My JSK Fellowship at Stanford ends in a few months. JSK has extremely talented and influential alumni, and one of the benefits of the fellowship is becoming part of that network afterwards.

The @JSKStanford Twitter account maintains lists of fellows on Twitter—journalists love Twitter!—so I decided to use my twitter-to-sqlite tool to build a Datasette-powered search engine of them.

That search engine is now running at jsk-fellows.datasettes.com. It’s updated daily by a GitHub Action, to capture bio changes and anyone newly added to the lists.

It’s a neat example of taking advantage of SQLite views to build faceted search across a subset of data. A script constructs the jsk_fellows view at build time, then metadata.json configures that view to run full-text search and facet by the derived fellowship column.
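Conceptually the view looks something like this; the underlying table and column names here are illustrative rather than the actual build script:

    -- Derive a fellowship column from the Twitter list each
    -- user appeared on (schema details are illustrative)
    create view jsk_fellows as
    select
      users.*,
      lists.name as fellowship
    from users
      join list_users on list_users.user = users.id
      join lists on lists.id = list_users.list;

Since Datasette treats a view much like a table, the metadata can then enable full-text search on it and set fellowship as a default facet.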

I shipped twitter-to-sqlite 0.21 with a new twitter-to-sqlite lists username command as part of this project.
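Usage follows the same pattern as the other Dogsheep commands, with the database path first (jsk.db is just an example filename; check the README for the exact options):

    # Fetch lists for the @JSKStanford account into jsk.db
    twitter-to-sqlite lists jsk.db JSKStanford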

TILs and datasette-template-sql 1.0

I described my new TILs project on Monday. I’ve published 15 so far—the format is working really well for me.

Hacking on simonw/til reminded me of a feature gap in my datasette-template-sql plugin: it didn’t have a solution for safely escaping parameters in SQL queries, leading to nasty string-concatenated SQL queries.

datasette-template-sql 1.0 fixes that issue, at the cost of backwards compatibility with previous releases. I’m using it for both til and museums now.
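The new pattern in a custom template looks something like this (the table and parameter names are hypothetical):

    <ul>
    {% for row in sql(
      "select title, url from til where topic = :topic",
      {"topic": "python"}
    ) %}
      <li><a href="{{ row['url'] }}">{{ row['title'] }}</a></li>
    {% endfor %}
    </ul>

Passing the parameters as a dictionary means SQLite handles the escaping, instead of values being concatenated into the query string.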

github-to-sqlite 2.0

I released github-to-sqlite 2.0 with a small backwards-incompatible change to the database schema (hence the major version increment). It builds on 1.1 from a few days ago, which added a new github-to-sqlite contributors command for fetching statistics on contributors to repositories.

More importantly, I improved the live demo running at github-to-sqlite.dogsheep.net.

The demo now updates once a day using GitHub Actions and pulls in releases, commits, issues, issue comments and contributors for all of my Dogsheep projects plus datasette and sqlite-utils.
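The fetching is a plain GitHub Actions workflow on a cron schedule. A simplified sketch, not the exact workflow (the real one covers all of the repositories and then deploys the resulting database with Datasette):

    name: Fetch GitHub data
    on:
      schedule:
        - cron: '0 0 * * *'  # once a day
    jobs:
      fetch:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/setup-python@v1
          - run: pip install github-to-sqlite
          - run: |
              github-to-sqlite releases github.db simonw/datasette
              github-to-sqlite commits github.db simonw/datasette
              github-to-sqlite issues github.db simonw/datasette
              # ...repeated for each Dogsheep repository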

This means I can browse and execute SQL queries across 929 issues, 1,505 commits and 132 releases across 14 repositories!

Want to see which of my projects have had the most releases? Facet releases by repo.

I’ve also installed the datasette-search-all plugin there, so you can search across all commits, releases, issues and so on, for “zeit now” for example.

Bringing all of my different project data together in one place like this is really powerful.

I think it’s a great illustration of the Datasette/Dogsheep philosophy of pulling down a complete SQLite-powered copy of data from external services so you can query and join across your data without being limited to the functionality that those services provide through their own interfaces or APIs.

photos-to-sqlite alpha

Dogsheep is about bringing all of my interesting personal and social data into a single, private place.

The biggest thing missing at the moment is photos. I want to be able to query my photos with SQL, and eventually combine them with tweets, checkins etc in a unified timeline.

Last week I took a step towards this goal with heic-to-jpeg, a proxy to let me display my iPhone’s HEIC photos online.

This week I started work on photos-to-sqlite, the set of tools I’ll use to turn my photos into something I can run queries against.

So far I’ve mainly been figuring out how to get them into an S3 bucket that I control. Once the S3 credentials are configured, running photos-to-sqlite upload photos.db ~/Pictures/Photos\ Library.photoslibrary/originals will start uploading every photo it can find in that directory to the S3 bucket.

The filename it uses is the sha256 hash of the photo file contents, which I’m hoping will let me de-dupe photos from multiple sources in the future. It also writes basic metadata on the photos to that photos.db SQLite database.
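The hashing itself is straightforward; here’s a sketch of the general idea in Python (not necessarily the tool’s exact code):

    import hashlib

    def sha256_of_file(path, chunk_size=1024 * 1024):
        # Read in chunks so multi-megabyte originals don't have
        # to fit in memory all at once
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

Two copies of the same photo from different sources produce the same hash, hence the same S3 key, so they de-dupe automatically.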

This is going to be a big project. I’m investigating osxphotos to liberate the metadata from Apple Photos, and various Python libraries for extracting EXIF data from the files themselves.

Once I’ve got that working, I can experiment with things like piping photos through Google Cloud Vision to label them based on their contents.
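The shape of that code would be something like this, using the google-cloud-vision client library (a sketch, not anything I’ve wired up yet):

    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open("photo.jpg", "rb") as f:
        image = vision.Image(content=f.read())
    # Ask the API for descriptive labels for the photo's contents
    response = client.label_detection(image=image)
    for label in response.label_annotations:
        print(label.description, label.score)

Those labels could then be written to a table in photos.db, keyed by each photo’s sha256 hash.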

This is all a very, very early alpha at the moment, but I’m cautiously optimistic about progress so far.