Notes from my appearance on the Changelog podcast

9th May 2018

After I spoke at Zeit Day SF last weekend I sat down with Adam Stacoviak to record a 25-minute segment for episode 296 of the Changelog podcast, talking about Datasette. We covered a lot of ground!

A transcript should be available soon (the Changelog have excellent transcripts) but in the meantime here are my own notes on the topics that we covered.

My section starts at 54:45 in the episode.

From the intro

Datasette was inspired by my work at the Guardian with Simon Rogers, where we started the Guardian Data Blog.

Here’s Datasette on GitHub, and “Datasette: instantly create and publish an API for your SQLite databases”, the post where I first announced the project.

My sf-trees.com website is powered by this Datasette API, using data from the San Francisco Department of Public Works.

The hosting platform I use for immutable deployments of Docker containers is Zeit Now.

The Datasette tool suite

Here’s the full Datasette suite of tools:

csvs-to-sqlite is a command-line tool for converting CSV files into a SQLite database.
Datasette is a web application for exploring SQLite databases and interacting with them via a read-only JSON API. It also includes the datasette publish command-line tool for publishing those databases using Zeit Now or Heroku.
Datasette Publish is a web app that lets you upload CSV files and deploy them to a new Datasette instance without installing any software. Here’s my announcement post.
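
To make the CSV-to-SQLite-to-JSON idea concrete, here's a rough sketch using only the Python standard library. This is not how csvs-to-sqlite or Datasette are actually implemented: the function names (csv_to_sqlite, rows_as_json), the table name and the sample data are all made up for illustration, and every column is simply stored as text.

```python
import csv
import io
import json
import sqlite3


def csv_to_sqlite(csv_text, conn, table):
    """Load CSV text into a new SQLite table, one TEXT column per header."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    # Quote identifiers so headers containing spaces or quotes still work.
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    columns = ", ".join(
        '"{}" TEXT'.format(h.replace('"', '""')) for h in headers
    )
    conn.execute("CREATE TABLE {} ({})".format(quoted_table, columns))
    placeholders = ", ".join("?" for _ in headers)
    # The remaining rows of the reader are inserted as-is.
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders),
        reader,
    )
    conn.commit()


def rows_as_json(conn, sql):
    """Run a query and return the rows as a JSON array of objects."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(sql).fetchall()
    return json.dumps([dict(row) for row in rows])


# Hypothetical sample data, for illustration only.
SAMPLE_CSV = "name,breed\nBella,Pug\nMax,Labrador\n"

conn = sqlite3.connect(":memory:")
csv_to_sqlite(SAMPLE_CSV, conn, "dogs")
print(rows_as_json(conn, "SELECT name, breed FROM dogs ORDER BY name"))
```

The real tools do a great deal more than this, but the basic shape is the same: tabular data goes into a SQLite file, and query results come back out as JSON.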

We talked a lot about SQLite. Here’s a document describing How SQLite is tested and a bunch more about SQLite from my blog.

Datasette examples

A few more fun examples of datasettes that we discussed:

USGS Polar Bear ear tags on a map, using the datasette cluster map plugin—data sourced from the USGS. The tags are made by Wildlife Computers.
Australian dog names! Here are the most common names overall, and here are the most common names for pugs.
The UK Register of Members Interests lets you explore conflicts of interest registered by UK politicians. My accompanying blog entry has plenty of interesting example searches.
The Metropolitan Museum of Art—a Datasette instance for exploring the 464,360 items in their collection that they published as a CSV on GitHub.

Tools for manipulating data

I’m a huge fan of the combination of Jupyter notebooks and Pandas. Here’s a collection of notes on jupyter and pandas from my blog.

Contributing to Datasette

The best thing you can do is use Datasette to publish your data and then give me feedback!

If you want to get involved in the code, take a look at the help wanted label on GitHub or have a look at the open issues for csvs-to-sqlite.

Even better: write plugins! Datasette’s plugin architecture is just getting started so the more experience and feedback we can get with it the better.

And finally… I mentioned US Navy Airship and Zeppelin photos. I also have a Pinterest Board.

Posted 9th May 2018 at 6:03 pm

Simon Willison’s Weblog