Notes from my appearance on the Changelog podcast
After I spoke at Zeit Day SF last weekend I sat down with Adam Stacoviak to record a 25-minute segment for episode 296 of the Changelog podcast, talking about Datasette. We covered a lot of ground!
A transcript should be available soon (the Changelog have excellent transcripts) but in the meantime here are my own notes on the topics that we covered.
My section starts 54:45 into the episode.
From the intro
Datasette was inspired by my work at the Guardian with Simon Rogers, where we started the Guardian Data Blog.
Here’s Datasette on GitHub, and Datasette: instantly create and publish an API for your SQLite databases, the post where I first announced the project.
My sf-trees.com website is powered by this Datasette API, using data from the San Francisco Department of Public Works.
The hosting platform I use for immutable deployments of Docker containers is Zeit Now.
The Datasette tool suite
Here’s the full Datasette suite of tools:
- csvs-to-sqlite is a command-line tool for converting CSV files into a SQLite database.
- Datasette is a web application for exploring SQLite databases and interacting with them via a read-only JSON API. It also includes the datasette publish command-line tool for publishing those databases using Zeit Now or Heroku.
- Datasette Publish is a web app that lets you upload CSV files and deploy them to a new Datasette instance without installing any software. Here’s my announcement post.
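To make the CSV-to-SQLite step concrete, here is a minimal sketch using only Python's standard library csv and sqlite3 modules. It is not how csvs-to-sqlite is implemented (that tool has its own options and detects column types); the csv_to_sqlite helper below is hypothetical and, as a simplifying assumption, stores every column as TEXT.

```python
import csv
import io
import sqlite3


def _quote(identifier: str) -> str:
    """Quote a SQLite identifier so names containing spaces or quotes work."""
    return '"' + identifier.replace('"', '""') + '"'


def csv_to_sqlite(csv_text: str, conn: sqlite3.Connection, table: str) -> int:
    """Hypothetical helper: load CSV text into a new table, all columns TEXT.

    Illustration only, not the csvs-to-sqlite API. Returns rows inserted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first CSV row supplies the column names
    columns = ", ".join(f"{_quote(col)} TEXT" for col in header)
    conn.execute(f"CREATE TABLE {_quote(table)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f"INSERT INTO {_quote(table)} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = "name,breed\nBella,Pug\nMax,Labrador\n"
    print(csv_to_sqlite(sample, conn, "dogs"))  # 2
    print(conn.execute("SELECT name FROM dogs WHERE breed = 'Pug'").fetchall())
    # [('Bella',)]
```

Once the data is in a SQLite file, it can be explored and published with Datasette.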
We talked a lot about SQLite. Here’s a document describing How SQLite is tested and a bunch more about SQLite from my blog.
A few more fun examples of datasettes that we discussed:
- USGS Polar Bear ear tags on a map, using the datasette cluster map plugin—data sourced from the USGS. The tags are made by Wildlife Computers.
- Australian dog names! Here are the most common names overall, and here are the most common names for pugs.
- The UK Register of Members' Interests lets you explore conflicts of interest registered by UK politicians. My accompanying blog entry has plenty of interesting example searches.
- The Metropolitan Museum of Art—a Datasette instance for exploring the 464,360 items in their collection that they published as a CSV on GitHub.
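A page like "most common names for pugs" is, at heart, a GROUP BY count over one table, served as JSON from a database that is never written to. The sketch below illustrates that idea with Python's sqlite3 module, opening the file read-only via a SQLite URI with mode=ro. It is not Datasette's implementation, and the query_json helper, the dogs table and its name and breed columns are hypothetical stand-ins, not the schema of the real dog names dataset.

```python
import json
import sqlite3
from pathlib import Path


def query_json(db_path, sql, params=()):
    """Hypothetical helper: run one query on a read-only SQLite file, return JSON.

    A sketch of the idea behind a read-only JSON API, not Datasette's code.
    """
    # mode=ro asks SQLite to open the file read-only, so any write fails.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row  # rows can be converted to column -> value dicts
    try:
        rows = conn.execute(sql, params).fetchall()
        return json.dumps([dict(row) for row in rows])
    finally:
        conn.close()


# Hypothetical schema: a "dogs" table with "name" and "breed" columns.
# Passing breed=None counts names across all breeds.
MOST_COMMON_NAMES = """
    SELECT name, COUNT(*) AS n
    FROM dogs
    WHERE :breed IS NULL OR breed = :breed
    GROUP BY name
    ORDER BY n DESC, name
    LIMIT 10
"""

if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dogs.db")
        setup = sqlite3.connect(path)  # a normal read-write connection for setup
        setup.execute("CREATE TABLE dogs (name TEXT, breed TEXT)")
        setup.executemany(
            "INSERT INTO dogs VALUES (?, ?)",
            [("Bella", "Pug"), ("Max", "Pug"), ("Bella", "Beagle")],
        )
        setup.commit()
        setup.close()
        print(query_json(path, MOST_COMMON_NAMES, {"breed": None}))
        # [{"name": "Bella", "n": 2}, {"name": "Max", "n": 1}]
```

Opening the file read-only means a stray INSERT or UPDATE raises sqlite3.OperationalError instead of modifying the published data.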
Tools for manipulating data
I’m a huge fan of the combination of Jupyter notebooks and Pandas. Here’s a collection of notes on jupyter and pandas from my blog.
Contributing to Datasette
The best thing you can do is use Datasette to publish your data and then give me feedback!
If you want to get involved in the code, take a look at the help wanted label on GitHub or have a look at the open issues for csvs-to-sqlite.
Even better: write plugins! Datasette’s plugin architecture is just getting started so the more experience and feedback we can get with it the better.
And finally… I mentioned US Navy Airship and Zeppelin photos. I also have a Pinterest Board.