datasette-clone
14th April 2020
I released a fun little Datasette utility today: datasette-clone.
It’s a command-line tool for cloning a Datasette instance down to your local hard drive—the name is inspired by the git clone command.
Here’s how to use it to create a local clone of all of the data on covid-19.datasettes.com (discussed previously):
pip install datasette-clone
datasette-clone https://covid-19.datasettes.com/
If you give the command the URL of a public Datasette instance, it will iterate through the list of available SQLite database files (by hitting the /-/databases.json endpoint) and download each of them.
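To make that discovery step concrete, here is a minimal sketch of how a clone tool could turn the /-/databases.json listing into a list of files to fetch. It is an illustration, not datasette-clone's actual code. The function names `fetch_databases` and `download_plan` are hypothetical. It assumes each database file can be downloaded from `/<name>.db` on the instance, and it assumes in-memory databases (`"is_memory": true`) should be skipped because there is no file behind them.

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import quote, urljoin


def fetch_databases(instance_url: str) -> list[dict]:
    """Fetch and parse the /-/databases.json listing from a Datasette instance."""
    url = urljoin(instance_url.rstrip("/") + "/", "-/databases.json")
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def download_plan(instance_url: str, databases: list[dict]) -> dict[str, str]:
    """Map a local filename to a download URL for each on-disk database.

    Assumption (hypothetical helper): each database is served at /<name>.db
    relative to the instance root. In-memory databases are skipped.
    """
    base = instance_url.rstrip("/") + "/"
    plan = {}
    for db in databases:
        if db.get("is_memory"):
            continue  # nothing on disk to download
        filename = f"{db['name']}.db"
        plan[filename] = urljoin(base, quote(filename))
    return plan
```

Used together, `download_plan(url, fetch_databases(url))` yields pairs that could each be passed to `urllib.request.urlretrieve(download_url, filename)`. Splitting the network call from the planning step keeps the planning logic easy to test offline.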
You can give it an optional second argument for the directory you would like to store the data in:
datasette-clone https://covid-19.datasettes.com/ covid-19
Add -v to see debugging output showing what it’s doing.
The tool also pulls a copy of that databases.json file, and stores it alongside the downloaded database files. That file looks something like this:
[
    {
        "name": "covid",
        "path": "covid.db",
        "size": 12038144,
        "is_mutable": false,
        "is_memory": false,
        "hash": "453e1090ca379bde05d86c2db35f80235a58a2b52c92dda4463c25f6b3a9211d"
    }
]
That "hash" key is the SHA-256 hash of the file contents. The next time you run the datasette-clone command, it will compare the cached databases.json file with the live one and only download database files that have changed.
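The incremental update can be pictured as a diff of two listings keyed on database name. Below is a minimal sketch of that idea, not the tool's real implementation. The helper names `sha256_of_file` and `changed_databases` are hypothetical. It assumes a database with a missing or null `"hash"` (as a mutable database might report) should always be re-downloaded, since there is nothing to compare against.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks so
    large databases never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_databases(cached: list[dict], live: list[dict]) -> list[str]:
    """Return the names of live databases that need downloading: those that
    are new, whose hash differs from the cached copy, or that have no hash."""
    cached_hashes = {db["name"]: db.get("hash") for db in cached}
    return [
        db["name"]
        for db in live
        if db.get("hash") is None or cached_hashes.get(db["name"]) != db["hash"]
    ]
```

`sha256_of_file` can also double-check a downloaded file against the `"hash"` value in the listing. Comparing hashes from the two JSON files avoids re-hashing every local database on each run.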
In this way, datasette-clone can be easily used to maintain a mirror of any public Datasette instance that you find interesting.
I built the command with the intention of using it in a GitHub Action: I’m increasingly using Actions to generate or update databases, and I often find myself wanting to download the previous database copy, update it in some way and then deploy the result.
My plan was to use datasette-clone in conjunction with the actions/cache action to cache copies of the database files locally (actions have a 5GB cache storage limit) and make my download step more efficient.
Unfortunately, it turns out that doesn’t work for most of my projects, because actions/cache currently only works for push and pull_request events, and most of my repositories are updated by scheduled workflows!
Hopefully they’ll fix that limitation at some point in the future. In the meantime, datasette-clone is still a useful tool for creating clones of public Datasette instances for other reasons.
Update, 23rd June 2020: They fixed that limitation in actions/cache@v2.