Cross-database queries in SQLite (and weeknotes)
I released Datasette 0.55 and sqlite-utils 3.6 this week with a common theme across both releases: supporting cross-database joins.
Cross-database queries in Datasette
SQLite databases are single files on disk. I really love this characteristic: it makes them easy to create, copy and move around. All you need is a disk volume and you can create as many SQLite databases as you like.
A lesser known feature of SQLite is that you can run queries, including joins, across tables from more than one database. The secret sauce is the ATTACH DATABASE command. Run the following SQL:
ATTACH 'other.db' AS other;
And now you can reference tables in that database as other.tablename. You can then join against them, combine them with UNION and generally treat them as if they were another table in your first connected database.
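Here is a minimal sketch of the same idea using Python's standard-library sqlite3 module. It creates two database files, ATTACHes the second one and joins across them. The file names and the users and orders tables are hypothetical, made up so the example runs on its own.

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
other_path = os.path.join(tmp, "other.db")

# Hypothetical second database file with a users table
other = sqlite3.connect(other_path)
other.execute("create table users (id integer primary key, name text)")
other.executemany("insert into users values (?, ?)", [(1, "alice"), (2, "bob")])
other.commit()
other.close()

# Hypothetical first database with an orders table keyed on user id
conn = sqlite3.connect(os.path.join(tmp, "first.db"))
conn.execute("create table orders (id integer primary key, user_id integer, total real)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [(1, 1, 9.5), (2, 2, 3.0), (3, 1, 1.5)],
)
conn.commit()

# Equivalent to: ATTACH 'other.db' AS other;
conn.execute("attach database ? as other", (other_path,))

# Join a table in the main database against other.users
rows = conn.execute(
    """
    select u.name, sum(o.total)
    from orders as o
    join other.users as u on u.id = o.user_id
    group by u.name
    order by u.name
    """
).fetchall()
print(rows)
```

The ATTACH runs after the inserts are committed, because SQLite refuses to attach a database while a transaction is open.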
I’ve wanted to add support for cross-database queries to Datasette since May 2018. It took me quite a while to settle on a design—SQLite defaults to only allowing ten databases to be attached together, and I needed to figure out how multiple connected databases would fit with the design of the rest of Datasette.
In the end, I decided on the simplest option that would unlock the feature. Run Datasette with the new --crossdb option and the first ten databases passed to Datasette will be ATTACHed to an in-memory database available at the …
The latest.datasette.io demo now exposes two databases using this feature. Here's an illustrative example query that performs a UNION across the sqlite_master metadata table in two databases:
select 'fixtures' as database, * from [fixtures].sqlite_master
union
select 'extra_database' as database, * from [extra_database].sqlite_master
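The --crossdb design can be approximated in plain Python to show the moving parts. The sketch opens an in-memory connection and ATTACHes up to ten database files under aliases taken from their file names. It then runs the same kind of UNION across each database's sqlite_master table. The helper attach_first_ten and the example files are hypothetical, and this is not Datasette's actual implementation.

```python
import os
import sqlite3
import tempfile

MAX_ATTACHED = 10  # SQLite's default limit on attached databases

def attach_first_ten(paths):
    """Hypothetical helper: ATTACH up to ten files to a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    for path in paths[:MAX_ATTACHED]:
        # Use the file name without its extension as the schema alias
        alias = os.path.splitext(os.path.basename(path))[0]
        conn.execute(f"attach database ? as [{alias}]", (path,))
    return conn

# Create two hypothetical database files with one table each
tmp = tempfile.mkdtemp()
paths = []
for name, table in [("fixtures", "facets"), ("extra_database", "searchable")]:
    path = os.path.join(tmp, name + ".db")
    db = sqlite3.connect(path)
    db.execute(f"create table {table} (id integer primary key)")
    db.commit()
    db.close()
    paths.append(path)

conn = attach_first_ten(paths)
rows = conn.execute(
    """
    select 'fixtures' as database, name from [fixtures].sqlite_master
    union
    select 'extra_database' as database, name from [extra_database].sqlite_master
    order by 1, 2
    """
).fetchall()
print(rows)
```

Slicing to the first ten paths mirrors the limit described above. On a default SQLite build, attaching an eleventh database would fail with an error.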
Cross-database queries in sqlite-utils
sqlite-utils offers both a Python library and a command-line utility in one package. I've added ATTACH support to both.
The Python library support looks like this:
from sqlite_utils import Database

db = Database("first.db")
db.attach("second", "second.db")
# Now you can run queries like this:
cursor = db.execute("""
select * from table_in_first
union all
select * from second.table_in_second
""")
print(cursor.fetchall())
The command-line tool now has a new --attach option which lets you attach a database using an alias. The equivalent query to the above would look like this:
$ sqlite-utils first.db --attach second second.db '
select * from table_in_first
union all
select * from second.table_in_second'
This defaults to returning results as a JSON array, but you can add --tsv or other options to get the results back in different output formats.
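This hypothetical standard-library approximation shows roughly what the JSON and TSV shapes look like for the same rows. sqlite-utils has its own formatting code, and its exact output may differ. The table t and the functions to_json and to_tsv are made up for this example.

```python
import csv
import io
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, name text)")
conn.executemany("insert into t values (?, ?)", [(1, "one"), (2, "two")])

cursor = conn.execute("select id, name from t order by id")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

def to_json(columns, rows):
    # A JSON array containing one object per row, keyed by column name
    return json.dumps([dict(zip(columns, row)) for row in rows])

def to_tsv(columns, rows):
    # A header line followed by one tab-separated line per row
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()

print(to_json(columns, rows))
print(to_tsv(columns, rows))
```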
A cosmetic upgrade to tags on my blog
I noticed that Will Larson’s blog shows little numbers next to the tags indicating how many times they have been used. I really liked that, so I’ve implemented it here as well.
Each entry (and quotation and link) now gets a block in the sidebar listing its tags, each next to a number showing how many times that tag has been used.
As a long-time fan of faceted search interfaces I really like this upgrade—it helps indicate at a glance the kind of content I have stashed away in my blog’s archive.
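The post doesn't say how the counts are computed. One plausible approach is a GROUP BY over a table that links tags to entries, sketched below with sqlite3. The tag and entry_tags tables and the tags_with_counts function are hypothetical. For simplicity, the sketch ignores quotations and links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table tag (id integer primary key, name text);
    create table entry_tags (entry_id integer, tag_id integer);
    insert into tag values (1, 'sqlite'), (2, 'datasette'), (3, 'weeknotes');
    insert into entry_tags values (1, 1), (1, 2), (2, 1), (3, 1), (3, 3);
    """
)

def tags_with_counts(conn, entry_id):
    """Hypothetical: the tags on one entry, each with its usage count across all entries."""
    return conn.execute(
        """
        select tag.name, count(*) as uses
        from tag join entry_tags on entry_tags.tag_id = tag.id
        where tag.id in (select tag_id from entry_tags where entry_id = ?)
        group by tag.id
        order by tag.name
        """,
        (entry_id,),
    ).fetchall()

print(tags_with_counts(conn, 1))
```

The subquery picks the tags on the current entry. The outer GROUP BY counts their uses across every entry, which is what the sidebar numbers represent.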
Releases this week
- Preview of new JSON default format for Datasette
- Python CLI utility and library for manipulating SQLite databases
- An open source multi-tool for exploring and publishing data
- Datasette plugin providing an automatic GraphQL API for your SQLite databases
- Functions for finding numbers using higher/lower
- Download map tiles and store them in an MBTiles database
TIL this week