Optimizing Large-Scale OpenStreetMap Data with SQLite (via) JT Archie describes his project to take 9GB of compressed OpenStreetMap protobufs data for the whole of the United States and load it into a queryable SQLite database.
OSM tags are key/value pairs. The trick used here for FTS-accelerated tag queries is really neat: build a SQLite FTS table containing the key/value pairs as space concatenated text, then run queries that look like this:
SELECT
id
FROM
entries e
JOIN search s ON s.rowid = e.id
WHERE
-- use FTS index to find subset of possible results
search MATCH 'amenity cafe'
-- use the subset to find exact matches
AND tags->>'amenity' = 'cafe';
JT ended up building a custom SQLite Go extension, SQLiteZSTD, to further accelerate things by supporting queries against read-only zstd compresses SQLite files. Apparently zstd has a feature that allows "compressed data to be stored so that subranges of the data can be efficiently decompressed without requiring the entire document to be decompressed", which works well with SQLite's page format.
Recent articles
- Weeknotes: a livestream, a surprise keynote and progress on Datasette Cloud billing - 2nd July 2024
- Open challenges for AI engineering - 27th June 2024
- Building search-based RAG using Claude, Datasette and Val Town - 21st June 2024
- Weeknotes: Datasette Studio and a whole lot of blogging - 19th June 2024
- Language models on the command-line - 17th June 2024
- A homepage redesign for my blog's 22nd birthday - 12th June 2024