Simon Willison’s Weblog

Subscribe

Optimizing Large-Scale OpenStreetMap Data with SQLite (via) JT Archie describes his project to take 9GB of compressed OpenStreetMap protobufs data for the whole of the United States and load it into a queryable SQLite database.

OSM tags are key/value pairs. The trick used here for FTS-accelerated tag queries is really neat: build a SQLite FTS table containing the key/value pairs as space concatenated text, then run queries that look like this:

SELECT
    id
FROM
    entries e
    JOIN search s ON s.rowid = e.id
WHERE
    -- use FTS index to find subset of possible results
    search MATCH 'amenity cafe'
    -- use the subset to find exact matches
    AND tags->>'amenity' = 'cafe';

JT ended up building a custom SQLite Go extension, SQLiteZSTD, to further accelerate things by supporting queries against read-only zstd compresses SQLite files. Apparently zstd has a feature that allows "compressed data to be stored so that subranges of the data can be efficiently decompressed without requiring the entire document to be decompressed", which works well with SQLite's page format.