Simon Willison’s Weblog

Subscribe
Atom feed for zstd

4 items tagged “zstd”

2024

Optimizing Large-Scale OpenStreetMap Data with SQLite (via) JT Archie describes his project to take 9GB of compressed OpenStreetMap protobufs data for the whole of the United States and load it into a queryable SQLite database.

OSM tags are key/value pairs. The trick used here for FTS-accelerated tag queries is really neat: build a SQLite FTS table containing the key/value pairs as space concatenated text, then run queries that look like this:

SELECT
    id
FROM
    entries e
    JOIN search s ON s.rowid = e.id
WHERE
    -- use FTS index to find subset of possible results
    search MATCH 'amenity cafe'
    -- use the subset to find exact matches
    AND tags->>'amenity' = 'cafe';

JT ended up building a custom SQLite Go extension, SQLiteZSTD, to further accelerate things by supporting queries against read-only zstd compresses SQLite files. Apparently zstd has a feature that allows "compressed data to be stored so that subranges of the data can be efficiently decompressed without requiring the entire document to be decompressed", which works well with SQLite's page format.

# 2nd July 2024, 2:33 pm / go, openstreetmap, sqlite, zstd

2023

Queryable Logging with Blacklite (via) Will Sargent describes how he built Blacklite, a Java library for diagnostic logging that writes log events (as zstd compressed JSON objects) to a SQLite database and maintains 5,000 entries in a “live” database while entries beyond that range are cycled out to an archive.db file, which is cycled to archive.timestamp.db when it reaches 500,000 items.

Lots of interesting notes here on using SQLite for high performance logging.

“SQLite databases are also better log files in general. Queries are faster than parsing through flat files, with all the power of SQL. A vacuumed SQLite database is only barely larger than flat file logs. They are as easy to store and transport as flat file logs, but work much better when merging out of order or interleaved data between two logs.”

# 21st August 2023, 6:13 pm / java, logging, sqlite, zstd

Introducing sqlite-xsv: The Fastest CSV Parser for SQLite. Alex Garcia continues to push the boundaries of SQLite extensions. This new extension in Rust wraps the lightning fast Rust csv crate and provides a new csv_reader() virtual table that can handle regular, gzipped and zstd compressed files.

# 14th January 2023, 9:54 pm / csv, sqlite, rust, alex-garcia, zstd

2022

sqlite-zstd: Transparent dictionary-based row-level compression for SQLite. Interesting SQLite extension from phiresky, the author of that amazing SQLite WASM hack from a while ago which could fetch subsets of a large SQLite database using the HTTP range header. This extension, written in Rust, implements row-level compression for a SQLite table by creating compression dictionaries for larger chunks of the table, providing better results than just running compression against each row value individually.

# 9th August 2022, 9:23 pm / sqlite, rust, zstd