Simon Willison’s Weblog

Subscribe

Items tagged sqlite in Apr, 2024

Filters: Year: 2024 × Month: Apr × sqlite × Sorted by date


Performance analysis indicates that SQLite spends very little time doing bytecode decoding and dispatch. Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records—all of which happens in compiled C code. Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements.

So at least in the case of SQLite, compiling all the way down to machine code might provide a performance boost 3% or less. That’s not very much, considering the size, complexity, and portability costs involved.

D. Richard Hipp # 30th April 2024, 1:59 pm

Why SQLite Uses Bytecode (via) Brand new SQLite architecture documentation by D. Richard Hipp explaining the trade-offs between a bytecode based query plan and a tree of objects.

SQLite uses the bytecode approach, which provides an important characteristic that SQLite can very easily execute queries incrementally—stopping after each row, for example. This is more useful for a local library database than for a network server where the assumption is that the entire query will be executed before results are returned over the wire. # 30th April 2024, 5:32 am

We [Bluesky] took a somewhat novel approach of giving every user their own SQLite database. By removing the Postgres dependency, we made it possible to run a ‘PDS in a box’ [Personal Data Server] without having to worry about managing a database. We didn’t have to worry about things like replicas or failover. For those thinking this is irresponsible: don’t worry, we are backing up all the data on our PDSs!

SQLite worked really well because the PDS – in its ideal form – is a single-tenant system. We owned up to that by having these single tenant SQLite databases.

Daniel Holmgren # 23rd April 2024, 7 pm

qrank (via) Interesting and very niche project by Colin Dellow.

Wikidata has pages for huge numbers of concepts, people, places and things.

One of the many pieces of data they publish is QRank—“ranking Wikidata entities by aggregating page views on Wikipedia, Wikispecies, Wikibooks, Wikiquote, and other Wikimedia projects”. Every item gets a score and these scores can be used to answer questions like “which island nations get the most interest across Wikipedia”—potentially useful for things like deciding which labels to display on a highly compressed map of the world.

QRank is published as a gzipped CSV file.

Colin’s hikeratlas/qrank GitHub repository runs weekly, fetches the latest qrank.csv.gz file and loads it into a SQLite database using SQLite’s “.import” mechanism. Then it publishes the resulting SQLite database as an asset attached to the “latest” GitHub release on that repo—currently a 307MB file.

The database itself has just a single table mapping the Wikidata ID (a primary key integer) to the latest QRank—another integer. You’d need your own set of data with Wikidata IDs to join against this to do anything useful.

I’d never thought of using GitHub Releases for this kind of thing. I think it’s a really interesting pattern. # 21st April 2024, 10:28 pm

redka (via) Anton Zhiyanov’s new project to build a subset of Redis (including protocol support) using Go and SQLite. Also works as a Go library.

The guts of the SQL implementation are in the internal/sqlx folder. # 14th April 2024, 3:21 pm