Simon Willison’s Weblog

Subscribe
Atom feed for sqlite Random

461 posts tagged “sqlite”

2024

redka (via) Anton Zhiyanov’s new project to build a subset of Redis (including protocol support) using Go and SQLite. Also works as a Go library.

The guts of the SQL implementation are in the internal/sqlx folder.

# 14th April 2024, 3:21 pm / sqlite, anton-zhiyanov, go, redis

Optimizing SQLite for servers (via) Sylvain Kerkour's comprehensive set of lessons learned running SQLite for server-based applications.

There's a lot of useful stuff in here, including detailed coverage of the different recommended PRAGMA settings.

There was also a tip I haven't seen before about BEGIN IMMEDIATE transactions:

By default, SQLite starts transactions in DEFERRED mode: they are considered read only. They are upgraded to a write transaction that requires a database lock in-flight, when query containing a write/update/delete statement is issued.

The problem is that by upgrading a transaction after it has started, SQLite will immediately return a SQLITE_BUSY error without respecting the busy_timeout previously mentioned, if the database is already locked by another connection.

This is why you should start your transactions with BEGIN IMMEDIATE instead of only BEGIN. If the database is locked when the transaction starts, SQLite will respect busy_timeout.

# 31st March 2024, 8:16 pm / sqlite, sql, performance, databases, sqlite-busy

Building and testing C extensions for SQLite with ChatGPT Code Interpreter

Visit Building and testing C extensions for SQLite with ChatGPT Code Interpreter

I wrote yesterday about how I used Claude and ChatGPT Code Interpreter for simple ad-hoc side quests—in that case, for converting a shapefile to GeoJSON and merging it into a single polygon.

[... 4,612 words]

DiskCache (via) Grant Jenks built DiskCache as an alternative caching backend for Django (also usable without Django), using a SQLite database on disk. The performance numbers are impressive—it even beats memcached in microbenchmarks, due to avoiding the need to access the network.

The source code (particularly in core.py) is a great case-study in SQLite performance optimization, after five years of iteration on making it all run as fast as possible.

# 19th March 2024, 3:43 pm / sqlite, django, python, performance

How does SQLite store data? Michal Pitr explores the design of the SQLite on-disk file format, as part of building an educational implementation of SQLite from scratch in Go.

# 17th March 2024, 6:47 pm / sqlite

Astro DB. A new scale-to-zero hosted SQLite offering, described as “A fully-managed SQL database designed exclusively for Astro”. It’s built on top of LibSQL, the SQLite fork maintained by the Turso database team.

Astro DB encourages defining your tables with TypeScript, and querying them via the Drizzle ORM.

Running Astro locally uses a local SQLite database. Deployed to Astro Cloud switches to their DB product, where the free tier currently includes 1GB of storage, one billion row reads per month and one million row writes per month.

Astro itself is a “web framework for content-driven websites”—so hosted SQLite is a bit of an unexpected product from them, though it does broadly fit the ecosystem they are building.

This approach reminds me of how Deno K/V works—another local SQLite storage solution that offers a proprietary cloud hosted option for deployment.

# 12th March 2024, 6:02 pm / typescript, deno, sqlite, javascript

Datasette 1.0a10. The only changes in this alpha release concern the way Datasette handles database transactions. The database.execute_write_fn() internal method used to leave functions to implement transactions on their own—it now defaults to wrapping them in a transaction unless they opt out with the new transaction=False parameter.

In implementing this I found several places inside Datasette—in particular parts of the JSON write API—which had not been handling transactions correctly. Those are all now fixed.

# 18th February 2024, 5:10 am / projects, transactions, datasette, sqlite

wddbfs – Mount a sqlite database as a filesystem. Ingenious hack from Adam Obeng. Install this Python tool and run it against a SQLite database:

wddbfs --anonymous --db-path path/to/content.db

Then tell the macOS Finder to connect to Go -> Connect to Server -> http://127.0.0.1:8080/ (connect as guest) - connecting via WebDAV.

/Volumes/127.0.0.1/content.db will now be a folder full of CSV, TSV, JSON and JSONL files - one of each format for every table.

This means you can open data from SQLite directly in any application that supports that format, and you can even run CLI commands such as grep, ripgrep or jq directly against the data!

Adam used WebDAV because "Despite how clunky it is, this seems to be the best way to implement a filesystem given that getting FUSE support is not straightforward". What a neat trick.

# 18th February 2024, 3:31 am / sqlite, webdav, python, jq, cli

Announcing DuckDB 0.10.0. Somewhat buried in this announcement: DuckDB has Fixed-Length Arrays now, along with array_cross_product(a1, a2), array_cosine_similarity(a1, a2) and array_inner_product(a1, a2) functions.

This means you can now use DuckDB to find related content (and other tricks) using vector embeddings!

Also notable:

DuckDB can now attach MySQL, Postgres, and SQLite databases in addition to databases stored in its own format. This allows data to be read into DuckDB and moved between these systems in a convenient manner, as attached databases are fully functional, appear just as regular tables, and can be updated in a safe, transactional manner.

# 13th February 2024, 5:57 pm / embeddings, sql, duckdb, databases, mysql, postgresql, sqlite

SQL for Data Scientists in 100 Queries. New comprehensive SQLite SQL tutorial from Greg Wilson, author of Teaching Tech Together and founder of The Carpentries.

# 6th February 2024, 11:08 pm / greg-wilson, sql, sqlite

stanchion (via) Dan Gallagher’s new (under-development) SQLite extension that adds column-oriented tables to SQLite, using a virtual table implemented in Zig that stores records in row groups, where each row group has multiple segments (one for each column) and those segments are stored as SQLite BLOBs.

I’m surprised that this is possible using the virtual table mechanism. It has the potential to bring some of the analytical querying performance we’ve seen in engines like DuckDB to SQLite itself.

# 31st January 2024, 10:32 pm / sqlite

SQLite 3.45. Released today. The big new feature is JSONB support, a new, specific-to-SQLite binary internal representation of JSON which can provide up to a 3x performance improvement for JSON-heavy operations, plus a 5-10% saving it terms of bytes stored on disk.

# 15th January 2024, 8:15 pm / sqlite, json, databases

2023

Release sqlite-utils-shell 0.3 — Interactive shell for sqlite-utils

Weeknotes: datasette-enrichments, datasette-comments, sqlite-chronicle

Visit Weeknotes: datasette-enrichments, datasette-comments, sqlite-chronicle

I’ve mainly been working on Datasette Enrichments and continuing to explore the possibilities enabled by sqlite-chronicle.

[... 1,123 words]

Release sqlite-utils 3.36 — Python CLI utility and library for manipulating SQLite databases
Release sqlite-utils 3.35.2 — Python CLI utility and library for manipulating SQLite databases

Tracking SQLite Database Changes in Git (via) A neat trick from Garrit Franke that I hadn’t seen before: you can teach “git diff” how to display human readable versions of the differences between binary files with a specific extension using the following:

git config diff.sqlite3.binary true
git config diff.sqlite3.textconv “echo .dump | sqlite3”

That way you can store binary files in your repo but still get back SQL diffs to compare them.

I still worry about the efficiency of storing binary files in Git, since I expect multiple versions of a text text file to compress together better.

# 1st November 2023, 6:53 pm / sqlite, git

SQLite 3.44: Interactive release notes. Anton Zhiyanov compiled interactive release notes for the new release of SQLite, demonstrating several of the new features. I’m most excited about order by in aggregates—group_concat(name order by name desc)—which is something I’ve wanted in the past. Anton demonstrates how it works with JSON aggregate functions as well. The new date formatting options look useful as well.

# 1st November 2023, 3:47 pm / anton-zhiyanov, sqlite

Release sqlite-migrate 0.1b0 — A simple database migration system for SQLite, based on sqlite-utils

New sqlite3 CLI tool in Python 3.12. The newly released Python 3.12 includes a SQLite shell, which you can open using “python -m sqlite3”—handy for when you’re using a machine that has Python installed but no sqlite3 binary.

I installed Python 3.12 for macOS using the official installer from Python.org and now “/usr/local/bin/python3 -m sqlite3” gives me a SQLite 3.41.1 shell—a pleasantly recent version from March 2023 (the latest SQLite is 3.43.1, released in September).

# 3rd October 2023, 6:57 pm / sqlite, python, cli

Weeknotes: the Datasette Cloud API, a podcast appearance and more

Datasette Cloud now has a documented API, plus a podcast appearance, some LLM plugins work and some geospatial excitement.

[... 1,243 words]

Batch size one billion: SQLite insert speedups, from the useful to the absurd (via) Useful, detailed review of ways to maximize the performance of inserting a billion integers into a SQLite database table.

# 26th September 2023, 5:31 pm / performance, sqlite

Upsert in SQL (via) Anton Zhiyanov is currently on a one-man quest to write detailed documentation for all of the fundamental SQL operations, comparing and contrasting how they work across multiple engines, generally with interactive examples.

Useful tips in here on why “insert... on conflict” is usually a better option than “insert or replace into” because the latter can perform a delete and then an insert, firing triggers that you may not have wanted to be fired.

# 25th September 2023, 8:34 pm / sqlite, sql, postgresql, databases

Geospatial SQL queries in SQLite using TG, sqlite-tg and datasette-sqlite-tg. Alex Garcia built sqlite-tg—a SQLite extension that uses the brand new TG geospatial library to provide a whole suite of custom SQL functions for working with geospatial data.

Here are my notes on trying out his initial alpha releases. The extension already provides tools for converting between GeoJSON, WKT and WKB, plus the all important tg_intersects() function for testing if a polygon or point overlap each other.

It’s pretty useful already. Without any geospatial indexing at all I was still able to get 700ms replies to a brute-force point-in-polygon query against 150MB of GeoJSON timezone boundaries stored as JSON text in a table.

# 25th September 2023, 7:45 pm / datasette, geospatial, sqlite, alex-garcia, geojson, tg

TG: Polygon indexing (via) TG is a brand new geospatial library by Josh Baker, author of the Tile38 in-memory spatial server (kind of a geospatial Redis). TG is written in pure C and delivered as a single C file, reminiscent of the SQLite amalgamation.

TG looks really interesting. It implements almost the exact subset of geospatial functionality that I find most useful: point-in-polygon, intersect, WKT, WKB, and GeoJSON—all with no additional dependencies.

The most interesting thing about it is the way it handles indexing. In this documentation Josh describes two approaches he uses to speeding up point-in-polygon and intersection using a novel approach that goes beyond the usual RTree implementation.

I think this could make the basis of a really useful SQLite extension—a lighter-weight alternative to SpatiaLite.

# 23rd September 2023, 4:32 am / c, sqlite, geojson, geospatial, spatialite, tg

Note that there have been no breaking changes since the [SQLite] file format was designed in 2004. The changes shows in the version history above have all be one of (1) typo fixes, (2) clarifications, or (3) filling in the "reserved for future extensions" bits with descriptions of those extensions as they occurred.

D. Richard Hipp

# 18th September 2023, 6:02 pm / d-richard-hipp, sqlite

Introducing datasette-litestream: easy replication for SQLite databases in Datasette. We use Litestream on Datasette Cloud for streaming backups of user data to S3. Alex Garcia extracted out our implementation into a standalone Datasette plugin, which bundles the Litestream Go binary (for the relevant platform) in the package you get when you run “datasette install datasette-litestream”—so now Datasette has a very robust answer to questions about SQLite disaster recovery beyond just the Datasette Cloud platform.

# 13th September 2023, 7:28 pm / datasette-cloud, sqlite, plugins, datasette, alex-garcia, litestream

Release sqlite-utils 3.35.1 — Python CLI utility and library for manipulating SQLite databases

LLM now provides tools for working with embeddings

Visit LLM now provides tools for working with embeddings

LLM is my Python library and command-line tool for working with language models. I just released LLM 0.9 with a new set of features that extend LLM to provide tools for working with embeddings.

[... 3,521 words]

Release sqlite-migrate 0.1a2 — A simple database migration system for SQLite, based on sqlite-utils