Simon Willison’s Weblog


97 items tagged “databases”


A write-ahead log is not a universal part of durability (via) Phil Eaton uses pseudo code to provide a clear description of how write-ahead logs in transactional database systems work, useful for understanding the tradeoffs they make and the guarantees they can provided.

I particularly liked the pseudo code explanation of group commits, where clients block waiting for their commit to be acknowledged as part of a batch of writes flushed to disk.

# 1st July 2024, 3:05 pm / databases, phil-eaton

DuckDB 1.0 (via) Six years in the making. The most significant feature in this milestone is stability of the file format: previous releases often required files to be upgraded to work with the new version.

This release also aspires to provide stability for both the SQL dialect and the C API, though these may still change with sufficient warning in the future.

# 3rd June 2024, 1:23 pm / sql, duckdb, databases

Optimizing SQLite for servers (via) Sylvain Kerkour’s comprehensive set of lessons learned running SQLite for server-based applications.

There’s a lot of useful stuff in here, including detailed coverage of the different recommended PRAGMA settings.

There was also a tip I haven’t seen before about “BEGIN IMMEDIATE” transactions:

“By default, SQLite starts transactions in DEFERRED mode: they are considered read only. They are upgraded to a write transaction that requires a database lock in-flight, when query containing a write/update/delete statement is issued.

The problem is that by upgrading a transaction after it has started, SQLite will immediately return a SQLITE_BUSY error without respecting the busy_timeout previously mentioned, if the database is already locked by another connection.

This is why you should start your transactions with BEGIN IMMEDIATE instead of only BEGIN. If the database is locked when the transaction starts, SQLite will respect busy_timeout.”

# 31st March 2024, 8:16 pm / sqlite, sql, performance, databases

How Figma’s databases team lived to tell the scale (via) The best kind of scaling war story:

"Figma’s database stack has grown almost 100x since 2020. [...] In 2020, we were running a single Postgres database hosted on AWS’s largest physical instance, and by the end of 2022, we had built out a distributed architecture with caching, read replicas, and a dozen vertically partitioned databases."

I like the concept of "colos", their internal name for sharded groups of related tables arranged such that those tables can be queried using joins.

Also smart: separating the migration into "logical sharding" - where queries all still run against a single database, even though they are logically routed as if the database was already sharded - followed by "physical sharding" where the data is actually copied to and served from the new database servers.

Logical sharding was implemented using PostgreSQL views, which can accept both reads and writes:

CREATE VIEW table_shard1 AS SELECT * FROM table WHERE hash(shard_key) >= min_shard_range AND hash(shard_key) < max_shard_range)

The final piece of the puzzle was DBProxy, a custom PostgreSQL query proxy written in Go that can parse the query to an AST and use that to decide which shard the query should be sent to. Impressively it also has a scatter-gather mechanism, so select * from table can be sent to all shards at once and the results combined back together again.

# 14th March 2024, 9:23 pm / scaling, postgresql, sharding, databases

S3 is files, but not a filesystem (via) Cal Paterson helps some concepts click into place for me: S3 imitates a file system but has a number of critical missing features, the most important of which is the lack of partial updates. Any time you want to modify even a few bytes in a file you have to upload and overwrite the entire thing. Almost every database system is dependent on partial updates to function, which is why there are so few databases that can use S3 directly as a backend storage mechanism.

# 10th March 2024, 11:47 am / s3, aws, databases

Endatabas (via) Endatabas is “an open source immutable database”—also described as “SQL document database with full history”.

It uses a variant of SQL which allows you to insert data into tables that don’t exist yet (they’ll be created automatically) then run standard select queries, joins etc. It maintains a full history of every record and supports the recent SQL standard “FOR SYSTEM_TIME AS OF” clause for retrieving historical records as they existed at a specified time (it defaults to the most recent versions).

It’s written in Common Lisp plus a bit of Rust, and includes Docker images for running the server and client libraries in JavaScript and Python. The on-disk storage format is Apache Arrow, the license is AGPL and it’s been under development for just over a year.

It’s also a document database: you can insert JSON-style nested objects directly into a table, and query them with path expressions like “select users.friends[1] from users where id = 123;”

They have a WebAssembly version and a nice getting started tutorial which you can try out directly in your browser.

Their “Why?” page lists full history, time travel queries, separation of storage from compute, schemaless tables and columnar storage as the five pillars that make up their product. I think it’s a really interesting amalgamation of ideas.

# 1st March 2024, 4:28 am / lisp, sql, databases

Announcing DuckDB 0.10.0. Somewhat buried in this announcement: DuckDB has Fixed-Length Arrays now, along with array_cross_product(a1, a2), array_cosine_similarity(a1, a2) and array_inner_product(a1, a2) functions.

This means you can now use DuckDB to find related content (and other tricks) using vector embeddings!

Also notable: “DuckDB can now attach MySQL, Postgres, and SQLite databases in addition to databases stored in its own format. This allows data to be read into DuckDB and moved between these systems in a convenient manner, as attached databases are fully functional, appear just as regular tables, and can be updated in a safe, transactional manner.”

# 13th February 2024, 5:57 pm / embeddings, sql, duckdb, databases, mysql, postgresql, sqlite

Before we even started writing the database, we first wrote a fully-deterministic event-based network simulation that our database could plug into. This system let us simulate an entire cluster of interacting database processes, all within a single-threaded, single-process application, and all driven by the same random number generator. We could run this virtual cluster, inject network faults, kill machines, simulate whatever crazy behavior we wanted, and see how it reacted. Best of all, if one particular simulation run found a bug in our application logic, we could run it over and over again with the same random seed, and the exact same series of events would happen in the exact same order. That meant that even for the weirdest and rarest bugs, we got infinity “tries” at figuring it out, and could add logging, or do whatever else we needed to do to track it down.

[...] At FoundationDB, once we hit the point of having ~zero bugs and confidence that any new ones would be found immediately, we entered into this blessed condition and we flew.

[...] We had built this sophisticated testing system to make our database more solid, but to our shock that wasn’t the biggest effect it had. The biggest effect was that it gave our tiny engineering team the productivity of a team 50x its size.

Will Wilson, on FoundationDB

# 13th February 2024, 5:20 pm / testing, databases

SQLite 3.45. Released today. The big new feature is JSONB support, a new, specific-to-SQLite binary internal representation of JSON which can provide up to a 3x performance improvement for JSON-heavy operations, plus a 5-10% saving it terms of bytes stored on disk.

# 15th January 2024, 8:15 pm / sqlite, json, databases


Stripe: Online migrations at scale (via) This 2017 blog entry from Jacqueline Xu at Stripe provides a very clear description of the “dual writes” pattern for applying complex data migrations without downtime: dual write to new and old tables, update the read paths, update the write paths and finally remove the now obsolete data—illustrated with an example of upgrading customers from having a single to multiple subscriptions.

# 5th November 2023, 4:06 pm / migrations, databases, stripe, zero-downtime

Database Migrations. Vadim Kravcenko provides a useful, in-depth description of the less obvious challenges of applying database migrations successfully. Vadim uses and likes Django’s migrations (as do I) but notes that running them at scale still involves a number of thorny challenges.

The biggest of these, which I’ve encountered myself multiple times, is that if you want truly zero downtime deploys you can’t guarantee that your schema migrations will be deployed at the exact same instant as changes you make to your application code.

This means all migrations need to be forward-compatible: you need to apply a schema change in a way that your existing code will continue to work error-free, then ship the related code change as a separate operation.

Vadim describes what this looks like in detail for a number of common operations: adding a field, removing a field and changing a field that has associated business logic implications. He also discusses the importance of knowing when to deploy a dual-write strategy.

# 1st October 2023, 11:55 pm / django, migrations, databases, ops, zero-downtime

Upsert in SQL (via) Anton Zhiyanov is currently on a one-man quest to write detailed documentation for all of the fundamental SQL operations, comparing and contrasting how they work across multiple engines, generally with interactive examples.

Useful tips in here on why “insert... on conflict” is usually a better option than “insert or replace into” because the latter can perform a delete and then an insert, firing triggers that you may not have wanted to be fired.

# 25th September 2023, 8:34 pm / sqlite, sql, postgresql, databases


JSON Changelog with SQLite (via) One of my favourite database challenges is how to track changes to rows over time. This is a neat recipe from 2018 which uses SQLite triggers and the SQLite JSON functions to serialize older versions of the rows and store them in TEXT columns.

# 16th November 2022, 3:41 am / sqlite, json, databases

Querying Postgres Tables Directly From DuckDB (via) I learned a lot of interesting PostgreSQL tricks from this write-up of the new DuckDB feature that allows it to run queries against PostgreSQL servers directly. It works using COPY (SELECT ...) TO STDOUT (FORMAT binary) which writes rows to the protocol stream in efficient binary format, but splits the table being read into parallel fetches against page ranges and uses SET TRANSACTION SNAPSHOT ... in those parallel queries to ensure they see the same transactional snapshot of the database.

# 3rd October 2022, 2:27 pm / postgresql, duckdb, databases

Introducing LiteFS (via) LiteFS is the new SQLite replication solution from Fly, now ready for beta testing. It’s from the same author as Litestream but has a very different architecture; LiteFS works by implementing a custom FUSE filesystem which spies on SQLite transactions being written to the journal file and forwards them on to other nodes in the cluster, providing full read-replication. The signature Litestream feature of streaming a backup to S3 should be coming within the next few months.

# 21st September 2022, 6:56 pm / ben-johnson, fly, sqlite, litestream, replication, databases

How the SQLite Virtual Machine Works. The latest entry in Ben Johnson’s series about SQLite internals.

# 7th September 2022, 8:49 pm / ben-johnson, sqlite, databases

Notes on the SQLite DuckDB paper

SQLite: Past, Present, and Future is a newly published paper authored by Kevin P. Gaffney, Martin Prammer and Jignesh M. Patel from the University of Wisconsin-Madison and D. Richard Hipp, Larry Brasfield and Dan Kennedy from the core SQLite engineering team.

[... 1,021 words]

How SQLite Scales Read Concurrency (via) Ben Johnson’s series on SQLite internals continues—this time with a detailed explanation of how the SQLite WAL (Write-Ahead Log) is implemented.

# 24th August 2022, 4:16 pm / ben-johnson, sqlite, databases

Turning SQLite into a distributed database (via) Heyang Zhou introduces mvSQLite, his brand new open source “SQLite-compatible distributed database” built in Rust on top of Apple’s FoundationDB. This is a very promising looking new entry into the distributed/replicated SQLite space: FoundationDB was designed to provide low-level primitives that tools like this could build on top of.

# 21st August 2022, 5:40 pm / sqlite, rust, databases

SQLite Internals: Pages & B-trees (via) Ben Johnson provides a delightfully clear introduction to SQLite internals, describing the binary format used to store rows on disk and how SQLite uses 4KB pages for both row storage and for the b-trees used to look up records.

# 27th July 2022, 2:57 pm / sqlite, algorithms, databases, ben-johnson

Sqitch tutorial for SQLite (via) Sqitch is an interesting implementation of database migrations: it’s a command-line tool written in Perl with an interface similar to Git, providing commands to create, run, revert and track migration scripts. The scripts the selves are written as SQL in whichever database engine you are using. The tutorial for SQLite gives a good idea as to how the whole system works.

# 24th July 2022, 11:44 pm / sqlite, migrations, databases

Soft Deletion Probably Isn’t Worth It. Brandur argues that soft deletion—where you delete records by populating a “is_deleted” or “deleted_at” column in your table—isn’t worth the additional complexity and risk it adds to other database queries. Instead, he suggests having a separate deleted records table which records the deleted data in a JSON blob—allowing you to review and recover it manually if necessary, and giving you an easy way to expire deleted records that have exceeded your retention policy.

# 19th July 2022, 8:40 pm / brandur-leach, databases

SIARD: Software Independent Archiving of Relational Databases (via) I hadn’t heard of this before but it looks really interesting: the Federal Archives of Switzerland developed a standard for archiving any relational database as a zip file full of XML which is “is used in over 50 countries around the globe”.

# 4th May 2022, 10:40 pm / databases, xml, archives

Postgres Auditing in 150 lines of SQL (via) I’ve run up against the problem of tracking changes made to rows within a database table so many times, and I still don’t have a preferred solution. This approach to it looks very neat: it uses PostgreSQL triggers to populate a single audit table (as opposed to one audit table per tracked table) and records the previous and current column values for the row using jsonb.

# 9th March 2022, 7:19 pm / postgresql, databases

migra (via) This looks like a very handy tool to have around: run “migra postgresql:///a postgresql:///b” and it will detect and output the SQL alter statements needed to modify the first PostgreSQL database schema to match the second. It’s written in Python, running on top of SQLAlchemy.

# 26th February 2022, 11:23 pm / migrations, postgresql, databases


toyDB: references. toyDB is a “distributed SQL database in Rust, written as a learning project”, with its own implementations of SQL, raft, ACID transactions, B+trees and more. toyDB author Erik Grinaker has assembled a detailed set of references that he used to learn how to build a database—I’d love to see more projects do this, it’s really useful.

# 19th July 2021, 12:18 am / rust, databases

PostgreSQL: nbtree/README (via) The PostgreSQL source tree includes beatifully written README files for different parts of PostgreSQL. Here’s the README for their btree implementation—it continues to be actively maintained (last change was is March) and “git blame” shows that parts of the file date back 25 years, to 1996!

# 25th June 2021, 6:09 pm / computerscience, postgresql, databases

Django SQL Dashboard

Visit Django SQL Dashboard

I’ve released the first non-alpha version of Django SQL Dashboard, which provides an interface for running arbitrary read-only SQL queries directly against a PostgreSQL database, protected by the Django authentication scheme. It can also be used to create saved dashboards that can be published or shared internally.

[... 2,171 words]

Query Engines: Push vs. Pull (via) Justin Jaffray (who has worked on Materialize) explains the difference between push and pull query execution engines using some really clear examples built around JavaScript generators.

# 2nd May 2021, 2:49 am / generators, databases, javascript

In general, relying only on natural keys is a nightmare. Double nightmare if it's PII. Natural keys only work if you are flawlessly omniscient about the domain. And you aren't.

Jacques Chester

# 11th April 2021, 10:48 pm / databases