Simon Willison’s Weblog

Subscribe

Items tagged databases, sql

Filters: databases × sql × Sorted by date


Optimizing SQLite for servers (via) Sylvain Kerkour’s comprehensive set of lessons learned running SQLite for server-based applications.

There’s a lot of useful stuff in here, including detailed coverage of the different recommended PRAGMA settings.

There was also a tip I haven’t seen before about “BEGIN IMMEDIATE” transactions:

“By default, SQLite starts transactions in DEFERRED mode: they are considered read only. They are upgraded to a write transaction that requires a database lock in-flight, when query containing a write/update/delete statement is issued.

The problem is that by upgrading a transaction after it has started, SQLite will immediately return a SQLITE_BUSY error without respecting the busy_timeout previously mentioned, if the database is already locked by another connection.

This is why you should start your transactions with BEGIN IMMEDIATE instead of only BEGIN. If the database is locked when the transaction starts, SQLite will respect busy_timeout.” # 31st March 2024, 8:16 pm

Endatabas (via) Endatabas is “an open source immutable database”—also described as “SQL document database with full history”.

It uses a variant of SQL which allows you to insert data into tables that don’t exist yet (they’ll be created automatically) then run standard select queries, joins etc. It maintains a full history of every record and supports the recent SQL standard “FOR SYSTEM_TIME AS OF” clause for retrieving historical records as they existed at a specified time (it defaults to the most recent versions).

It’s written in Common Lisp plus a bit of Rust, and includes Docker images for running the server and client libraries in JavaScript and Python. The on-disk storage format is Apache Arrow, the license is AGPL and it’s been under development for just over a year.

It’s also a document database: you can insert JSON-style nested objects directly into a table, and query them with path expressions like “select users.friends[1] from users where id = 123;”

They have a WebAssembly version and a nice getting started tutorial which you can try out directly in your browser.

Their “Why?” page lists full history, time travel queries, separation of storage from compute, schemaless tables and columnar storage as the five pillars that make up their product. I think it’s a really interesting amalgamation of ideas. # 1st March 2024, 4:28 am

Announcing DuckDB 0.10.0. Somewhat buried in this announcement: DuckDB has Fixed-Length Arrays now, along with array_cross_product(a1, a2), array_cosine_similarity(a1, a2) and array_inner_product(a1, a2) functions.

This means you can now use DuckDB to find related content (and other tricks) using vector embeddings!

Also notable: “DuckDB can now attach MySQL, Postgres, and SQLite databases in addition to databases stored in its own format. This allows data to be read into DuckDB and moved between these systems in a convenient manner, as attached databases are fully functional, appear just as regular tables, and can be updated in a safe, transactional manner.” # 13th February 2024, 5:57 pm

Upsert in SQL (via) Anton Zhiyanov is currently on a one-man quest to write detailed documentation for all of the fundamental SQL operations, comparing and contrasting how they work across multiple engines, generally with interactive examples.

Useful tips in here on why “insert... on conflict” is usually a better option than “insert or replace into” because the latter can perform a delete and then an insert, firing triggers that you may not have wanted to be fired. # 25th September 2023, 8:34 pm

Django SQL Dashboard

I’ve released the first non-alpha version of Django SQL Dashboard, which provides an interface for running arbitrary read-only SQL queries directly against a PostgreSQL database, protected by the Django authentication scheme. It can also be used to create saved dashboards that can be published or shared internally.

[... 2171 words]

Get Started—Materialize. Materialize is a really interesting new database—“a streaming SQL materialized view engine”. It builds materialized views on top of streaming data sources (such as Kafka)—you define the view using a SQL query, then it figures out how to keep that view up-to-date automatically as new data streams in. It speaks the PostgreSQL protocol so you can talk to it using the psql tool or any PostgreSQL client library. The “get started” guide is particularly impressive: it uses a curl stream of the Wikipedia recent changes API, parsed using a regular expression. And it’s written in Rust, so installing it is as easy as downloading and executing a single binary (though I used Homebrew). # 1st June 2020, 10:11 pm

I called it normalization because then President Nixon was talking a lot about normalizing relations with China. I figured that if he could normalize relations, so could I.

Edgar F. Codd # 7th March 2020, 11:12 pm

github/gh-ost: Thoughts on Foreign Keys? The biggest challenge I’ve seen with foreign key constraints at scale (at least with MySQL) is how they conflict with online schema migrations using tools like pt-online-schema-change or GitHub’s gh-ost. This is a good explanation of the issue by Shlomi Noach, one of the gh-ost maintainers. # 19th June 2018, 4:12 pm

What are the key insights in mastering SQL queries?

You may find this article useful (despite the list-o-matic name): 10 Easy Steps to a Complete Understanding of SQL—I’ve been using SQL for years but I found that some of the concepts explained there helped firm up my fundamental understanding of how to use it effectively.

[... 64 words]

What tools and techniques are used for relational database version control (structure and data)?

The term you are looking for is database migrations (sometimes called database change scripts).

[... 308 words]

grant XXX on * ? (via) PostgreSQL doesn’t have a way to say “this user is allowed to select/update/etc on all tables in database X”. That kind of sucks. UPDATE: This is fixed in PostgreSQL 9, see the comments. # 16th March 2010, 6:26 pm

SELECT * FROM everything, or why databases are awesome. I’m beginning to think that for scalable applications the thinner your ORM is the better—if you even use one at all. # 22nd June 2007, 12:40 am