Simon Willison’s Weblog

194 items tagged “sqlite”

2023

Weeknotes: A bunch of things I learned this week, plus datasette-explain

The Datasette table view refactor, JSON redesign and ?_extra= continues this week, mainly in this ongoing pull request and this tracking issue.

[... 1499 words]

Many people, and even a few companies, have contributed code to SQLite over the years. I have legal documentation for all such contributions in the firesafe in my office. We are able to track every byte of the SQLite source code back to its original creator. The project has been and continues to be open to outside contributions, as long as those contributions meet high standards of provenance and maintainability.

D. Richard Hipp # 8th February 2023, 6:07 pm

Making SQLite extensions pip install-able (via) Alex Garcia figured out how to bundle a compiled SQLite extension in a Python wheel (building different wheels for different platforms) and publish them to PyPI. This is a huge leap forward in terms of the usability of SQLite extensions, which have previously been pretty difficult to actually install and run. Alex also created Datasette plugins that depend on his packages, so you can now “datasette install datasette-sqlite-regex” (or datasette-sqlite-ulid, datasette-sqlite-fastrand, datasette-sqlite-jsonschema) to gain access to his custom SQLite extensions in your Datasette instance. It even works with “datasette publish --install” to deploy to Vercel, Fly.io and Cloud Run. # 6th February 2023, 7:44 pm

sqlite-jsonschema. “A SQLite extension for validating JSON objects with JSON Schema”, building on the jsonschema Rust crate. SQLite and JSON are already a great combination—Alex suggests using this extension to implement check constraints to validate JSON columns before inserting into a table, or just to run queries finding existing data that doesn’t match a given schema. # 28th January 2023, 3:50 am

sqlite-ulid. Alex Garcia’s sqlite-ulid adds lightning-fast SQL functions for generating ULIDs—Universally Unique Lexicographically Sortable Identifiers. These work like UUIDs but are smaller and faster to generate, and can be canonically encoded as a URL-safe 26 character string (UUIDs are 36 characters). Again, this builds on a Rust crate—ulid-rs—and can generate 1 million byte-represented ULIDs with the ulid_bytes() function in just 88.4ms. # 28th January 2023, 3:45 am

sqlite-fastrand. Alex Garcia just dropped three new SQLite extensions, and I’m going to link to all of them. The first is sqlite-fastrand, which adds new functions for generating random numbers (and alphanumeric characters too). Impressively, these out-perform the default SQLite random() and randomblob() functions by about 1.6-2.6x, thanks to being built on the Rust fastrand crate which builds on wyhash, an extremely fast (though not cryptographically secure) hashing function. # 28th January 2023, 3:41 am

[On SQLite for production concurrent writes] In general, WAL mode “just works” as Simon said. You just need to make sure you don’t have long running write transactions, although those are somewhat problematic in any database system. Don’t do stuff like starting a write txn and then calling a remote API and then committing. That’ll kill your write throughout.

Ben Johnson # 26th January 2023, 7:36 pm

Wildebeest (via) New project from Cloudflare, first quietly unveiled three weeks ago: “Wildebeest is an ActivityPub and Mastodon-compatible server”. It’s built using a flurry of Cloudflare-specific technology, including Workers, Pages and their SQLite-based D1 database. # 23rd January 2023, 12:03 am

Hctree Design Documentation. More detailed information on the design of the new Hctree SQLite branch. # 20th January 2023, 12:50 am

Hctree: an experimental high-concurrency database backend for SQLite (via) Really interesting new research branch from the core SQLite team. “Hctree uses optimistic row-level locking and is designed to support dozens of concurrent writers running at full-speed”—with very impressive benchmarks supporting that claim. Also two bonuses: it has a replication mechanism based on the existing SQLite sessions extension, and it bumps up the maximum size of a SQLite database from 16TiB to 1EiB (roughly one million TiB). # 20th January 2023, 12:47 am

Introducing sqlite-xsv: The Fastest CSV Parser for SQLite. Alex Garcia continues to push the boundaries of SQLite extensions. This new extension in Rust wraps the lightning fast Rust csv crate and provides a new csv_reader() virtual table that can handle regular, gzipped and zstd compressed files. # 14th January 2023, 9:54 pm

How to implement Q&A against your documentation with GPT3, embeddings and Datasette

If you’ve spent any time with GPT-3 or ChatGPT, you’ve likely thought about how useful it would be if you could point them at a specific, current collection of text or documentation and have it use that as part of its input for answering questions.

[... 3489 words]

2022

Introducing sqlite-loadable-rs: A framework for building SQLite Extensions in Rust. Alex Garcia has built a new Rust library for creating SQLite extensions—initially supporting custom scalar functions, virtual tables and table functions and with more types of extension coming soon. This looks very easy to use, partly because the documentation and examples are already delightfully thorough, especially for an initial release. # 7th December 2022, 11:08 pm

fasiha/yamanote (via) Yamanote is “a guerrilla bookmarking server” by Ahmed Fasih—it works using a bookmarklet that grabs a full serialized copy of the page—the innerHTML of both the head and body element—and passes it to the server, which stores it in a SQLite database. The files are then served with a Content-Security-Policy’: `default-src ’self’ header to prevent stored pages from fetching ANY external assets when they are viewed. # 16th November 2022, 3:48 am

JSON Changelog with SQLite (via) One of my favourite database challenges is how to track changes to rows over time. This is a neat recipe from 2018 which uses SQLite triggers and the SQLite JSON functions to serialize older versions of the rows and store them in TEXT columns. # 16th November 2022, 3:41 am

About the sqlite3 WASM/JS Subproject. SQLite now maintains an official WebAssembly build. It’s influenced by sql.js but is a fresh implementation with its own API design. It also supports Origin-Private FileSystem (OPFS)—a very new standard which doesn’t yet have wide browser support that allows websites to save and load files using a dedicated folder on the host machine. # 28th October 2022, 11:05 pm

Weeknotes: DjangoCon, SQLite in Django, datasette-gunicorn

I spent most of this week at DjangoCon in San Diego—my first outside-of-the-Bay-Area conference since the before-times.

[... 1183 words]

Measuring traffic during the Half Moon Bay Pumpkin Festival

This weekend was the 50th annual Half Moon Bay Pumpkin Festival.

[... 2693 words]

Stringing together several free tiers to host an application with zero cost using fly.io, Litestream and Cloudflare. Alexander Dahl provides a detailed description (and code) for his current preferred free hosting solution for small sites: SQLite (and a Go application) running on Fly’s free tier, with the database replicated up to Cloudflare’s R2 object storage (again on a free tier) by Litestream. # 7th October 2022, 5:47 pm

libsql (via) A brand new Apache 2 licensed fork of SQLite. The README explains the rationale behind the project: SQLite itself is open source but not open contribution, and this fork aims to try out new ideas. The most interesting to me so far is a plan to support user defined functions implemented in WebAssembly. The project also intends to use Rust for new feature development. # 4th October 2022, 4:13 pm

Introducing LiteFS (via) LiteFS is the new SQLite replication solution from Fly, now ready for beta testing. It’s from the same author as Litestream but has a very different architecture; LiteFS works by implementing a custom FUSE filesystem which spies on SQLite transactions being written to the journal file and forwards them on to other nodes in the cluster, providing full read-replication. The signature Litestream feature of streaming a backup to S3 should be coming within the next few months. # 21st September 2022, 6:56 pm

How I’m a Productive Programmer With a Memory of a Fruit Fly (via) Hynek Schlawack describes the value he gets from searchable offline developer documentation, and advocates for the Documentation Sets format which bundles docs, metadata and a SQLite search index. Hynek’s doc2dash command can convert documentation generated by tools like Sphinx into a docset that’s compatible with several offline documentation browser applications. # 19th September 2022, 4:19 pm

[SQLite is] a database that in full-stack culture has been relegated to “unit test database mock” for about 15 years that is (1) surprisingly capable as a SQL engine, (2) the simplest SQL database to get your head around and manage, and (3) can embed directly in literally every application stack, which is especially interesting in latency-sensitive and globally-distributed applications. Reason (3) is clearly our ulterior motive here, so we’re not disinterested: our model user deploys a full-stack app (Rails, Elixir, Express, whatever) in a bunch of regions around the world, hoping for sub-100ms responses for users in most places around the world. Even within a single data center, repeated queries to SQL servers can blow that budget. Running an in-process SQL server neatly addresses it.

Thomas Ptacek # 16th September 2022, 1:49 am

APSW is now available on PyPI. News I missed from June: the venerable (17+ years old) APSW SQLite library for Python is now officially available on PyPI as a set of wheels, built using cibuildwheel. This is a really big deal: APSW is an extremely well maintained library which exposes way more low-level SQLite functionality than the standard library’s sqlite3 module, and to-date one of the only disadvantages of using it was the need to install it independently of PyPI. Now you can just run “pip install apsw”. # 15th September 2022, 10:18 pm

How the SQLite Virtual Machine Works. The latest entry in Ben Johnson’s series about SQLite internals. # 7th September 2022, 8:49 pm

CROSS JOIN and virtual tables in SQLite. Learned today on the SQLite forums that the SQLite CROSS JOIN in SQLite is a special case of join where the provided table order is preserved when executing the join. This is useful for advanced cases where you might want to use a SQLite virtual table to perform some kind of custom operation—searching against an external search engine for example—and then join the results back against other tables in a predictable way. # 7th September 2022, 12:15 am

Discord History Tracker. Very interestingly shaped piece of software. You install and run a localhost web application on your own machine, then paste some JavaScript into the Discord Electron app’s DevTools console (ignoring the prominent messages there warning you not to paste anything into it). The JavaScript scrapes messages you can see in Discord and submits them back to that localhost application, which writes them to a SQLite database for you. It’s written in C# with ASP.NET Core, but complied executables are provided for Windows, macOS and Linux. I had to allow execution of four different unsigned binaries to get this working on my Mac. # 2nd September 2022, 9:37 pm

Notes on the SQLite DuckDB paper

SQLite: Past, Present, and Future is a newly published paper authored by Kevin P. Gaffney, Martin Prammer and Jignesh M. Patel from the University of Wisconsin-Madison and D. Richard Hipp, Larry Brasfield and Dan Kennedy from the core SQLite engineering team.

[... 1021 words]

How SQLite Scales Read Concurrency (via) Ben Johnson’s series on SQLite internals continues—this time with a detailed explanation of how the SQLite WAL (Write-Ahead Log) is implemented. # 24th August 2022, 4:16 pm

Turning SQLite into a distributed database (via) Heyang Zhou introduces mvSQLite, his brand new open source “SQLite-compatible distributed database” built in Rust on top of Apple’s FoundationDB. This is a very promising looking new entry into the distributed/replicated SQLite space: FoundationDB was designed to provide low-level primitives that tools like this could build on top of. # 21st August 2022, 5:40 pm