Simon Willison’s Weblog

140 items tagged “sqlite”

2021

SQLite: STRICT Tables (draft). Draft documentation for a feature that sounds like it could be arriving in SQLite 3.37 (the next release)—adding a “STRICT” table-option keyword to a CREATE TABLE statement will cause the table to strictly enforce typing rules for data in that table, rejecting inserts that fail to match the column’s datatypes. I’ve seen many programmers dismiss SQLite due to its loose typing, so this feature is really exciting to me: it will hopefully remove a common objection to embracing SQLite for projects. # 21st August 2021, 7:05 pm

Datasette on Codespaces, sqlite-utils API reference documentation and other weeknotes

This week I broke my streak of not sending out the Datasette newsletter, figured out how to use Sphinx for Python class documentation, worked out how to run Datasette on GitHub Codespaces, implemented Datasette column metadata and got tantalizingly close to a solution for an elusive Datasette feature.

[... 2163 words]

Bare columns in an aggregate queries. This is a really nice SQL tweak implemented in SQLite: If you run a query like “SELECT a, b, max(c) FROM tab1 GROUP BY a” SQLite will find the row with the highest value for c and use the columns of that row as the returned values for the other columns mentioned in the query. # 10th August 2021, 1:29 am

Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool

Earlier this week I released sqlite-utils 3.14 with a powerful new command-line tool: sqlite-utils convert, which applies a conversion function to data stored in a SQLite column.

[... 1902 words]

Weeknotes: datasette-remote-metadata, sqlite-transform --multi

I mentioned Project Pelican (still a codename until the end of the embargo) last week. This week it inspired a new plugin, datasette-remote-metadata.

[... 595 words]

The Baked Data architectural pattern

I’ve been exploring an architectural pattern for publishing websites over the past few years that I call the “Baked Data” pattern. It provides many of the advantages of static site generators while avoiding most of their limitations. I think it deserves to be used more widely.

[... 1890 words]

Datasette—an ecosystem of tools for working with small data

This is the transcript and video from a talk I gave at PyGotham 2020 about using SQLite, Datasette and Dogsheep to work with small data.

[... 4654 words]

Weeknotes: sqlite-transform 1.1, Datasette 0.58.1, datasette-graphql 1.5

Work on Project Pelican inspires new features and improvements across a number of different projects.

[... 1419 words]

Inserting One Billion Rows in SQLite Under a Minute (via) Avinash Sajjanshetty experiments with accelerating writes to a test table in SQLite, using various SQLite pragmas to accelerate inserts followed by a rewrite of Python code to Rust. Also of note: running the exact same code in PyPy saw a 3.5x speed-up! # 19th July 2021, 12:13 am

The Untold Story of SQLite With Richard Hipp. This is a really interesting interview with SQLite creator D. Richard Hipp—it covers all sorts of aspects of the SQLite story I hadn’t heard before, from its inspiration by a software challenge on a battleship to the first income from clients such as AOL and Symbian to the formation of the SQLite Consortium (based on advice from Mozilla’s Mitchell Baker) and more. # 16th July 2021, 8:12 pm

Datasette 0.58: The annotated release notes

I released Datasette 0.58 last night, with new plugin hooks, Unix domain socket support, a major faceting performance fix and a few other improvements. Here are the annotated release notes.

[... 1062 words]

Joining CSV and JSON data with an in-memory SQLite database

The new sqlite-utils memory command can import CSV and JSON data directly into an in-memory SQLite database, combine and query it using SQL and output the results as CSV, JSON or various other formats of plain text tables.

[... 1494 words]

geocode-sqlite. Neat command-line Python utility by Chris Amico: point it at a SQLite database file and it will add latitude and longitude columns and populate them by geocoding one or more of the other fields, using your choice from four currently supported geocoders. # 17th May 2021, 1:15 am

Hosting SQLite databases on Github Pages (via) I’ve seen the trick of running SQLite compiled to WASM in the browser before, but this comes with an incredibly clever bonus trick: it uses SQLite’s page structure to fetch subsets of the database file via HTTP range requests, which means you can run indexed SQL queries against a 600MB database file while only fetching a few MBs of data over the wire. Absolutely brilliant. Tucked away at the end of the post is another neat trick: making the browser DOM available to SQLite as a virtual table, so you can query and update the DOM of the current page using SQL! # 2nd May 2021, 6:55 pm

Spatialite Speed Test. Part of an excellent series of posts about SpatiaLite from 2012—here John C. Zastrow reports on running polygon intersection queries against a 1.9GB database file in 40 seconds without an index and 0.186 seconds using the SpatialIndex virtual table mechanism. # 4th April 2021, 4:28 pm

sqlite-plus (via) Anton Zhiyanov bundled together a bunch of useful SQLite C extensions for things like statistical functions, unicode string normalization and handling CSV files as virtual tables. The GitHub Actions workflow here is a particularly useful example of compiling SQLite extensions for three different platforms. # 25th March 2021, 9:13 pm

logpaste (via) Useful example of how to use the Litestream SQLite replication tool in a Dockerized application: S3 credentials are passed to the container on startup, it then attempts to restore the SQLite database from S3 and starts a Litestream process in the same container to periodically synchronize changes back up to the S3 bucket. # 17th March 2021, 3:48 pm

sqlite-uuid (via) Another Python package that wraps a SQLite module written in C: this one provides access to UUID functions as SQLite functions. # 15th March 2021, 2:55 am

sqlite-spellfix (via) I really like this pattern: “pip install sqlite-spellfix” gets you a Python module which includes a compiled (on your system when pip install ran) copy of the SQLite spellfix1 module, plus a utility variable containing its path so you can easily load it into a SQLite connection. # 15th March 2021, 2:52 am

Fuzzy Name Matching in Postgres. Paul Ramsey describes how to implement fuzzy name matching in PostgreSQL using the fuzzystrmatch extension and its levenshtein() and soundex() functions, plus functional indexes to query against indexed soundex first and then apply slower Levenshtein. The same tricks should also work against SQLite using the datasette-jellyfish plugin. # 22nd February 2021, 9:16 pm

Cross-database queries in SQLite (and weeknotes)

I released Datasette 0.55 and sqlite-utils 3.6 this week with a common theme across both releases: supporting cross-database joins.

[... 719 words]

Why I Built Litestream. Litestream is a really exciting new piece of technology by Ben Johnson, who previously built BoltDB, the key-value store written in Go that is used by etcd. It adds replication to SQLite by running a process that converts the SQLite WAL log into a stream that can be saved to another folder or pushed to S3. The S3 option is particularly exciting—Ben estimates that keeping a full point-in-time recovery log of a high write SQLite database should cost in the order of a few dollars a month. I think this could greatly expand the set of use-cases for which SQLite is sensible choice. # 11th February 2021, 7:25 pm

Serving map tiles from SQLite with MBTiles and datasette-tiles

Working on datasette-leaflet last week re-kindled my interest in using Datasette as a GIS (Geographic Information System) platform. SQLite already has strong GIS functionality in the form of SpatiaLite and datasette-cluster-map is currently the most downloaded plugin. Most importantly, maps are fun!

[... 1334 words]

Drawing shapes on a map to query a SpatiaLite database (and other weeknotes)

This week I built a Datasette plugin that lets you query a database by drawing shapes on a map!

[... 950 words]

sqlite-utils 3.2 (via) As discussed in my weeknotes yesterday, this is the release of sqlite-utils that adds the new “cached table counts via triggers” mechanism. # 3rd January 2021, 9:25 pm

Weeknotes: A flurry of not-quite-finished features

My Christmas present to myself this year was to allow myself to spend a week working on stuff I found interesting, rather than sticking to the most important things. This may have been a mistake: it’s left me with a flurry of interesting but not-quite-finished features.

[... 2249 words]

2020

Replicating SQLite with rqlite (via) I’ve been trying out rqlite, a “lightweight, distributed relational database, which uses SQLite as its storage engine”. It’s written in Go and uses the Raft consensus algorithm to allow a cluster of nodes to elect a leader and replicate SQLite statements between them. By default it uses in-memory SQLite databases with an on-disk Raft replication log—here are my notes on running it in “on disk” mode as a way to run multiple Datasette processes against replicated SQLite database files. # 28th December 2020, 7:51 pm

Building a search engine for datasette.io

This week I added a search engine to datasette.io, using the search indexing tool I’ve been building for Dogsheep.

[... 1391 words]

I get asked a lot about learning to code. Sure, if you can. It’s fun. But the real action, the crux of things, is there in the database. Grab a tiny, free database like SQLite. Import a few million rows of data. Make them searchable. It’s one of the most soothing activities known to humankind, taking big piles of messy data and massaging them into the rigid structure required of a relational database. It’s true power.

Paul Ford # 16th December 2020, 5:35 am

CG-SQL (via) This is the toolkit the Facebook Messenger team wrote to bring stored procedures to SQLite. It implements a custom version of the T-SQL language which it uses to generate C code that can then be compiled into a SQLite module. # 22nd October 2020, 6:25 pm