Items tagged databases in 2020
I get asked a lot about learning to code. Sure, if you can. It’s fun. But the real action, the crux of things, is there in the database. Grab a tiny, free database like SQLite. Import a few million rows of data. Make them searchable. It’s one of the most soothing activities known to humankind, taking big piles of messy data and massaging them into the rigid structure required of a relational database. It’s true power.
Yesterday I described the new sqlite-utils transform mechanism for applying SQLite table transformations that go beyond those supported by
ALTER TABLE. The other new feature in sqlite-utils 2.20 builds on that capability to allow you to refactor a database table by extracting columns into separate tables. I’ve called it sqlite-utils extract.
DuckDB (via) This is a really interesting, relatively new database. It’s kind of a weird hybrid between SQLite and PostgreSQL: it uses the PostgreSQL parser but models itself after SQLite in that databases are a single file and the code is designed for use as an embedded library, distributed in a single amalgamation C++ file (SQLite uses a C amalgamation). It features a “columnar-vectorized query execution engine” inspired by MonetDB (also by the DuckDB authors) and is hence designed to run analytical queries really quickly. You can install it using “pip install duckdb”—the resulting module feels similar to Python’s sqlite3, and follows roughly the same DBAPI pattern. # 19th September 2020, 11:43 pm
This week I helped Natalie launch Rocky Beaches, shipped Datasette 0.48 and several releases of
datasette-graphql, upgraded the CSRF protection for
datasette-upload-csvs and figured out how to get a commit log of changes to my blog by backing up its database to a GitHub repository.
Get Started—Materialize. Materialize is a really interesting new database—“a streaming SQL materialized view engine”. It builds materialized views on top of streaming data sources (such as Kafka)—you define the view using a SQL query, then it figures out how to keep that view up-to-date automatically as new data streams in. It speaks the PostgreSQL protocol so you can talk to it using the psql tool or any PostgreSQL client library. The “get started” guide is particularly impressive: it uses a curl stream of the Wikipedia recent changes API, parsed using a regular expression. And it’s written in Rust, so installing it is as easy as downloading and executing a single binary (though I used Homebrew). # 1st June 2020, 10:11 pm