Simon Willison’s Weblog

100 items tagged “sqlite”

2020

DuckDB (via) This is a really interesting, relatively new database. It’s kind of a weird hybrid between SQLite and PostgreSQL: it uses the PostgreSQL parser but models itself after SQLite in that databases are a single file and the code is designed for use as an embedded library, distributed in a single amalgamation C++ file (SQLite uses a C amalgamation). It features a “columnar-vectorized query execution engine” inspired by MonetDB (also by the DuckDB authors) and is hence designed to run analytical queries really quickly. You can install it using “pip install duckdb”—the resulting module feels similar to Python’s sqlite3, and follows roughly the same DBAPI pattern. # 19th September 2020, 11:43 pm

Weeknotes: datasette-dump, sqlite-backup, talks

I spent some time this week digging into Python’s sqlite3 internals. I also gave two talks and recorded a third, due to air at PyGotham in October.

[... 928 words]

Weeknotes: Installing Datasette with Homebrew, more GraphQL, WAL in SQLite

This week I’ve been working on making Datasette easier to install, plus wide-ranging improvements to the Datasette GraphQL plugin.

[... 1009 words]

GraphQL in Datasette with the new datasette-graphql plugin

This week I’ve mostly been building datasette-graphql, a plugin that adds GraphQL query support to Datasette.

[... 1249 words]

sqlite-utils 2.14 (via) I finally figured out porter stemming with SQLite full-text search today—it turns out it’s as easy as adding tokenize=’porter’ to the CREATE VIRTUAL TABLE statement. So I just shipped sqlite-utils 2.14 with a tokenize= option (plus the ability to insert binary file data from stdin). # 1st August 2020, 9:19 pm

Fun with binary data and SQLite

This week I’ve been mainly experimenting with binary data storage in SQLite. sqlite-utils can now insert data from binary files, and datasette-media can serve content over HTTP that originated as binary BLOBs in a database file.

[... 947 words]

sqlite-utils 2.12 (via) I’ve been experimenting with ways of improving BLOB support in Datasette and sqlite-utils. This new version of sqlite-utils includes a “sqlite-utils insert-files” command, which can recursively crawl directories for files and add their contents to SQLite with configurable columns containing their metadata. I was inspired by Paul Ford who has been creating multi-GB SQLite databases of images and PDFs. It turns out that when disk space is cheap this is a pretty effective way of working with interesting corpuses of documents and images. # 27th July 2020, 7:36 am

Weeknotes: cookiecutter templates, better plugin documentation, sqlite-generate

I spent this week spreading myself between a bunch of smaller projects, and finally getting familiar with cookiecutter. I wrote about my datasette-plugin cookiecutter template earlier in the week; here’s what else I’ve been working on.

[... 703 words]

sqlite-generate (via) I wrote this tool today to generate arbitrarily large SQLite databases, for testing purposes. You tell it how many tables, columns and rows you want and it will use the Faker Python library to generate random data and populate the tables with it. # 23rd June 2020, 2:19 am

Using SQL to Look Through All of Your iMessage Text Messages (via) Dan Kelch shows how to access the iMessage SQLite database at ~/Library/Messages/chat.db—it’s protected under macOS Catalina so you have to enable Full Disk Access in the privacy settings first. I usually use the macOS terminal app but I installed iTerm for this because I’d rather enable full disk access to a separate terminal program than let anything I’m running in my regular terminal take advantage of it. It worked! Now I can run “datasette ~/Library/Messages/chat.db” to browse my messages. # 22nd May 2020, 4:45 pm

Using SQL to find my best photo of a pelican according to Apple Photos

According to the Apple Photos internal SQLite database, this is the most aesthetically pleasing photograph I have ever taken of a pelican:

[... 1937 words]

Weeknotes: Datasette 0.41, photos breakthroughs

Shorter weeknotes this week, because my main project for the week warrants a detailed write-up on its own (coming soon... update 21st May here it is).

[... 867 words]

Weeknotes: Datasette 0.39 and many other projects

This week’s theme: Well, I’m not going anywhere. So a ton of progress to report on various projects.

[... 806 words]

hacker-news-to-sqlite (via) The latest in my Dogsheep series of tools: hacker-news-to-sqlite uses the Hacker News API to fetch your comments and submissions from Hacker News and save them to a SQLite database. # 21st March 2020, 4:27 am

Weeknotes: Datasette Writes

As discussed previously, the biggest hole in Datasette’s feature set at the moment involves writing to the database.

[... 604 words]

Things I learned about shapefiles building shapefile-to-sqlite

The latest in my series of x-to-sqlite tools is shapefile-to-sqlite. I learned a whole bunch of things about the ESRI shapefile format while building it.

[... 1073 words]

geojson-to-sqlite (via) I just put out the first release of geojson-to-sqlite—a CLI tool that can convert GeoJSON files (consisting of a Feature or a set of features in a FeatureCollection) into a table in a SQLite database. If you use the --spatialite option it will initalize the table with SpatiaLite and store the geometries in a spacially indexed geometry field—without that option it stores them as GeoJSON. # 31st January 2020, 6:40 am

Generated Columns in SQLite (via) SQLite 3.31.0 released today, and generated columns are the single most notable new feature. PostgreSQL 12 added these in October 2019, and MySQL has had them since 5.7 in October 2015. MySQL and SQLite both offer either “stored” or “virtual” generated columns, with virtual columns being calculated at runtime. PostgreSQL currently only supports stored columns. # 24th January 2020, 4:20 am

Serving 100µs reads with 100% availability (via) Fascinating use-case for SQLite from Segment: they needed a massively replicated configuration database across all of their instances that process streaming data. They chose to make the configuration available as a ~50GB SQLite database file mirrored to every instance, meaning lookups against that data could complete in microseconds. Changes to the central MySQL configuration store are pulled every 2-3 seconds, resulting in a trade-off of consistency for availability which fits their use-case just fine. # 10th January 2020, 5:15 am

2019

sqlite-utils 2.0: real upserts

I just released version 2.0 of my sqlite-utils library/CLI tool to PyPI.

[... 1140 words]

athena-sqlite (via) Amazon Athena is the AWS tool for querying data stored in S3—as CSV, JSON or Apache Parquet files—using SQL. It’s an interesting way of buliding a very cheap data warehouse on top of S3 without having to run any additional services. Athena recently added a query federation SDK which lets you define additional custom data sources using Lambda functions. Damon Cortesi used this to write a custom connector for SQLite, which lets you run queries against data stored in SQLite files that you have uploaded to S3. You can then run joins between that data and other Athena sources. # 18th December 2019, 9:05 am

Logging to SQLite using ASGI middleware

I had some fun playing around with ASGI middleware and logging during our flight back to England for the holidays.

[... 2535 words]

sqlite-transform. I released a new CLI tool today: sqlite-transform, which lets you run “transformations” against a SQLite database. I built it out of frustration of constantly running into CSV files that use horrible American date formatting—the “sqlite-transform parsedatetime my.db mytable col1” command runs dateutil’s parser against those columns and replaces them with a nice, sortable ISO formatted timestamp. I’ve also added a “sqlite-transform lambda” command that lets you specify Python code directly on the command-line that should be used to transform every value in a specified column. # 4th November 2019, 2:41 am

goodreads-to-sqlite (via) This is so cool! Tobias Kunze built a Python CLI tool to import your Goodreads data into a SQLite database, inspired by github-to-sqlite and my various other Dogsheep tools. It’s the first Dogsheep style tool I’ve seen that wasn’t built by me—and Tobias’ write-up includes some neat examples of queries you can run against your Goodreads data. I’ve now started using Goodreads and I’m importing my books into my own private Dogsheep Datasette instance. # 14th October 2019, 4:07 am

Tracking PG&E outages by scraping to a git repo

PG&E have cut off power to several million people in northern California, supposedly as a precaution against wildfires.

[... 833 words]

SQL Murder Mystery in Datasette (via) “A crime has taken place and the detective needs your help. The detective gave you the  crime scene report, but you somehow lost it. You vaguely remember that the crime  was a murder that occurred sometime on ​Jan.15, 2018 and that it took place in SQL  City. Start by retrieving the corresponding crime scene report from the police  department’s database.”—Really fun game to help exercise your skills with SQL by the NU Knight Lab. I loaded their SQLite database into Datasette so you can play in your browser. # 7th October 2019, 11:37 pm

Weeknotes: Dogsheep

Having figured out my Stanford schedule, this week I started getting back into the habit of writing some code.

[... 1367 words]

Weeknotes: Design thinking for journalists, genome-to-sqlite, datasette-atom

I haven’t had much time for code this week: we’ve had a full five day workshop at JSK with Tran Ha (a JSK alumni) learning how to apply Design Thinking to our fellowship projects and generally to challenges facing journalism.

[... 869 words]

genome-to-sqlite. I just found out 23andMe let you export your genome as a zipped TSV file, so I wrote a little Python command-line tool to import it into a SQLite database. # 19th September 2019, 3:58 pm

Weeknotes: ONA19, twitter-to-sqlite, datasette-rure

I’ve decided to start writing weeknotes for the duration of my JSK fellowship. Here goes!

[... 919 words]