Simon Willison’s Weblog

Subscribe

Sunday, 24th June 2018

Query Parquet files in SQLite. Colin Dellow built a SQLite virtual table extension that lets you query Parquet files directly using SQL. Parquet is interesting because it’s a columnar format that dramatically reduces the space needed to store tables with lots of duplicate column data—most CSV files, for example. Colin reports being able to shrink a 1291 MB CSV file from the Canadian census to an equivalent Parquet file weighing just 42MB (3% of the original)—then running a complex query against the data in just 60ms. I’d love to see someone get this extension working with Datasette.

# 7:44 pm / big-data, datasette, parquet, sqlite, colin-dellow

source-map-explorer. Very neat tool for creating a tree map visualization of the size of the components of a bundled JavaScript file created by webpack (or if you’re using create-react-app by “npm run build”). I ran this using “npx source-map-explorer build/static/js/main.d63f3f34.js” (since I don’t like using “npm install -g”).

# 9:37 pm / npm, javascript

ActorDB. Distributed SQL database written in Erlang built on top of SQLite (on top of LMDB), adding replication using the raft consensus algorithm (so sharded with no single-points of failure) and a MySQL protocol interface. Interesting combination of technologies.

# 9:48 pm / erlang, big-data, scaling, sqlite

2018 » June

MTWTFSS
    123
45678910
11121314151617
18192021222324
252627282930