Simon Willison’s Weblog

67 items tagged “sqlite”

db-to-sqlite 1.0 release. I’ve released version 1.0 of my db-to-sqlite tool, which lets you create a SQLite database copy of any database supported by SQLAlchemy (I’ve tested it against MySQL and PostgreSQL). The tool has a bunch of new features: you can use --redact to redact specific columns, specify --table multiple times to copy a subset of tables, and the --all option now efficiently adds all foreign keys at the end of the import. The project now has unit tests which run against MySQL and PostgreSQL in Travis CI. Also included in the README: a shell one-liner for creating a local SQLite copy of a remote Heroku Postgres database based on extracting the connection string from a Heroku config environment variable. # 1st July 2019, 1:35 am

Convert Locations.kml (pulled from an iPhone backup) to SQLite. I’ve been playing around with data from my iPhone using the iPhone Backup Extractor app and one of the things it exports for you is a Locations.kml file full of location history data. I wrote a tiny script using Python’s ElementTree XMLPullParser to efficiently iterate through the Placemarks and yield them as dictionaries, which I then batch-inserted into sqlite-utils to create a SQLite database. # 14th June 2019, 12:45 am

sqlite-utils 1.0. I just released sqlite-utils 1.0, with a couple of handy new features over 0.14: it can now automatically add columns to a database table if you attempt to insert data which doesn’t quite fit (using alter=True in the Python API or the --alter option to the “sqlite-utils insert” command). It also has the ability to output nested JSON column values on the command-line using the new --json-cols option. This is the first project I’ve marked as a 1.0 release in a very long time—I’ll be sticking to semver for this project from now on, bumping the major version only in the case of a backwards incompatible change. # 25th May 2019, 1:20 am

huey. Charles Leifer’s “little task queue for Python”. Similar to Celery, but it’s designed to work with Redis, SQLite or in the parent process using background greenlets. Worth checking out for the really neat design. The project is new to me, but it’s been under active development since 2011 and has a very healthy looking rate of releases. # 25th February 2019, 7:49 pm

sqlite-utils: a Python library and CLI tool for building SQLite databases

sqlite-utils is a combination Python library and command-line tool I’ve been building over the past six months which aims to make creating new SQLite databases as quick and easy as possible.

[... 1237 words]

socrata2sql (via) Phenomenal new open source tool released by Andrew Chavez at the Dallas Morning News. Socrata is the open data portal software used by huge numbers of local governments worldwide. socrata2sql is a tool that interacts with the standard Socrata API and can use it to suck down a dataset and save it as a SQLite, PostgreSQL, MySQL or other SQLAlchemy-supported database. I just tried this and it took a single command to create a SQLite database of every police arrest in Dallas in the past five years. # 8th February 2019, 3:27 pm

db-to-sqlite (via) I just released version 0.2 of a tiny CLI utility I’ve been working on. It builds on top of SQLAlchemy and lets you connect to any SQLAlchemy-supported database and convert the data from it to a local SQLite database file. The new --all option will mirror all available tables (including foreign key relationships), or you can use --sql to save the results of custom SQL queries. # 8th February 2019, 6:08 am

The Datasette Ecosystem. I’ve written a page of documentation that introduces the wider Datasette Ecosystem: csvs-to-sqlite, sqlite-utils, db-to-sqlite, dbf-to-sqlite, markdown-to-sqlite and a full collection of Datasette plugins. # 1st February 2019, 4:41 am

SQLite in 2018: A state of the art SQL dialect (via) In 2018 SQLite gained boolean literals, window functions, filter clauses, upserts and the ability to rename a column. If you want to try it out the latest official datasetteproject/datasette Docker image now bundles SQLite 3.26. # 15th January 2019, 4:21 pm

Usable Data (via) A Paul Ford essay from February 2016 in which he advocates for SQLite as the ideal format for sharing interesting data. I don’t know how I missed this one—it predates Datasette, but it perfectly captures the benefits that I’m trying to expose with the project. “In my dream universe, there would be a massive searchable torrent site filled with open, explorable data sets, in SQLite format, some with full text search indexes already in place.” # 11th January 2019, 6:33 pm

Exploring search relevance algorithms with SQLite

SQLite isn’t just a fast, high quality embedded database: it also incorporates a powerful full-text search engine in the form of the FTS4 and FTS5 extensions. You’ve probably used these a bunch of times already: many iOS, Android and desktop applications use SQLite under-the-hood and use it to implement their built-in search.

[... 1398 words]

Launching LiteCLI (via) Really neat alternative command-line client for SQLite, written in Python and using the same underlying framework as the similar pgcli (PostgreSQL) and mycli (MySQL) tools. Provides really intuitive autocomplete against table names, columns and other bits and pieces of SQLite syntax. Installation is as easy as “pip install litecli”. # 5th January 2019, 11:16 pm

Fast Autocomplete Search for Your Website (via) I wrote a tutorial for the 24 ways advent calendar on building fast autocomplete search for a website on top of Datasette and SQLite. I built the demo against 24 ways itself—I used wget to recursively fetch all 330 articles as HTML, then wrote code in a Jupyter notebook to extract the raw data from them (with BeautifulSoup) and load them into SQLite using my sqlite-utils Python library. I deployed the resulting database using Datasette, then wrote some vanilla JavaScript to implement autocomplete using fast SQL queries against the Datasette JSON API. # 19th December 2018, 12:26 am

The interesting ideas in Datasette

Datasette (previously) is my open source tool for exploring and publishing structured data. There are a lot of ideas embedded in Datasette. I realized that I haven’t put many of them into writing.

[... 2857 words]

Slides, notes and links from my Datasette talk at PyBay (via) I presented a session about Datasette at the PyBay conference in San Francisco this morning. I talked about the project itself and demonstrated ways of creating and publishing databases using csvs-to-sqlite, Datasette Publish and my new sqlite-utils library. # 19th August 2018, 11:23 pm

Compiling SQLite for use with Python Applications (via) Charles Leifer’s recent tutorial on how to compile and build the latest SQLite (with window function support) for use from Python via his pysqlite3 library. # 15th August 2018, 3:51 pm

coleifer/pysqlite3. Now that the pysqlite package is bundled as part of the Python standard library the original open source project is no longer actively maintained, and has not been upgraded for Python 3. Charles Leifer has been working on pysqlite3, a stand-alone package of the module. Crucially, this should enable compiling the latest version of SQLite (via the amalgamation package) without needing to upgrade the version that ships with the operating system. # 15th August 2018, 3:15 pm

Window Functions in SQLite 3.25.0. The next release of SQLite (apparently due for release in September) will add window functions, as specified in various SQL standards and already available in PostgreSQL. This is going to dramatically improve SQLite as an engine for performing analytical queries, especially across time series data. It’s also going to further emphasize the need for people to be able to upgrade their SQLite versions beyond those provided by the operating system—the default Ubuntu run by Travis CI still only ships with SQLite 3.8 for example. # 15th August 2018, 3:12 pm

Datasette unit tests: monkeytype_call_traces (via) Faceted browse against every function call that occurs during the execution of Datasette’s test suite. I used Instagram’s MonkeyType tool to generate this, which can run Python code and generates a SQLite database of all of the traced calls. It’s intended to be used to automatically add mypy annotations to your code, but since it produces a SQLite database as a by-product I’ve started exploring the intermediary format using Datasette. Generating this was as easy as running “monkeytype run `which pytest`” in the Datasette root directory. # 2nd August 2018, 9:03 pm

ActorDB. Distributed SQL database written in Erlang built on top of SQLite (on top of LMDB), adding replication using the raft consensus algorithm (so sharded with no single-points of failure) and a MySQL protocol interface. Interesting combination of technologies. # 24th June 2018, 9:48 pm

Query Parquet files in SQLite. Colin Dellow built a SQLite virtual table extension that lets you query Parquet files directly using SQL. Parquet is interesting because it’s a columnar format that dramatically reduces the space needed to store tables with lots of duplicate column data—most CSV files, for example. Colin reports being able to shrink a 1291 MB CSV file from the Canadian census to an equivalent Parquet file weighing just 42MB (3% of the original)—then running a complex query against the data in just 60ms. I’d love to see someone get this extension working with Datasette. # 24th June 2018, 7:44 pm

SpatiaLite — Datasette documentation. Datasette’s documentation now includes extensive coverage of the SpatiaLite extension for SQLite: how to install it, how to import latitude/longitude points, shapefiles and GeoJSON data into SpatiaLite tables, and how to run SQL queries against it that take advantage of spatial indexes. I’m learning SpatiaLite at the moment and filling out the documentation with each new trick I learn as I go—as Mark Pilgrim once taught me, the best way to learn a new technology is to write about it. # 30th May 2018, 4:34 am

Library of Congress Sustainability of Digital Formats: SQLite. “The Library of Congress Recommended Formats Statement (RFS) includes SQLite as a preferred format for datasets.” # 28th May 2018, 5:19 pm

VirtualKNN for SpatiaLite. This looks amazing: a special virtual table shipped as part of SpatiaLite 4.4.0 which implements a fast, R-Tree backed mechanism for finding the X nearest points against a geospatial database table. There’s just one catch: it’s only available in 4.4.0, but the most recent “stable” release of SpatiaLite is 4.3.0a from September 2015 so the version you get if you install from apt-get or homebrew doesn’t yet have this functionality. I’d love to figure out a neat way to package and distribute this along with Datasette. I’d also like to figure out a clean way to ship a more recent version of SQLite than the one that is currently packaged with Python 3 (3.16.2, where the latest SQLite release is 3.23.1). # 21st May 2018, 9:23 pm

sqlitebiter. SImilar to my csvs-to-sqlite tool, but sqlitebiter handles “CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/SSV/TSV/Google-Sheets”. Most interestingly, it works against HTML pages—run “sqlitebiter -v url ’https://en.wikipedia.org/wiki/Comparison_of_firewalls’” and it will scrape that Wikipedia page and create a SQLite table for each of the HTML tables it finds there. # 17th May 2018, 10:40 pm

sql.js Online SQL interpreter (via) This is fascinating: sql.js is a project that complies the whole of SQLite to JavaScript using Emscripten. The demo is an online SQL interpreter which lets you import an existing SQLite database from your filesystem and run queries against it directly in your browser. # 17th May 2018, 9:28 pm

Datasette: Full-text search. I wrote some documentation for Datasette’s full-text search feature, which detects tables which have been configured to use the SQLite FTS module and adds a search input box and support for a _search= querystring parameter. # 12th May 2018, 12:09 pm

The latest SQLite 3.8.7 alpha version is 50% faster than the 3.7.17 release from 16 months ago.  That is to say, it does 50% more work using the same number of CPU cycles. [...] The 50% faster number above is not about better query plans.  This is 50% faster at the low-level grunt work of moving bits on and off disk and search b-trees.  We have achieved this by incorporating hundreds of micro-optimizations.  Each micro-optimization might improve the performance by as little as 0.05%.  If we get one that improves performance by 0.25%, that is considered a huge win.  Each of these optimizations is unmeasurable on a real-world system (we have to use cachegrind to get repeatable run-times) but if you do enough of them, they add up.

D. Richard Hipp # 10th May 2018, 5:15 am

Notes from my appearance on the Changelog podcast

After I spoke at Zeit Day SF last weekend I sat down with Adam Stacoviak to record a 25 minute segment for episode 296 of the Changelog podcast, talking about Datasette. We covered a lot of ground!

[... 536 words]

Exploring the UK Register of Members Interests with SQL and Datasette

Ever wondered which UK Members of Parliament get gifted the most helicopter rides? How about which MPs have been given Christmas hampers by the Sultan of Brunei? (David Cameron, William Hague and Michael Howard apparently). Here’s how to dig through the Register of Members Interests using SQL and Datasette.

[... 1167 words]