Simon Willison’s Weblog

Subscribe

258 items tagged “sqlite”

2018

SpatiaLite — Datasette documentation. Datasette’s documentation now includes extensive coverage of the SpatiaLite extension for SQLite: how to install it, how to import latitude/longitude points, shapefiles and GeoJSON data into SpatiaLite tables, and how to run SQL queries against it that take advantage of spatial indexes. I’m learning SpatiaLite at the moment and filling out the documentation with each new trick I learn as I go—as Mark Pilgrim once taught me, the best way to learn a new technology is to write about it.

# 30th May 2018, 4:34 am / sqlite, spatialite, datasette, mark-pilgrim, documentation

Library of Congress Sustainability of Digital Formats: SQLite. “The Library of Congress Recommended Formats Statement (RFS) includes SQLite as a preferred format for datasets.”

# 28th May 2018, 5:19 pm / sqlite

VirtualKNN for SpatiaLite. This looks amazing: a special virtual table shipped as part of SpatiaLite 4.4.0 which implements a fast, R-Tree backed mechanism for finding the X nearest points against a geospatial database table. There’s just one catch: it’s only available in 4.4.0, but the most recent “stable” release of SpatiaLite is 4.3.0a from September 2015 so the version you get if you install from apt-get or homebrew doesn’t yet have this functionality. I’d love to figure out a neat way to package and distribute this along with Datasette. I’d also like to figure out a clean way to ship a more recent version of SQLite than the one that is currently packaged with Python 3 (3.16.2, where the latest SQLite release is 3.23.1).

# 21st May 2018, 9:23 pm / spatialite, gis, sqlite

sqlitebiter. Similar to my csvs-to-sqlite tool, but sqlitebiter handles “CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/SSV/TSV/Google-Sheets”. Most interestingly, it works against HTML pages—run “sqlitebiter -v url ’https://en.wikipedia.org/wiki/Comparison_of_firewalls’” and it will scrape that Wikipedia page and create a SQLite table for each of the HTML tables it finds there.

# 17th May 2018, 10:40 pm / datasette, scraping, csv, sqlite

sql.js Online SQL interpreter (via) This is fascinating: sql.js is a project that complies the whole of SQLite to JavaScript using Emscripten. The demo is an online SQL interpreter which lets you import an existing SQLite database from your filesystem and run queries against it directly in your browser.

# 17th May 2018, 9:28 pm / sqlite, javascript

Datasette: Full-text search. I wrote some documentation for Datasette’s full-text search feature, which detects tables which have been configured to use the SQLite FTS module and adds a search input box and support for a _search= querystring parameter.

# 12th May 2018, 12:09 pm / datasette, search, sqlite, fulltextsearch

The latest SQLite 3.8.7 alpha version is 50% faster than the 3.7.17 release from 16 months ago.  That is to say, it does 50% more work using the same number of CPU cycles. [...] The 50% faster number above is not about better query plans.  This is 50% faster at the low-level grunt work of moving bits on and off disk and search b-trees.  We have achieved this by incorporating hundreds of micro-optimizations.  Each micro-optimization might improve the performance by as little as 0.05%.  If we get one that improves performance by 0.25%, that is considered a huge win.  Each of these optimizations is unmeasurable on a real-world system (we have to use cachegrind to get repeatable run-times) but if you do enough of them, they add up.

D. Richard Hipp

# 10th May 2018, 5:15 am / sqlite, performance, d-richard-hipp

Notes from my appearance on the Changelog podcast

After I spoke at Zeit Day SF last weekend I sat down with Adam Stacoviak to record a 25 minute segment for episode 296 of the Changelog podcast, talking about Datasette. We covered a lot of ground!

[... 536 words]

Exploring the UK Register of Members Interests with SQL and Datasette

Ever wondered which UK Members of Parliament get gifted the most helicopter rides? How about which MPs have been given Christmas hampers by the Sultan of Brunei? (David Cameron, William Hague and Michael Howard apparently). Here’s how to dig through the Register of Members Interests using SQL and Datasette.

[... 1,167 words]

csvs-to-sqlite 0.8. I released a new version of my csvs-to-sqlite tool this morning with a bunch of handy new features. It can now rename columns and define their types, add the CSV filenames as an additional column, add create indexes on columns and parse dates and datetimes into SQLite-friendly ISO formatted values.

# 24th April 2018, 4:11 pm / projects, csv, sqlite

How I made a Who’s On First subset database. Inspired by Paul Ford on Twitter, I tried out a new trick with SQLite: connect to a database containing JSON, attach a brand new empty database file using “attach database”, then populate it using INSERT INTO ... SELECT plus the json_extract() function to extract out a subset of the JSON properties into a new table in the new database.

# 3rd February 2018, 5:25 am / whosonfirst, sqlite, json, paul-ford

[On SQLite] The JSON interface is like, "we save the text and when you retrieve it we parse the JSON at several hundred MB/s and let you do path queries against it please stop overthinking it, this is filing cabinet."

Paul Ford

# 29th January 2018, 4:29 pm / sqlite, json, paul-ford

SQLite: The Spellfix1 Virtual Table (via) A SQLite extension that lets you create a spellfix1 virtual table which can power “fuzzy” search, by suggesting corrections for misspelled words. I haven’t tried this yet but it looks pretty powerful, including a configurable edit distance and the ability to set up custom “soundslike” terms for words with known unusual spellings.

# 29th January 2018, 5:24 am / sqlite

Analyzing my Twitter followers with Datasette

I decided to do some ad-hoc analsis of my social network on Twitter this afternoon… and since everything is more fun if you bundle it up into a SQLite database and publish it to the internet I performed the analysis using Datasette.

[... 1,314 words]

How to turn a list of JSON objects into a Datasette. ramadis on GitHub cleaned up data on 184,879 crimes reported in Buenos Aires since 2016 and shared them on GitHub as a JSON file. Here are my notes on how to use Pandas to convert JSON into SQLite and publish it using Datasette.

# 20th January 2018, 1:07 am / datasette, pandas, json, sqlite

How to compile and run the SQLite JSON1 extension on OS X. Thanks, Stack Overflow! I’ve been battling this one for a while—it turns out you can download the SQLite source bundle, compile just the json1.c file using gcc and load that extension in Python’s sqlite3 module (or with Datasette’s --load-extension= option) to gain access to the full suite of SQLite JSON functions—json(), json_extract() etc.

# 10th January 2018, 9:01 pm / osx, sqlite, datasette, json, stackoverflow

2017

Building a location to time zone API with SpatiaLite, OpenStreetMap and Datasette

Given a latitude and longitude, how can we tell what time zone that point lies within? Here’s how I built a simple JSON API to answer that question, using a combination of data from OpenStreetMap, the SpatiaLite extension for SQLite and my Datasette API tool.

[... 2,185 words]

Big Data Workflow with Pandas and SQLite (via) Handy tutorial on dealing with larger data (in this case a 3.9GB CSV file) by incrementally loading it into pandas and writing it out to SQLite.

# 28th November 2017, 11:02 pm / pandas, csv, sqlite

Many Small Queries Are Efficient In SQLite. Since SQLite runs in-process rather than being accessed over a network it avoids the per-query overhead of network round trips. This means that while MySQL or PostgreSQL applications need to avoid N+1 query patterns that create 100s of queries per request, SQLite apps can be designed differently: provided you hit indexes or small tables, 200 queries just means 200 extra cheap function calls.

# 26th November 2017, 4:24 pm / performance, sqlite

SQLite Query Language: WITH clause. SQLite’s documentation on recursive CTEs starts out with some nice clear examples of tree traversal using a WITH statement, then gets into graphs, then goes way off the deep end with a Mandelbrot Set query and a query that can solve Soduku puzzles (“in less than 300 milliseconds on a modern workstation”).

# 26th November 2017, 7:23 am / sql, sqlite

New in Datasette: filters, foreign keys and search

I’ve released Datasette 0.13 with a number of exciting new features (Datasette previously).

[... 1,143 words]

harelba/q (via) q is a neat command-line utility that lets you run SQL queries directly against CSV and TSV files. Internally it works by firing up an in-memory SQLite database, and as of the latest release (1.7.1) you can use the new --save-db-to-disk option to save that in-memory database to disk.

# 25th November 2017, 5:49 pm / csv, sqlite

If you’re a public data provider—and many large NGOs, government organizations, cultural organizations, historical archives, media organizations, medical orgs, and academic institutions are exactly that—you can publish gigabytes of data, and make it available as an API, and make it easy to browse on the web, too, with extremely low effort. Put it into SQLite, point this little guy at it, and you’ve just radically increased the accessibility and utility of your data. Because messing around in SQL from a web browser is orders of magnitude more immediately useful than downloading a CSV, processing it, and figuring out what comes next.

Paul Ford

# 17th November 2017, 1:10 am / datasette, sqlite, paul-ford

Datasette: instantly create and publish an API for your SQLite databases

I just shipped the first public version of datasette, a new tool for creating and publishing JSON APIs for SQLite databases.

[... 968 words]

simonw/csvs-to-sqlite. I built a simple tool for bulk converting multiple CSV files into a SQLite database.

# 13th November 2017, 6:49 am / csv, sqlite, github, datasette, projects

Pull request #4120 · python/cpython. I just had my first ever change merged into Python! It was a one sentence documentation improvement (on how to cancel SQLite operations) but it was fascinating seeing how Python’s GitHub flow is set up—clever use of labels, plus a bot that automatically checks that you have signed a copy of their CLA.

# 7th November 2017, 2:06 pm / sqlite, open-source, python, github

A Minimalist Guide to SQLite. Pretty comprehensive actually—covers the sqlite3 command line app, importing CSVs, integrating with Python, Pandas and Jupyter notebooks, visualization and more.

# 2nd November 2017, 1:23 am / sqlite, pandas, jupyter, python

System Requirements For SQLite. Document describing the high level goals and objectives of SQLite. Like everything to do with SQLite this exhibits some incredibly well thought out software engineering. I particularly like “S80000: SQLite shall exhibit ductile failure characteristics“ where ductile is described in opposition to brittle: a ductile system begins showing signs of trouble well in advance of failure.

# 22nd October 2017, 9:53 pm / sqlite

Getting the Most out of Sqlite3 with Python. A couple of neat tricks I didn’t know: you can skip cursors entirely by calling .execute and .executemany directly on the connection object, and you can use the connection object as a context manager to execute transactions using a “with” block.

# 22nd October 2017, 12:35 pm / sqlite, python

2010

last.fm for television. Dale Lane’s neat hack to visualise his television watching habits. An Ubuntu / vdx home theatre stores TV events in SQLite, and graphs are generated using Python and Open Flash Chart 2. The really clever bit: the back-end captures nearby bluetooth IDs’ allowing events to be filtered by the people watching based on the presence of their mobile phones.

# 7th January 2010, 7:28 pm / dalelane, tv, ubuntu, vdx, sqlite, python, graphs, visualisation, lifetracking