Simon Willison’s Weblog

Subscribe

Items in May, 2018

Filters: Year: 2018 × Month: May × Sorted by date


SpatiaLite — Datasette documentation. Datasette’s documentation now includes extensive coverage of the SpatiaLite extension for SQLite: how to install it, how to import latitude/longitude points, shapefiles and GeoJSON data into SpatiaLite tables, and how to run SQL queries against it that take advantage of spatial indexes. I’m learning SpatiaLite at the moment and filling out the documentation with each new trick I learn as I go—as Mark Pilgrim once taught me, the best way to learn a new technology is to write about it. # 30th May 2018, 4:34 am

Library of Congress Sustainability of Digital Formats: SQLite. “The Library of Congress Recommended Formats Statement (RFS) includes SQLite as a preferred format for datasets.” # 28th May 2018, 5:19 pm

In one case this winter, miners from China landed their private jet at the local airport, drove a rental car to the visitor center at the Rocky Reach Dam, just north of Wenatchee, and, according to Chelan County PUD officials, politely asked to see the “dam master because we want to buy some electricity.”

Paul Roberts, Seattle Times # 27th May 2018, 4:16 pm

A traditional centralized database only needs to be written to once. A blockchain needs to be written to thousands of times. A traditional centralized database needs to only checks the data once. A blockchain needs to check the data thousands of times. A traditional centralized database needs to transmit the data for storage only once. A blockchain needs to transmit the data thousands of times. The costs of maintaining a blockchain are orders of magnitude higher and the cost needs to be justified by utility. Most applications looking for some of the properties stated earlier like consistency and reliability can get such things for a whole lot cheaper utilizing integrity checks, receipts and backups.

Jimmy Song # 24th May 2018, 2:44 pm

The test for extracting common code should not be “Are they the same right now?” but “Do they have the same reasons to change?”

Chris Ford # 24th May 2018, 2:33 pm

Beginner’s Guide to Jupyter Notebooks for Data Science (with Tips, Tricks!) (via) If you haven’t yet got on the Jupyter notebooks bandwagon this should help. It’s the single biggest productivity improvement I’ve made to my workflow in a very long time. # 24th May 2018, 1:58 pm

Showdown: MySQL 8 vs PostgreSQL 10 (via) MySQL 8 makes comparisons between PostgreSQL and MySQL far more interesting, as it closes some of the key feature gaps. Meanwhile the PostgreSQL replication story (long one of MySQL’s key advantages) has improved dramatically in recent versions. This article offers a useful overview of the current differences, including diving into some of the less obvious implementation details that differ between the two. # 23rd May 2018, 5:02 pm

Hynek Schlawack: Testing & Packaging (via) “How to ensure that your tests run code that you think they are running, and how to measure your coverage over multiple tox runs (in parallel!)”—Hynek makes a convincing argument for putting your packaged Python code in a src/ directory for ease of testing and coverage. # 22nd May 2018, 10:12 pm

Google is not trying to break the web by pushing for more HTTPS. Neither is Mozilla and neither are any of the other orgs saying “Hey, it would be good if traffic wasn’t eavesdropped on or modified”. This is fixing a deficiency in the web as it has stood for years.

Troy Hunt # 22nd May 2018, 4:17 pm

Observable: Downloading and Embedding Notebooks (via) Big news from the Observable team: firstly, they’ve released the open source runtime for their notebooks which means you can now execute the code from a notebook independently of their hosted service. On top of that they’ve constructed an elegant way of exporting and executing notebooks (or specific notebook cells) as ES6 modules and as installable npm package tarballs. # 22nd May 2018, 12:14 pm

New in Django 2.0: Database instrumentation. I missed this previously. Django 2.0 shipped with one of my most-wanted features: the ability to easily instrument database calls (for logging and metrics) without having to monkey-patch or run an entirely new database backend. Can’t wait to try this out. # 21st May 2018, 9:28 pm

VirtualKNN for SpatiaLite. This looks amazing: a special virtual table shipped as part of SpatiaLite 4.4.0 which implements a fast, R-Tree backed mechanism for finding the X nearest points against a geospatial database table. There’s just one catch: it’s only available in 4.4.0, but the most recent “stable” release of SpatiaLite is 4.3.0a from September 2015 so the version you get if you install from apt-get or homebrew doesn’t yet have this functionality. I’d love to figure out a neat way to package and distribute this along with Datasette. I’d also like to figure out a clean way to ship a more recent version of SQLite than the one that is currently packaged with Python 3 (3.16.2, where the latest SQLite release is 3.23.1). # 21st May 2018, 9:23 pm

The big thing I always get asked to find are dank dilapidated alleys, and New York City has, like, 5 alleys that look like that. Maybe four. You can’t film in three of them. So what it comes down to is there’s one alley left in New York, Cortlandt Alley, that everybody films in because it’s the last place. I try to stress to these directors in a polite way that New York is not a city of alleys. Boston is a city of alleys. Philadelphia has alleys. I don’t know anyone who uses the ‘old alleyway shortcut’ to go home. It doesn’t exist here. But that’s the movie you see.

Nick Carr # 21st May 2018, 12:04 am

Datasette Facets

Datasette 0.22 is out with the most significant new feature I’ve added since the initial release: faceted browse.

[... 1189 words]

sqlitebiter. Similar to my csvs-to-sqlite tool, but sqlitebiter handles “CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/SSV/TSV/Google-Sheets”. Most interestingly, it works against HTML pages—run “sqlitebiter -v url ’https://en.wikipedia.org/wiki/Comparison_of_firewalls’” and it will scrape that Wikipedia page and create a SQLite table for each of the HTML tables it finds there. # 17th May 2018, 10:40 pm

sql.js Online SQL interpreter (via) This is fascinating: sql.js is a project that complies the whole of SQLite to JavaScript using Emscripten. The demo is an online SQL interpreter which lets you import an existing SQLite database from your filesystem and run queries against it directly in your browser. # 17th May 2018, 9:28 pm

Django #8936: Add view (read-only) permission to admin (closed). Opened 10 years ago. Closed 15 hours ago. I apparently filed this issue during the first DjangoCon back in September 2008, when Adrian and Jacob mentioned on-stage that they would like to see a read-only permission for the Django Admin. Thanks to Olivier Dalang from Fiji and Petr Dlouhý from Prague it’s going to be a feature shipping in Django 2.1. Open source is a beautiful thing. # 17th May 2018, 1:40 pm

How to number rows in MySQL. MySQL’s user variables can be used to add a “rank” or “row_number” column to a database query that shows the ranking of a row against a specific unique value. This means you can return the first N rows for any given column—for example, given a list of articles return just the first three tags for each article. I’ve recently found myself using this trick for a few different things—once you know it, chances to use it crop up surprisingly often. # 16th May 2018, 9:06 pm

isomorphic-git (via) A pure-JavaScript implementation of the git protocol and underlying tools which works both server-side (Node.js) AND in the client, using an emulation of the fs API. Given the right CORS headers it can clone a GitHub repository over HTTPS right into your browser. Impressive. # 16th May 2018, 8:54 pm

Datasette: Full-text search. I wrote some documentation for Datasette’s full-text search feature, which detects tables which have been configured to use the SQLite FTS module and adds a search input box and support for a _search= querystring parameter. # 12th May 2018, 12:09 pm

Pyre: Fast Type Checking for Python (via) Facebook’s alternative to mypy. “Pyre is designed to be highly parallel, optimizing for near-instant responses so that you get immediate feedback, even in a large codebase”. Like their Hack type checker for PHP, Pyre is implemented in OCaml. # 11th May 2018, 5:47 pm

The latest SQLite 3.8.7 alpha version is 50% faster than the 3.7.17 release from 16 months ago.  That is to say, it does 50% more work using the same number of CPU cycles. [...] The 50% faster number above is not about better query plans.  This is 50% faster at the low-level grunt work of moving bits on and off disk and search b-trees.  We have achieved this by incorporating hundreds of micro-optimizations.  Each micro-optimization might improve the performance by as little as 0.05%.  If we get one that improves performance by 0.25%, that is considered a huge win.  Each of these optimizations is unmeasurable on a real-world system (we have to use cachegrind to get repeatable run-times) but if you do enough of them, they add up.

D. Richard Hipp # 10th May 2018, 5:15 am

The synthetic voice of synthetic intelligence should sound synthetic. Successful spoofing of any kind destroys trust. When trust is gone, what remains becomes vicious fast.

Stewart Brand # 10th May 2018, 4:56 am

Datasette: The Metropolitan Museum of Art (via) The Metropolitan Museum of Art publish a CSV file on GitHub with details of 464,360 items from their collection. I turned it into a searchable Datasette instance. # 9th May 2018, 6:38 pm

Notes from my appearance on the Changelog podcast

After I spoke at Zeit Day SF last weekend I sat down with Adam Stacoviak to record a 25 minute segment for episode 296 of the Changelog podcast, talking about Datasette. We covered a lot of ground!

[... 536 words]

mendoza-trees-workshop (via) Eventbrite Argentina has an academy program to train new Python/Django developers. I presented a workshop there this morning showing how Django and Jupyter can be used together to iterate on a project. Since the session was primarily about demonstrating Jupyter it was mostly live-coding, but the joy of Jupyter is that at the end of a workshop you can go back and add inline commentary to the notebooks that you used. In putting together the workshop I learned about the django_extensions “/manage.py shell_plus --notebook” command—it’s brilliant! It launches Jupyter in a way that lets you directly import your Django models without having to mess around with DJANGO_SETTINGS_MODULE. # 8th May 2018, 5:22 pm

Somebody should write up how the early-2000s push for open standards and the Web Standards Project’s advocacy are a major factor in why Apple was able to create its enormously valuable comeback. Put another way, one of the killer moments of the first iPhone demo was Jobs saying it had the “real” web, not the “baby” web, by demonstrating the NYT homepage. That would’ve been IE-only & Windows-only if not for effective advocacy from the web standards community.

Anil Dash # 7th May 2018, 1:28 pm

Datasette 0.21: New _shape=, new _size=, search within columns. Nothing earth-shattering here but it’s accumulated enough small improvements that it warranted a new release. You can now send ?_shape=array to get back a plain JSON array of results, ?_size=XXX|max to get back a specific number of rows from a table view and ?_search_COLUMN=text to run full-text search against a specific column. # 5th May 2018, 11:25 pm

Iodide Notebook: Project Examples (via) Iodide is a very promising looking open source JavaScript notebook project, and these examples do a great job of showing what it can do. It’s not as slick (yet) as Observable but it does run completely independently using just a browser. # 3rd May 2018, 6:42 pm