Simon Willison’s Weblog

Subscribe

Items in Jan, 2018

Filters: Year: 2018 × Month: Jan × Sorted by date


USGS World Earthquake Map (observable notebook). Here’s an extended version of the notebook constructed by Jeremy Ashkenas in that Observable YouTube demo. You really need to check this thing out—the notebook itself has sliders in that you can manipulate (even on a mobile browser) or you can click to edit the code and see your changes reflected in real-time. If you sign in with GitHub you can fork the project to your own account and save your changes. # 31st January 2018, 7:07 pm

Observable: An Earthquake Globe in Ten Minutes. Well worth your time. Jeremy Ashkenas uses Observable to live-code an interactive visualization of recent earthquakes around the world, using USGS data (fetched as JSON), d3, topoJSON and an Observable notebook. I’m sold—this is truly ground-breaking new technology. # 31st January 2018, 5:01 pm

Observable Beta (via) Observable just released their beta, and it’s quite something. It’s by Mike Bostock (d3), Jeremy Ashkenas (Backbone, CoffeeScript) and Tom MacWright (Mapbox Studio). The easiest way to describe it is Jupyter notebooks for JavaScript supporting reactive programming—so code is evaluated as you type and you can add interactive widgets (like sliders and canvas views) to construct explorable visualizations on the fly. # 31st January 2018, 4:46 pm

[On SQLite] The JSON interface is like, “we save the text and when you retrieve it we parse the JSON at several hundred MB/s and let you do path queries against it please stop overthinking it, this is filing cabinet.”

Paul Ford # 29th January 2018, 4:29 pm

Datasette Demo (video) from the SF Python Meetup

I gave a short talk about Datasette last month at the SF Python Meetup Holiday Party. They’ve just posted the video, so here it is:

[... 63 words]

SQLite: The Spellfix1 Virtual Table (via) A SQLite extension that lets you create a spellfix1 virtual table which can power “fuzzy” search, by suggesting corrections for misspelled words. I haven’t tried this yet but it looks pretty powerful, including a configurable edit distance and the ability to set up custom “soundslike” terms for words with known unusual spellings. # 29th January 2018, 5:24 am

6M observations total! Where has iNaturalist grown in 80 days with 1 million new observations? Citizen science app iNaturalist is seeing explosive growth at the moment—they’ve been around for nearly a decade but 1/6 of the observations posted to the site were added in just the past few months. Having tried the latest version of their iPhone app it’s easy to see why: snap a photo of some nature and upload it to the app and it will use surprisingly effective machine learning to suggest the genus or even the individual species. Submit the observation and within a few minutes other iNaturalist community members will confirm the identification or suggest a correction. It’s brilliantly well executed and an utter delight to use. # 28th January 2018, 8:18 pm

If I tweeted a throwaway comment in appreciation for McDonald’s apple pies and some other randos on Twitter happened to also tweet similar thoughts over the last few months, it doesn’t mean by extrapolation that ‘Millennials Can’t Get Enough Of McDonald’s Apple Pies’. 
The Twitter search box is not a polling agency and Twitter doesn’t include everybody’s thoughts on everything. Just some people’s thoughts on some things.

Nick Walker # 28th January 2018, 4:18 pm

Analyzing my Twitter followers with Datasette

I decided to do some ad-hoc analsis of my social network on Twitter this afternoon… and since everything is more fun if you bundle it up into a SQLite database and publish it to the internet I performed the analysis using Datasette.

[... 1314 words]

How did the Roman Republic determine its budget? Fascinating answer on the AskHistorians subreddit about how taxation worked in the Roman Empire. Since the republic was almost permanently at war, and was very good at it, no taxes were levied on Roman citizens in Italy from 167 B.C. onwards. # 27th January 2018, 4:51 pm

Domains Search for Web: Instant, Serverless & Global (via) The team at Zeit are pioneering a whole bunch of fascinating web engineering architectural patterns. Their new domain name autocomplete search uses Next.js and server-side rendering on first load, then switches to client-side rendering from then on. It can then load results asynchronously over a custom WebSocket protocol as the microservices on the backend finish resolving domain availability from the various different TLD providers. # 26th January 2018, 1:14 am

django-postgres-copy (via) Really neat Django queryset add-on which exposes the PostgreSQL COPY statement for importing (and exporting) CSV data. MyModel.objects.from_csv(“filename.csv”). Built by the team of data journalists at the California Civic Data Coalition. # 26th January 2018, 12:43 am

I spent more time on my iPhone X review than anything I’ve written in years, and it went to paper twice. (Here’s a scan of my second printed draft, with handwritten revisions.) My thing is that I don’t use my favorite pen — which, of course, has black ink — but instead a pen with red ink. Editing is an angry, bloody act and therefore must be done in red.

John Gruber # 25th January 2018, 1:43 pm

Nicaraguan Address System (via) “Instead of street names or numbers Nicaraguans use reference points from where they start describing a certain address. [...] There are instances, however, in which the reference points do not exist anymore!” # 21st January 2018, 4:32 pm

How to turn a list of JSON objects into a Datasette. ramadis on GitHub cleaned up data on 184,879 crimes reported in Buenos Aires since 2016 and shared them on GitHub as a JSON file. Here are my notes on how to use Pandas to convert JSON into SQLite and publish it using Datasette. # 20th January 2018, 1:07 am

GaretJax/django-click (via) I’ve been using Click to write command-line tools in Python recently (big datasette and csvs-to-sqlite use it) and its a delightful way of composing simple and complex CLI interfaces. I’ve always found Django’s default management command syntax hard to fit in my head—django-click means I can combine the two. # 19th January 2018, 11:19 pm

The biggest bottleneck in web performance today is CPU. Compared to seven years ago, there’s 5x more JavaScript downloaded on the top 1000 websites over the last seven years, and 3x more CSS. Half of web activity comes from mobile devices with a smaller CPU and limited battery power.

Steve Souders # 18th January 2018, 2:39 pm

Generating polygon representing a rough 100km circle around latitude/longitude point using Python. A question I posted to the GIS Stack Exchange—I found my own answer using a Python library called geog, then someone else posted a better solution using pyproj. # 17th January 2018, 8:57 pm

API 2.0: Log-In with ZEIT, New Docs & More. Here’s Zeit’s write-up of their brand new API 2.0, which adds OAuth support and allows anything that can be done with their command-line tools to be achieved via their public API as well. This is the enabling technology that allowed me to build Datasette Publish. # 17th January 2018, 3:23 pm

Datasette Publish: a web app for publishing CSV files as an online database

I’ve just released Datasette Publish, a web tool for turning one or more CSV files into an online database with a JSON API.

[... 863 words]

A SIM Switch Account Takeover (Mine). Someone walked into a T-Mobile store with a fake ID in his name and stole Albert Wenger’s SIM identity, then used it to gain access to his Yahoo mail account, reset his Twitter password and post a tweet boosting a specific cryptocurrency. His accounts with Google Authenticator 2FA stayed safe. # 14th January 2018, 8:37 pm

How the industry-breaking Spectre bug stayed secret for seven months. It’s pretty amazing that the bug only became public knowledge a week before the intended embargo date, considering the number of individuals and companies that has to be looped in. The biggest public clues were patches being applied in public to the Linux kernel—one smart observer noted that the page table issue “has all the markings of a security patch being readied under pressure from a deadline.” # 14th January 2018, 4:53 pm

Telling stories through your commits. Joel Chippendale’s excellent guide to writing a useful commit history. I spend a lot of time on my commit messages, because when I’m trying to understand code later on they are the only form of documentation that is guaranteed to remain up-to-date against the code at that exact point of time. These tips are clear, concise, teadabale and include some great examples. # 13th January 2018, 7:44 pm

Notes on Kafka in Python. Useful review by Matthew Rocklin of the three main open source Python Kafka client libraries as of October 2017. # 13th January 2018, 7:40 pm

Incident report: npm. Fascinating insight into the challenges involved in managing a massive scale community code repository. An algorithm incorrectly labeled a legit user as spam, an NPM staff member acted on the report, dependent package installations started failing and because the package had been removed as spam other users were able to try and fix the bug by publishing fresh copies of the missing package to the same namespace. # 11th January 2018, 5:27 pm

How to compile and run the SQLite JSON1 extension on OS X. Thanks, Stack Overflow! I’ve been battling this one for a while—it turns out you can download the SQLite source bundle, compile just the json1.c file using gcc and load that extension in Python’s sqlite3 module (or with Datasette’s --load-extension= option) to gain access to the full suite of SQLite JSON functions—json(), json_extract() etc. # 10th January 2018, 9:01 pm

ftfy—fix unicode that’s broken in various ways (via) I shipped a small web UI wrapper around the excellent Python FTFY library, which can take broken unicode strings and suggest a sequence of operations that can be applied to get back sensible text. # 9th January 2018, 3:22 am

csvkit. “A suite of command-line tools for converting to and working with CSV”—includes a huge range of utilities for things like converting Excel and JSON to CSV, grepping, sorting and extracting a subset of columns, combining multiple CSV files together and exporting CSV to a relational database. Worth reading through the tutorial which shows how the different commands can be piped together. # 8th January 2018, 9:03 pm

[On Meltdown’s impact on hosting costs] The reality is that we have been living with borrowed performance. The new reality is that security is too important and can not be exchanged for speed. Time to profile, tune and optimize.

Miguel de Icaza‏ # 8th January 2018, 7:35 pm

Statistical NLP on OpenStreetMap. libpostal is ferociously clever: it’s a library for parsing and understanding worldwide addresses, built on top of a machine learning model trained on millions of addresses from OpenStreetMap. Al Barrentine describes how it works in this fascinating and detailed essay. # 8th January 2018, 7:33 pm