Simon Willison’s Weblog

Blogmarks

Observable Beta (via) Observable just released their beta, and it’s quite something. It’s by Mike Bostock (d3), Jeremy Ashkenas (Backbone, CoffeeScript) and Tom MacWright (Mapbox Studio). The easiest way to describe it is Jupyter notebooks for JavaScript supporting reactive programming—so code is evaluated as you type and you can add interactive widgets (like sliders and canvas views) to construct explorable visualizations on the fly.

# 31st January 2018, 4:46 pm / javascript, jeremy-ashkenas, d3, jupyter, observable, mike-bostock, tom-macwright

SQLite: The Spellfix1 Virtual Table (via) A SQLite extension that lets you create a spellfix1 virtual table which can power “fuzzy” search, by suggesting corrections for misspelled words. I haven’t tried this yet but it looks pretty powerful, including a configurable edit distance and the ability to set up custom “soundslike” terms for words with known unusual spellings.
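To give a feel for what an edit distance threshold means in practice, here is a minimal sketch in plain Python. It uses ordinary Levenshtein distance, which is an assumption made for illustration: spellfix1 has its own scoring, and the `suggest` helper and the sample vocabulary are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: inserts, deletes and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return vocabulary entries within max_distance of word, closest first."""
    scored = sorted((edit_distance(word, v), v) for v in vocabulary)
    return [v for d, v in scored if d <= max_distance]


# Hypothetical vocabulary, just to exercise the functions.
matches = suggest("kennasaw", ["kennesaw", "kenosha", "atlanta"], max_distance=1)
```

Raising `max_distance` widens the net at the cost of noisier suggestions, which is the trade-off a configurable threshold exposes.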

# 29th January 2018, 5:24 am / sqlite

6M observations total! Where has iNaturalist grown in 80 days with 1 million new observations? Citizen science app iNaturalist is seeing explosive growth at the moment—they’ve been around for nearly a decade but 1/6 of the observations posted to the site were added in just the past few months. Having tried the latest version of their iPhone app it’s easy to see why: snap a photo of some nature and upload it to the app and it will use surprisingly effective machine learning to suggest the genus or even the individual species. Submit the observation and within a few minutes other iNaturalist community members will confirm the identification or suggest a correction. It’s brilliantly well executed and an utter delight to use.

# 28th January 2018, 8:18 pm / computer-vision, crowdsourcing, machine-learning, science, citizenscience, inaturalist

How did the Roman Republic determine its budget? Fascinating answer on the AskHistorians subreddit about how taxation worked in the Roman Republic. Since the republic was almost permanently at war, and was very good at it, no taxes were levied on Roman citizens in Italy from 167 B.C. onwards.

# 27th January 2018, 4:51 pm / historians

Domains Search for Web: Instant, Serverless & Global (via) The team at Zeit are pioneering a whole bunch of fascinating web engineering architectural patterns. Their new domain name autocomplete search uses Next.js and server-side rendering on first load, then switches to client-side rendering from then on. It can then load results asynchronously over a custom WebSocket protocol as the microservices on the backend finish resolving domain availability from the various different TLD providers.

# 26th January 2018, 1:14 am / domains, websockets, zeit-now, microservices

django-postgres-copy (via) Really neat Django queryset add-on which exposes the PostgreSQL COPY statement for importing (and exporting) CSV data. MyModel.objects.from_csv("filename.csv"). Built by the team of data journalists at the California Civic Data Coalition.

# 26th January 2018, 12:43 am / csv, django, postgresql

Nicaraguan Address System (via) “Instead of street names or numbers Nicaraguans use reference points from where they start describing a certain address. [...] There are instances, however, in which the reference points do not exist anymore!”

# 21st January 2018, 4:32 pm / geospatial

How to turn a list of JSON objects into a Datasette. ramadis on GitHub cleaned up data on 184,879 crimes reported in Buenos Aires since 2016 and shared them on GitHub as a JSON file. Here are my notes on how to use Pandas to convert JSON into SQLite and publish it using Datasette.
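The core of the conversion is small enough to sketch. My notes use Pandas; the version below swaps in Python's standard library so it runs with no dependencies. The sample records and the `json_to_sqlite` helper are hypothetical, and the field names are not those of the real Buenos Aires file.

```python
import json
import sqlite3


def json_to_sqlite(json_text, conn, table="records"):
    """Load a JSON array of flat objects into a new SQLite table."""
    rows = json.loads(json_text)
    if not rows:
        raise ValueError("expected a non-empty JSON array of objects")
    # Use the union of keys so objects with missing fields still fit.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    quoted = ", ".join('"{}"'.format(c.replace('"', '""')) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, quoted))
    conn.executemany(
        'INSERT INTO "{}" ({}) VALUES ({})'.format(table, quoted, placeholders),
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    conn.commit()
    return len(rows)


# Hypothetical sample data standing in for the real JSON file.
sample = '[{"id": 1, "type": "theft", "lat": -34.6}, {"id": 2, "type": "fraud"}]'
conn = sqlite3.connect(":memory:")
count = json_to_sqlite(sample, conn, table="crimes")
```

Connecting to a file path instead of `:memory:` produces a database file that Datasette can open.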

# 20th January 2018, 1:07 am / json, pandas, sqlite, datasette

GaretJax/django-click (via) I’ve been using Click to write command-line tools in Python recently (both datasette and csvs-to-sqlite use it) and it’s a delightful way of composing simple and complex CLIs. I’ve always found Django’s default management command syntax hard to fit in my head, and django-click means I can combine the two.

# 19th January 2018, 11:19 pm / cli, django

Generating polygon representing a rough 100km circle around latitude/longitude point using Python. A question I posted to the GIS Stack Exchange—I found my own answer using a Python library called geog, then someone else posted a better solution using pyproj.
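For a sense of the underlying geometry, here is a rough dependency-free sketch. It is neither the geog answer nor the pyproj one: it assumes a perfectly spherical Earth with a mean radius of 6371 km and applies the standard destination-point formula, so treat it as an approximation. The function names and the sample coordinates are my own, chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation (assumption)


def circle_polygon(lat, lon, radius_km=100.0, num_points=36):
    """Return a closed ring of (lon, lat) pairs roughly radius_km from the centre."""
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    d = radius_km / EARTH_RADIUS_KM  # angular distance in radians
    ring = []
    for i in range(num_points):
        bearing = 2 * math.pi * i / num_points
        lat2 = math.asin(
            math.sin(lat1) * math.cos(d)
            + math.cos(lat1) * math.sin(d) * math.cos(bearing)
        )
        lon2 = lon1 + math.atan2(
            math.sin(bearing) * math.sin(d) * math.cos(lat1),
            math.cos(d) - math.sin(lat1) * math.sin(lat2),
        )
        # Normalise longitude into the range [-180, 180).
        lon_deg = (math.degrees(lon2) + 540) % 360 - 180
        ring.append((lon_deg, math.degrees(lat2)))
    ring.append(ring[0])  # repeat the first point to close the ring
    return ring


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on the same sphere, used as a sanity check."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


ring = circle_polygon(37.77, -122.42)
```

The ring uses (longitude, latitude) order so it can be dropped into a GeoJSON polygon. A projection-aware library should be more accurate than this, particularly near the poles or for very large radii.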

# 17th January 2018, 8:57 pm / geospatial, python

API 2.0: Log-In with ZEIT, New Docs & More. Here’s Zeit’s write-up of their brand new API 2.0, which adds OAuth support and allows anything that can be done with their command-line tools to be achieved via their public API as well. This is the enabling technology that allowed me to build Datasette Publish.

# 17th January 2018, 3:23 pm / zeit-now

A SIM Switch Account Takeover (Mine). Someone walked into a T-Mobile store with a fake ID in his name and stole Albert Wenger’s SIM identity, then used it to gain access to his Yahoo mail account, reset his Twitter password and post a tweet boosting a specific cryptocurrency. His accounts with Google Authenticator 2FA stayed safe.

# 14th January 2018, 8:37 pm / identitytheft, security, sms

How the industry-breaking Spectre bug stayed secret for seven months. It’s pretty amazing that the bug only became public knowledge a week before the intended embargo date, considering the number of individuals and companies that had to be looped in. The biggest public clues were patches being applied in public to the Linux kernel: one smart observer noted that the page table issue “has all the markings of a security patch being readied under pressure from a deadline.”

# 14th January 2018, 4:53 pm / security

Telling stories through your commits. Joel Chippendale’s excellent guide to writing a useful commit history. I spend a lot of time on my commit messages, because when I’m trying to understand code later on they are the only form of documentation that is guaranteed to remain up-to-date against the code at that exact point of time. These tips are clear, concise, readable and include some great examples.

# 13th January 2018, 7:44 pm / git, sourcecontrol

Notes on Kafka in Python. Useful review by Matthew Rocklin of the three main open source Python Kafka client libraries as of October 2017.

# 13th January 2018, 7:40 pm / python, kafka

Incident report: npm. Fascinating insight into the challenges involved in managing a massive scale community code repository. An algorithm incorrectly labeled a legit user as spam, an npm staff member acted on the report, dependent package installations started failing and because the package had been removed as spam other users were able to try to fix the bug by publishing fresh copies of the missing package to the same namespace.

# 11th January 2018, 5:27 pm / security, spammers, npm

How to compile and run the SQLite JSON1 extension on OS X. Thanks, Stack Overflow! I’ve been battling this one for a while—it turns out you can download the SQLite source bundle, compile just the json1.c file using gcc and load that extension in Python’s sqlite3 module (or with Datasette’s --load-extension= option) to gain access to the full suite of SQLite JSON functions—json(), json_extract() etc.
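The Python side of this looks roughly like the sketch below. `enable_load_extension()` and `load_extension()` are real methods on `sqlite3.Connection`, though some Python builds are compiled without extension loading. The helper names are hypothetical, as is any path you pass in; it should point at whatever file your compile step produced. Many SQLite builds already ship with the JSON functions, so the sketch checks for them before assuming anything.

```python
import sqlite3


def connect_with_json1(extension_path=None):
    """Open an in-memory database, optionally loading a compiled extension."""
    conn = sqlite3.connect(":memory:")
    if extension_path is not None:
        # Extension loading is off by default and must be enabled explicitly.
        conn.enable_load_extension(True)
        conn.load_extension(extension_path)
        conn.enable_load_extension(False)
    return conn


def has_json1(conn):
    """Report whether the JSON functions are available on this connection."""
    try:
        conn.execute("select json('{}')")
        return True
    except sqlite3.OperationalError:
        return False


conn = connect_with_json1()  # pass the path to your compiled extension if needed
if has_json1(conn):
    doc = '{"a": {"b": [1, 2, 3]}}'
    value = conn.execute("select json_extract(?, '$.a.b[1]')", [doc]).fetchone()[0]
else:
    value = None  # JSON functions are missing, so load the compiled extension
```

Turning extension loading back off after the call is a small precaution, since it stops later SQL from loading arbitrary libraries.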

# 10th January 2018, 9:01 pm / json, macos, sqlite, stackoverflow, datasette

ftfy—fix unicode that’s broken in various ways (via) I shipped a small web UI wrapper around the excellent Python FTFY library, which can take broken unicode strings and suggest a sequence of operations that can be applied to get back sensible text.
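To illustrate the kind of repair involved, here is a tiny standard-library sketch of one classic failure: UTF-8 bytes that were mistakenly decoded as Latin-1. This is not how FTFY itself works internally; it is a single hard-coded guess, and the `break_text` and `try_fix` helpers are hypothetical names for illustration.

```python
def break_text(text):
    """Simulate mojibake: UTF-8 bytes wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def try_fix(text, wrong="latin-1", right="utf-8"):
    """Try to undo one mis-decoding step.

    Returns (fixed_text, plan), where plan lists the operations applied,
    or an empty list if the text was left unchanged.
    """
    try:
        fixed = text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not look like this kind of mojibake; leave it alone.
        return text, []
    if fixed == text:
        return text, []
    return fixed, [("encode", wrong), ("decode", right)]


broken = break_text("café")
fixed, plan = try_fix(broken)
```

A real fixer has to decide which of many possible mistakes happened and whether the "fixed" text is actually more plausible, which is the hard part this sketch skips.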

# 9th January 2018, 3:22 am / projects, unicode, zeit-now

csvkit. “A suite of command-line tools for converting to and working with CSV”—includes a huge range of utilities for things like converting Excel and JSON to CSV, grepping, sorting and extracting a subset of columns, combining multiple CSV files together and exporting CSV to a relational database. Worth reading through the tutorial which shows how the different commands can be piped together.

# 8th January 2018, 9:03 pm / csv, datasette

Statistical NLP on OpenStreetMap. libpostal is ferociously clever: it’s a library for parsing and understanding worldwide addresses, built on top of a machine learning model trained on millions of addresses from OpenStreetMap. Al Barrentine describes how it works in this fascinating and detailed essay.

# 8th January 2018, 7:33 pm / machine-learning, nlp, openstreetmap

Himalayan Database: From Visual FoxPro GUI to JSON API with Datasette (via) The Himalayan Database is a compilation of records for all expeditions that have climbed in the Nepalese Himalaya, originally compiled by journalist Elizabeth Hawley over several decades. The database is published as a Visual FoxPro database. Here Raffaele Messuti provides step-by-step instructions for extracting the data from the published archive, converting it to CSV using dbfcsv and then converting the CSVs to SQLite using csvs-to-sqlite so you can browse them using Datasette.

# 8th January 2018, 7:26 pm / csv, datasette

Frontend in 2017: The important parts. Keeping track of developments in the frontend and JavaScript community is pretty much a full-time job these days, so I found this summary of trends and developments over 2017 very useful for trying to catch up.

# 29th December 2017, 8:30 am / css, javascript, frontend

Game developer’s guide to graphical projections (with video game examples), Part 1: Introduction. Absolutely delightful series of illustrated essays by Matej ‘Retro’ Jan explaining how different graphical projections can be used for video game art. Each concept is illustrated by screenshots or gifs from a mixture of games spanning four decades. Reading this was a real treat.

# 28th December 2017, 1:07 am / art, game-design, games, pixelart

My Internet Mea Culpa. Rick Webb asks “What if we were wrong?” about the internet leading to enormous benefit for humankind. I’ve been worrying about this a lot recently: it turns out the internet provides tools that allow bad people to spread lies, propaganda and discrimination with lethal effectiveness. It’s hard to believe that universal access to the sum of all human knowledge can have negative effects, but there are clearly a whole load of negative effects that we internet utopians failed to predict.

# 26th December 2017, 2:59 pm / internet

Computer latency: 1977-2017 (via) Dan Luu used a 240 fps camera to investigate the latency between hitting a key and having the character show up on the display across four decades of computing devices... and found 1983’s Apple IIe outperformed everything else. He goes to great lengths to explain why in his accompanying write-up.

# 26th December 2017, 12:28 am / performance, dan-luu

Google Maps’s Moat. Gorgeous essay by digital cartographer Justin O’Beirne, exploring how Google Maps has evolved over time and how the fantastically useful “areas of interest” feature (where commercial corridors and business districts are automatically highlighted) uses data derived from a combination of Street View business data and 3D building outlines derived from aerial imagery.

# 23rd December 2017, 9:35 am / cartography, google-maps, streetview

How to compile and run a Pony program using Docker. My notes on using the Docker ponylang/ponyc container to compile and execute a Pony program without needing to install anything (since Docker will download and run the image the first time you run the command).

# 18th December 2017, 9:47 pm / pony, docker

An Early History of Pony. Pony is an interesting looking new programming language, built around actor-based concurrency on top of a mathematically proved type system. The history of the language makes for interesting reading: it’s based on experience with actor libraries in C at an investment bank, combined with research into type systems at Imperial College London.

# 18th December 2017, 8:33 pm / computer-science, pony, programming-languages

How do Ruby & Python profilers work? Julia Evans: “As a precursor to writing a Ruby profiler I wanted to do a survey of how existing Ruby & Python profilers work.”

# 18th December 2017, 12:12 pm / profiler, julia-evans

The Mirai Botnet Was Part of a College Student Minecraft Scheme. Fascinating story about last year’s Mirai botnet, which was originally developed to help corner the Minecraft server market.

# 15th December 2017, 3:18 am / denial-of-service, security, minecraft
