Simon Willison’s Weblog


Items in 2018



Telling stories through your commits. Joel Chippendale’s excellent guide to writing a useful commit history. I spend a lot of time on my commit messages, because when I’m trying to understand code later on they are the only form of documentation that is guaranteed to remain up-to-date against the code at that exact point in time. These tips are clear, concise, readable and include some great examples. # 13th January 2018, 7:44 pm

Notes on Kafka in Python. Useful review by Matthew Rocklin of the three main open source Python Kafka client libraries as of October 2017. # 13th January 2018, 7:40 pm

Incident report: npm. Fascinating insight into the challenges involved in managing a community code repository at massive scale. An algorithm incorrectly labeled a legit user as spam and an npm staff member acted on the report, so dependent package installations started failing. Because the package had been removed as spam, other users were able to try to fix the bug by publishing fresh copies of the missing package to the same namespace. # 11th January 2018, 5:27 pm

How to compile and run the SQLite JSON1 extension on OS X. Thanks, Stack Overflow! I’ve been battling this one for a while—it turns out you can download the SQLite source bundle, compile just the json1.c file using gcc and load that extension in Python’s sqlite3 module (or with Datasette’s --load-extension= option) to gain access to the full suite of SQLite JSON functions—json(), json_extract() etc. # 10th January 2018, 9:01 pm
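
A minimal sketch of the Python side of this, assuming the compiled extension lives at a path you supply (something like json1.dylib, a hypothetical filename here). The helper names are my own. If no path is given it relies on whatever JSON support the local SQLite build already has, and enable_load_extension() is only available when Python's sqlite3 module was built with extension loading enabled.

```python
import sqlite3


def connect_with_json1(db_path=":memory:", extension_path=None):
    """Open a connection, optionally loading a compiled extension first.

    extension_path is a hypothetical location for the compiled json1 library.
    """
    conn = sqlite3.connect(db_path)
    if extension_path is not None:
        # Extension loading is off by default and must be switched on explicitly.
        conn.enable_load_extension(True)
        conn.load_extension(extension_path)
        # Turn it back off so later SQL cannot load arbitrary libraries.
        conn.enable_load_extension(False)
    return conn


def extract_name(conn, document):
    """Pull the top-level "name" key out of a JSON string with json_extract()."""
    row = conn.execute("SELECT json_extract(?, '$.name')", (document,)).fetchone()
    return row[0]
```

Calling connect_with_json1(extension_path="json1.dylib") would be the equivalent of Datasette's --load-extension= option for a plain sqlite3 connection.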

ftfy—fix unicode that’s broken in various ways (via) I shipped a small web UI wrapper around the excellent Python FTFY library, which can take broken unicode strings and suggest a sequence of operations that can be applied to get back sensible text. # 9th January 2018, 3:22 am
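
To give a feel for what "a sequence of operations" means, here is a deliberately tiny sketch that handles only one kind of breakage: UTF-8 text that was mistakenly decoded as Latin-1. This is not how FTFY itself works (it covers far more cases and uses its own heuristics), and undo_mojibake is a name I made up for illustration.

```python
def undo_mojibake(text):
    """Try to reverse one round of UTF-8 text mis-decoded as Latin-1.

    Returns the (possibly repaired) text plus the list of steps applied.
    """
    try:
        # Re-encode to recover the original bytes, then decode them correctly.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not look like this kind of mojibake, so leave it alone.
        return text, []
    if repaired == text:
        # Pure ASCII round-trips unchanged, so nothing was fixed.
        return text, []
    return repaired, [("encode", "latin-1"), ("decode", "utf-8")]
```

Passing in "sch\u00c3\u00b6n" gives back "sch\u00f6n" along with the two steps that produced it, while text that is already correct comes back untouched with an empty list.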

csvkit. “A suite of command-line tools for converting to and working with CSV”—includes a huge range of utilities for things like converting Excel and JSON to CSV, grepping, sorting and extracting a subset of columns, combining multiple CSV files together and exporting CSV to a relational database. Worth reading through the tutorial which shows how the different commands can be piped together. # 8th January 2018, 9:03 pm
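
csvkit itself is a set of command-line tools, so the following is not csvkit code. It is a rough standard-library sketch of the same idea of chaining steps: keep rows matching a pattern, sort them, then output a subset of columns. The function name and the sample column names in the note below are made up for illustration.

```python
import csv
import io
import re


def grep_sort_cut(csv_text, columns, match_column, pattern, sort_column):
    """Filter rows by regex, sort them, and keep only the named columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    regex = re.compile(pattern)
    # "grep" step: keep rows whose match_column matches the pattern.
    rows = [row for row in reader if regex.search(row[match_column])]
    # "sort" step: plain string ordering on one column.
    rows.sort(key=lambda row: row[sort_column])
    # "cut" step: extrasaction="ignore" drops columns that were not requested.
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

For a file with name, city and age columns, grep_sort_cut(text, ["name", "age"], "city", "London", "name") would return just the name and age of the London rows, ordered by name.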

[On Meltdown’s impact on hosting costs] The reality is that we have been living with borrowed performance. The new reality is that security is too important and can not be exchanged for speed. Time to profile, tune and optimize.

Miguel de Icaza # 8th January 2018, 7:35 pm

Statistical NLP on OpenStreetMap. libpostal is ferociously clever: it’s a library for parsing and understanding worldwide addresses, built on top of a machine learning model trained on millions of addresses from OpenStreetMap. Al Barrentine describes how it works in this fascinating and detailed essay. # 8th January 2018, 7:33 pm

Himalayan Database: From Visual FoxPro GUI to JSON API with Datasette (via) The Himalayan Database is a compilation of records for all expeditions that have climbed in the Nepalese Himalaya, originally compiled by journalist Elizabeth Hawley over several decades. The database is published as a Visual FoxPro database. Here Raffaele Messuti provides step-by-step instructions for extracting the data from the published archive, converting it to CSV using dbfcsv and then converting the CSVs to SQLite using csvs-to-sqlite so you can browse the tables using Datasette. # 8th January 2018, 7:26 pm
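
The last conversion step, CSV to SQLite, can be sketched with nothing but the standard library. This is a simplified illustration of the idea rather than what csvs-to-sqlite actually does (the real tool also handles things like type detection), and the function name is my own.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a table or column name for use in SQL."""
    return '"' + name.replace('"', '""') + '"'


def load_csv_into_sqlite(conn, table, csv_text):
    """Create a table from the CSV header row and insert every data row.

    Columns are created without types, so all values are stored as text.
    Returns the number of columns created.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns_sql = ", ".join(quote_identifier(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    table_sql = quote_identifier(table)
    conn.execute(f"CREATE TABLE {table_sql} ({columns_sql})")
    # The csv reader yields one list per row, which executemany consumes directly.
    conn.executemany(f"INSERT INTO {table_sql} VALUES ({placeholders})", reader)
    conn.commit()
    return len(header)
```

Pointing Datasette at the resulting database file then gives you the browsable interface and JSON API described in the post.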

Most infosec bugs are really boring after a while. But processor ones are always crazy and fascinating because processors are basically a hornet’s nest of witchcraft and mayhem stacked on top of each other all the way down.

Matt Tait # 2nd January 2018, 9:54 am