Simon Willison’s Weblog

Blogmarks in Apr

Filters: Type: blogmark × Month: Apr ×


Colm MacCárthaigh tells the inside story of how AWS responded to Heartbleed. The Heartbleed SSL vulnerability came out five years ago. In this Twitter thread Colm, who was Amazon’s principal engineer for Elastic Load Balancer at the time, describes how the AWS team responded to something that “was scarier than any bug I’d ever seen”. It’s a cracking story. # 7th April 2019, 8:32 pm

tsv-utils (via) Powerful collection of CLI tools for processing TSV files, written in D for performance and released by eBay. Includes a csv2tsv conversion tool. You can download an archive of pre-built binaries for Linux and OS X from their releases page: worked fine on my Mac. # 7th April 2019, 8:29 pm

csv-diff 0.3.1 (via) I released a minor update to my csv-diff CLI tool today which does a better job of displaying a human-readable representation of rows that have been added or removed from a file—previously they were represented as an ugly JSON dump. My script monitoring changes to the official list of trees in San Francisco has been running for a month now and has captured 23 commits! # 7th April 2019, 8:03 pm

The problem with laziness: minimising performance issues caused by Django’s implicit database queries (via) The ability to accidentally execute further database queries by traversing objects from a Django template is a common source of unexpected performance regressions. django-zen-queries is a neat new library which provides a context manager for disabling database queries during a render (or elsewhere), forcing queries to be explicitly executed in view functions. # 3rd April 2019, 3:49 pm

zson (via) “ZSON is a PostgreSQL extension for transparent JSONB compression. Compression is based on a shared dictionary of strings most frequently used in specific JSONB documents [...] In some cases ZSON can save half of your disk space and give you about 10% more TPS.” # 2nd April 2019, 9:26 pm

Datasette—a talk at Zeit Day SF 2018 (via) Slides from the talk I gave today about Datasette and Datasette Publish at the Zeit Day SF conference. # 28th April 2018, 9:31 pm

Make Near Me (via) The natural evolution of owlsnearme.com—Make Near Me uses the Zeit Now API to allow anyone to deploy their own version of Owls Near Me for any species! I announced this on stage at Zeit Day SF 2018 as part of my talk on Datasette and Datasette Publish. # 28th April 2018, 9:28 pm

Scaling a High-traffic Rate Limiting Stack With Redis Cluster. Brandur Leach describes the simple, elegant and performant design of Redis Cluster, and talks about how Stripe used it to scaled their rate-limiting from one to ten nodes. # 26th April 2018, 6:34 pm

System-Versioned Tables in MariaDB (via) Fascinating new feature from the SQL:2011 standard that’s getting its first working implementation in MariaDB 10.3.4. “ALTER TABLE products ADD SYSTEM VERSIONING;” causes the database to store every change made to that table—then you can run “SELECT * FROM products FOR SYSTEM_TIME AS OF TIMESTAMP @t1;” to query the data as of a specific point in time. I’ve tried all manner of horrible mechanisms for achieving this in the past, having it baked into the database itself would be fantastic. # 25th April 2018, 2:34 pm

Black Onlline Demo (via) Black is “the uncompromising Python code formatter” by Łukasz Langa—it reformats Python code to a very carefully thought out styleguide, and provides almost no options for how code should be formatted. It’s reminiscent of gofmt. José Padilla built a handy online tool for trying it out in your browser. # 25th April 2018, 5:17 am

JSON Escape Text. I built a tiny tool for turning text into an escaped JSON string—I needed it to help create descriptions and canned SQL queries for adding to Datasette’s metadata.json files. # 25th April 2018, 4:13 am

dateparser: python parser for human readable dates (via) I’ve used dateutil.parser for this in the past, but dateparser is a major upgrade: it knows how to parse dates in 200 different language locales, can interpret different timezone representations and handles relative dates (“3 months, 1 week and 1 day ago”) as well. # 24th April 2018, 4:17 pm

csvs-to-sqlite 0.8. I released a new version of my csvs-to-sqlite tool this morning with a bunch of handy new features. It can now rename columns and define their types, add the CSV filenames as an additional column, add create indexes on columns and parse dates and datetimes into SQLite-friendly ISO formatted values. # 24th April 2018, 4:11 pm

react-jsonschema-form. Exciting library from the Mozilla Services team: given a JSON Schema definition, react-jsonschema-form can produce a cleanly designed React-powered form for adding and editing data that matches that schema. Includes support for adding multiple items in a nested array, re-ordering them, custom form widgets and more. # 23rd April 2018, 9:38 pm

Why it took a long time to build that tiny link preview on Wikipedia (via) Wikipedia now shows a little preview card on internal links with an image and summary paragraph of the linked page. As a Wikpedia user I absolutely love this feature—and as an engineer and product designer, it’s fascinating to hear the challenges they overcame to ship it. Of particular interest: actually generating a useful summary of a page, while stripping out the cruft that often accumulates at the beginning of their text. It’s also an impressive scaling challenge: the API they use for this feature is now handling more than 500,000 requests per minute. # 23rd April 2018, 9:07 pm

Datasette ClusterMap Plugin – Querying UK Food Standards Agency (FSA) Food Hygiene Ratings Open Data (via) Tony Hirst wrote a tutorial on using datasette-cluster-map to analyze food hygiene ratings data from the FSA # 20th April 2018, 8:50 pm

I submitted a PWA to 3 app stores. Here’s what I learned (via) Useful real-world experience shipping a progressive web app to the iOS, Android and Windows app stores. # 19th April 2018, 9:06 pm

How to rewrite your SQL queries in Pandas, and more (via) I still haven’t fully internalized the idioms needed to manipulate DataFrames in pandas. This tutorial helps a great deal—it shows the Pandas equivalents for a host of common SQL queries. # 19th April 2018, 6:34 pm

Intro to Threads and Processes in Python (via) I really like the diagrams in this article which compares the performance of Python threads and processes for different types of task via the excellent concurrent.futures library. # 19th April 2018, 6:32 pm

How to Use Static Type Checking in Python 3.6 (via) Useful introduction to optional static typing in Python 3.6, including how to use mypy, PyCharm and the Atom mypy plugin. # 19th April 2018, 6:30 pm

The best of Python: a collection of my favorite articles from 2017 and 2018 (so far). Gergely Szerovay has brought together an outstandingly interesting selection of Python articles from the last couple of years of activity of the Python community on Medium. A whole load of gems in here that I hadn’t seen before. # 19th April 2018, 6:28 pm

Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter (via) ScriptedForms is a fascinating Jupyter hack that lets you construct dynamic documents defined using markdown that provide form fields and evaluate Python code instantly as you interact with them. # 19th April 2018, 4:05 pm

What’s New in MySQL 8.0. MySQL 8 has lots of exciting improvements: Window functions, SRS aware spatial types for GIS, utf8mb4 by default, a ton of JSON improvements and atomic DDL. I no longer feel at a significant disadvantage when I have to use MySQL in place of PostgreSQL. # 19th April 2018, 4:03 pm

Text Embedding Models Contain Bias. Here’s Why That Matters (via) Excellent discussion from the Google AI team of the enormous challenge of building machine learning models without accidentally encoding harmful bias in a way that cannot be easily detected. # 17th April 2018, 8:54 pm

Datasette 0.19: Plugins Documentation (via) I’ve released the first preview of Datasette’s new plugin support, which uses the pluggy package originally developed for py.test. So far the only two plugin hooks are for SQLite connection creation (allowing custom SQL functions to be registered) and Jinja2 template environment initialization (for custom template tags), but this release is mainly about exercising the plugin registration mechanism and starting to gather feedback. Lots more to come. # 17th April 2018, 3:59 am

Datasette 0.18: units (via) This release features the first Datasette feature that was entirely designed and implemented by someone else (yay open source)—Russ Garrett wanted unit support (Hz, ft etc) for his Wireless Telegraphy Register project. It’s a really neat implementation: you can tell Datasette what units are in use for a particular database column and it will display the correct SI symbols on the page. Specifying units also enables unit-aware filtering: if Datasette knows that a column is measured in meters you can now query it for all rows that are less than 50 feet for example. # 14th April 2018, 3:56 pm

What do you mean “average”? (via) Lovely example of an interactive explorable demonstrating mode/mean/median, built as an Observable notebook using D3. # 12th April 2018, 4:41 pm

Wireless Telegraphy Register (via) Russ Garrett used Datasette to build a browsable interface to the UK’s register of business radio licenses, using data from Ofcom. # 12th April 2018, 4:08 pm

Mozilla Telemetry: In-depth Data Pipeline (via) Detailed behind-the-scenes look at an extremely sophisticated big data telemetry processing system built using open source tools. Some of this is unsurprising (S3 for storage, Spark and Kafka for streams) but the details are fascinating. They use a custom nginx module for the ingestion endpoint and have a “tee” server written in Lua and OpenResty which lets them route some traffic to alternative backend. # 12th April 2018, 3:44 pm

The Academic Vanity Honeypot phishing scheme. Twitter thread describing a nasty phishing attack where an academic receives an email from a respected peer congratulating them on a recent article and suggesting further reading. The further reading link is a phishing site that emulates the victim’s institution’s login page. # 12th April 2018, 3:07 pm