Simon Willison’s Weblog

Items in Apr, 2019

Dockerfile for creating a Datasette of NHS dentist information (via) Really neat Dockerfile example by Alf Eaton that uses multi-stage builds to pull dentist information from the NHS, compile to SQLite using csvs-to-sqlite and serve the results with Datasette. TIL the NHS like to use ¬ as their CSV separator! # 26th April 2019, 2:09 pm
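
A minimal sketch of what handling a ¬-separated file can look like in plain Python, assuming nothing about the real NHS files: the sample rows, column names and the `dentists` table name below are all made up for illustration, and this is not how csvs-to-sqlite or Alf Eaton's Dockerfile do it. It only shows that the standard `csv` module accepts ¬ as a delimiter and that the parsed rows can be loaded into SQLite.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real NHS files have different columns.
SAMPLE = "id¬name¬city\n1¬Smile Dental¬Leeds\n2¬Bright Teeth¬York\n"


def load_rows(text, delimiter="¬"):
    """Parse delimiter-separated text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def to_sqlite(rows, table="dentists"):
    """Create an in-memory SQLite table holding the parsed rows as TEXT columns."""
    conn = sqlite3.connect(":memory:")
    columns = list(rows[0].keys())
    column_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    return conn


rows = load_rows(SAMPLE)
conn = to_sqlite(rows)
york = conn.execute("SELECT name FROM dentists WHERE city = 'York'").fetchall()
print(york)
```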

Smaller Python Docker Containers with Multi-Stage Builds and Python Wheels (via) Clear tutorial on how to use Docker’s multi-stage build feature to create smaller final images by taking advantage of Python’s wheel format—so an initial stage can install a full compiler toolchain and compile C dependencies into wheels, then a later stage can install those pre-compiled wheels into a slimmer container without including the C compiler. # 26th April 2019, 2:05 pm

Hello world for ASGI running on Glitch (via) I’m continuing to experiment with Python 3 running on Glitch. This evening on my walk home from work I built this “hello world” demo on my phone, partly to see if Glitch was a workable mobile development environment—it passed with flying colours! The demo is a simple hello world implemented using the new ASGI 3.0 specification, running on the daphne reference server. Click the “via” link for my accompanying thread on Twitter, which includes a short screencast (also recorded on my phone) showing Glitch in action. # 26th April 2019, 5:06 am
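
For readers who have not seen ASGI 3.0 yet, here is a rough sketch of what a "hello world" in that style can look like. It is not the code from the Glitch demo. The `call_app` helper is a hypothetical stand-in for a server, included only so the example runs on its own without daphne installed.

```python
import asyncio


async def app(scope, receive, send):
    """ASGI 3.0 style application: a single async callable.

    This sketch only handles HTTP scopes and ignores everything else.
    """
    if scope["type"] != "http":
        return
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain; charset=utf-8")],
    })
    await send({"type": "http.response.body", "body": b"Hello, world!"})


async def call_app(path="/"):
    """Hypothetical helper: drive the app once and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return sent


messages = asyncio.run(call_app())
print(messages[0]["status"], messages[1]["body"])
```

Under a real ASGI server, the server supplies `scope`, `receive` and `send`, so only the `app` callable would be needed.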

One of the standards you have to have demonstrated to being able to reach Principle Engineer inside Amazon is “Respect what has gone before”. It’s very likely you don’t know the why, what or how of it. Often what was written was the best that could be done to the constraints.

Paul Graydon # 25th April 2019, 5:52 pm

Story Structure 104: The Juicy Details. Dan Harmon (Community, Rick and Morty) wrote a fascinating series of essays on story structure for his Channel 101 film festival project. It’s worth reading the whole series, but this chapter is where things get really detailed. # 25th April 2019, 1:17 pm

Language support on Glitch: a list (via) This is really useful: it’s essentially “Glitch: the missing manual” for running languages other than JavaScript. The Glitch community forums are a gold mine of useful information like this. # 23rd April 2019, 4:28 pm

Running Datasette on Glitch

The worst part of any software project is setting up a development environment. It’s by far the biggest barrier for anyone trying to get started learning to code. I’ve been a developer for more than twenty years and I still feel the pain any time I want to do something new.

[... 998 words]

Lots of people calling for more aggressive moderation seem to imagine that if they yell enough the companies have a thoughtful, unbiased and nuance-understanding HAL 9000 they can deploy. It’s really more like the Censorship DMV.

Alex Stamos # 21st April 2019, 4:36 pm

The Behavioral Change Stairway Model. BCSM is the FBI’s model for crisis negotiation, but it looks like it could be a useful negotiation framework for all kinds of other conflict mediation as well. # 19th April 2019, 5:46 pm

In Kākāpō breeding season news…. I posted on MetaFilter about this year’s record-breaking Kākāpō breeding season. # 19th April 2019, 3:11 am

Exploring Neural Networks with Activation Atlases. Another promising attempt at visualizing what’s going on inside a neural network. # 19th April 2019, 2:24 am

Using the HTML lang attribute (via) TIL the HTML lang attribute is used by screen readers to understand how to provide the correct accent and pronunciation. # 18th April 2019, 9:09 pm

How Zoom’s web client avoids using WebRTC (via) It turns out video conferencing app Zoom uses their own WebAssembly compiled video and audio codecs and transmits H264 over WebSockets. # 18th April 2019, 6:20 pm

An Intro to Threading in Python (via) Real Python consistently produces really comprehensive, high quality articles and tutorials. This is an excellent introduction to threading in Python, covering threads, locks, queues, ThreadPoolExecutor and more. # 18th April 2019, 5:24 am
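
As a small taste of two of the topics mentioned, here is a sketch (not taken from the Real Python article) that combines a lock with ThreadPoolExecutor. The `increment` function and the shared `counter` are illustrative names. The lock makes each update to the shared counter safe when several worker threads run at once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

counter = 0
counter_lock = threading.Lock()


def increment(times):
    """Add to the shared counter, holding the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:
            counter += 1
    return times


# Eight tasks of 1000 increments each, spread across four worker threads.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(increment, [1000] * 8))

print(counter)  # 8000
```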

Pyodide: Bringing the scientific Python stack to the browser (via) More fun with WebAssembly: Pyodide attempts (and mostly succeeds) to bring the full Python data stack to the browser: CPython, NumPy, Pandas, SciPy, and Matplotlib. Also includes interesting bridge tools for e.g. driving a canvas element from Python. Really interesting project from the Firefox Data Platform team. # 17th April 2019, 4:23 am

Wasmer: a Python library for executing WebAssembly binaries. This is a really interesting new tool: “pip install wasmer” and now you can load code that has been compiled to WebAssembly and call those functions directly from Python. It’s built on top of the wasmer universal WebAssembly runtime, written over just the past year in Rust by a team led by Syrus Akbary, the author of the Graphene GraphQL library for Python. # 16th April 2019, 6:04 pm

ripgrep is faster than {grep, ag, git grep, ucg, pt, sift} (via) Andrew Gallant’s post from September 2016 introducing ripgrep, the command-line grep tool he wrote using Rust (on top of the Rust regular expression library also written by Andrew). ripgrep is a beautifully designed CLI tool and is crazy fast, and this post describes how it gets its performance in a huge amount of detail, right down to comparing the different algorithmic approaches used by other similar tools. I recently learned that ripgrep ships as part of VS Code, which is why VS Code’s search-across-project feature is so fast. In fact, if you dig around in the OS X package you can find the rg binary already installed on your Mac: find /Applications/Visual* | grep bin/rg # 16th April 2019, 5:52 pm

Datasette: ?_where=sql-fragment parameter for table views. I just shipped a tiny but really useful new feature to Datasette master: you can now add ?_where=sql-fragment on to the URL of any table view to inject additional SQL directly into the underlying WHERE clause. This tiny feature actually has some really interesting applications: I created this because I wanted to be able to run more complex custom SQL queries without losing access to the conveniences of Datasette’s table view, in particular the built-in faceting support. The feature actually fits in well with Datasette’s philosophy of allowing arbitrary SQL to be executed against a read-only database: you can turn this ability off using the allow_sql config flag. # 13th April 2019, 2 am
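
A small sketch of how such a URL could be built from Python. The host, database name, table name and column names here are all hypothetical, and `table_url` is an illustrative helper rather than part of Datasette. The `_facet` parameter is Datasette's existing way to request a facet on a column. The SQL fragment needs URL-encoding, which `urlencode` takes care of.

```python
from urllib.parse import urlencode


def table_url(base, database, table, where, facets=()):
    """Build a table view URL with a _where SQL fragment and optional facets."""
    params = [("_where", where)]
    params += [("_facet", column) for column in facets]
    return f"{base}/{database}/{table}?{urlencode(params)}"


# Hypothetical instance, database, table and columns.
url = table_url(
    "https://example.com", "mydb", "trees", "height > 10", facets=["species"]
)
print(url)
# https://example.com/mydb/trees?_where=height+%3E+10&_facet=species
```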

How to Create an Index in Django Without Downtime (via) Excellent advanced tutorial on Django migrations, which uses a desire to create indexes in PostgreSQL without locking the table (with CREATE INDEX CONCURRENTLY) to explain the SeparateDatabaseAndState and atomic features of Django’s migration framework. # 11th April 2019, 3:06 pm

Using 6 Page and 2 Page Documents To Make Organizational Decisions (via) I’ve been thinking a lot recently about the challenges of efficiently getting to consensus within a larger organization spread across multiple locations and time zones. This model described by Ian Nowland based on his experience at AWS seems very promising. The goal is to achieve a decision or “disagree and commit” consensus using a max 6 page document and a one hour meeting. The first fifteen minutes of the meeting are dedicated to silently reading the document—if you’ve read it already you are given the option of arriving fifteen minutes late. # 11th April 2019, 3:46 am

Ministry of Silly Runtimes: Vintage Python on Cloud Run (via) Cloud Run is an exciting new hosting service from Google that lets you define a container using a Dockerfile and then run that container in a “scale to zero” environment, so you only pay for time spent serving traffic. It’s similar to the now-deprecated Zeit Now 1.0 which inspired me to create Datasette. Here Dustin Ingram demonstrates how powerful Docker can be as the underlying abstraction by deploying a web app using a 25 year old version of Python 1.x. # 9th April 2019, 5:33 pm

Generator Tricks for Systems Programmers (via) David Beazley’s definitive generators tutorial from 2008, updated for Python 3.7 in October 2018. # 9th April 2019, 5:13 pm
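
To give a flavour of the generator pipeline style, here is a short sketch written for this note rather than copied from the tutorial. The log lines and the function names (`read_lines`, `grep`, `field`) are made up for illustration. Each stage is a generator that pulls items from the previous one, so nothing is processed until the final `sum` asks for values.

```python
def read_lines(lines):
    """Yield each line with its trailing newline removed."""
    for line in lines:
        yield line.rstrip("\n")


def grep(pattern, lines):
    """Yield only the lines that contain the given substring."""
    for line in lines:
        if pattern in line:
            yield line


def field(lines, index, sep=" "):
    """Yield one separator-delimited column from each line."""
    for line in lines:
        yield line.split(sep)[index]


# Hypothetical log lines: method, path, status, bytes sent.
LOG = [
    "GET /index.html 200 1043\n",
    "GET /missing 404 512\n",
    "GET /about.html 200 2048\n",
]

# Total bytes sent for successful requests, computed lazily stage by stage.
sizes = (int(value) for value in field(grep(" 200 ", read_lines(LOG)), 3))
total = sum(sizes)
print(total)  # 3091
```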

In the five years since the shark was erected, no other examples have occurred … any system of control must make some small place for the dynamic, the unexpected, the downright quirky. I therefore recommend that the Headington Shark be allowed to remain.

Peter Macdonald # 9th April 2019, 1:58 pm

What is a Self-XSS scam? Facebook link to this page from a console.log message that they display in the browser devtools console, specifically warning that “If someone told you to copy-paste something here to enable a Facebook feature or hack someone’s account, it is a scam and will give them access to your Facebook account.” # 8th April 2019, 6:01 pm

Colm MacCárthaigh tells the inside story of how AWS responded to Heartbleed. The Heartbleed SSL vulnerability came out five years ago. In this Twitter thread Colm, who was Amazon’s principal engineer for Elastic Load Balancer at the time, describes how the AWS team responded to something that “was scarier than any bug I’d ever seen”. It’s a cracking story. # 7th April 2019, 8:32 pm

tsv-utils (via) Powerful collection of CLI tools for processing TSV files, written in D for performance and released by eBay. Includes a csv2tsv conversion tool. You can download an archive of pre-built binaries for Linux and OS X from their releases page: worked fine on my Mac. # 7th April 2019, 8:29 pm

csv-diff 0.3.1 (via) I released a minor update to my csv-diff CLI tool today which does a better job of displaying a human-readable representation of rows that have been added or removed from a file—previously they were represented as an ugly JSON dump. My script monitoring changes to the official list of trees in San Francisco has been running for a month now and has captured 23 commits! # 7th April 2019, 8:03 pm

The problem with laziness: minimising performance issues caused by Django’s implicit database queries (via) The ability to accidentally execute further database queries by traversing objects from a Django template is a common source of unexpected performance regressions. django-zen-queries is a neat new library which provides a context manager for disabling database queries during a render (or elsewhere), forcing queries to be explicitly executed in view functions. # 3rd April 2019, 3:49 pm

zson (via) “ZSON is a PostgreSQL extension for transparent JSONB compression. Compression is based on a shared dictionary of strings most frequently used in specific JSONB documents [...] In some cases ZSON can save half of your disk space and give you about 10% more TPS.” # 2nd April 2019, 9:26 pm