Simon Willison’s Weblog

Blogmarks

Amazon’s Away Teams laid bare: How AWS’s hivemind of engineers develop and maintain their internal tech (via) Some interesting insights into how Amazon structure their engineering organization to maximize team productivity in a service-oriented environment. Two things that stood out to me: each service is owned by a “home team”, but sometimes features that are needed by other teams can be built by forming an “away team” to build out that functionality. Secondly, Amazon has a concept of “bar raisers” who are engineers across the organization who help approve key design and architectural decisions. It’s possible to go against the recommendation of a bar raiser but “such a move is noted and made visible to higher levels of management”.

# 14th May 2019, 6:32 pm / amazon, service-oriented-architecture, management

asgi-cors (via) I’ve been trying out the new ASGI 3.0 spec and I just released my first piece of ASGI middleware: asgi-cors, which lets you wrap an ASGI application with Access-Control-Allow-Origin CORS headers (either “*” or dynamic headers based on an origin whitelist).
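
To illustrate the idea of wrapping an ASGI application, here is a minimal sketch of ASGI 3.0 middleware that adds a CORS header. It is not the asgi-cors implementation: the names `add_cors_header`, `hello` and `collect_messages` are hypothetical, and it only handles the simple fixed-header case, not an origin whitelist.

```python
import asyncio


def add_cors_header(app, allow_origin="*"):
    """Wrap an ASGI 3.0 app so each HTTP response carries a CORS header."""

    async def wrapped(scope, receive, send):
        # Pass non-HTTP scopes (e.g. websocket) straight through.
        if scope["type"] != "http":
            await app(scope, receive, send)
            return

        async def send_with_cors(message):
            # Headers travel in the "http.response.start" message.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append(
                    (b"access-control-allow-origin", allow_origin.encode("latin-1"))
                )
                message = {**message, "headers": headers}
            await send(message)

        await app(scope, receive, send_with_cors)

    return wrapped


async def hello(scope, receive, send):
    """Tiny inner application used only for this demonstration."""
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello"})


def collect_messages(app):
    """Drive the app once without a server and return the messages it sent."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    asyncio.run(app(scope, receive, send))
    return sent
```

Calling `collect_messages(add_cors_header(hello))` returns the two response messages, with the extra header appended to the first one.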

# 7th May 2019, 12:12 am / projects, security, asgi, cors

Want to see what one digital future for newspapers looks like? Look at The Guardian, which isn’t losing money anymore (via) After losing money every single year since 1998, the Guardian just managed to turn a profit! Detailed analysis of how they did it by Joshua Benton.

# 2nd May 2019, 5:49 am / guardian, newspapers

A Conspiracy To Kill IE6 (via) Cracking story by Chris Zacharias about how a team of engineers at YouTube back in 2009 took advantage of some exploits in YouTube’s organization structure (left over from their acquisition by Google) to ship a vague IE6 deprecation warning banner on one of the world’s highest traffic websites, inspiring many other similar banners and resulting in a 10% drop in global IE6 traffic.

# 1st May 2019, 8:26 pm / ie6, youtube

JSK Journalism Fellowships names Class of 2019-2020 (and I’m in it!) (via) In personal news... I’ve been accepted for a ten month journalism fellowship at Stanford (starting September)! My work there will involve “Improving the impact of investigative stories by expanding the open-source ecosystem of tools that allows journalists to share the underlying data”.

# 1st May 2019, 4:43 pm / data-journalism, journalism, stanford, datasette, jsk, personal-news

Dockerfile for creating a Datasette of NHS dentist information (via) Really neat Dockerfile example by Alf Eaton that uses multi-stage builds to pull dentist information from the NHS, compile to SQLite using csvs-to-sqlite and serve the results with Datasette. TIL the NHS like to use ¬ as their CSV separator!

# 26th April 2019, 2:09 pm / alf-eaton, csv, docker, datasette

Smaller Python Docker Containers with Multi-Stage Builds and Python Wheels (via) Clear tutorial on how to use Docker’s multi-stage build feature to create smaller final images by taking advantage of Python’s wheel format—so an initial stage can install a full compiler toolchain and compile C dependencies into wheels, then a later stage can install those pre-compiled wheels into a slimmer container without including the C compiler.

# 26th April 2019, 2:05 pm / python, docker

Hello world for ASGI running on Glitch (via) I’m continuing to experiment with Python 3 running on Glitch. This evening on my walk home from work I built this “hello world” demo on my phone, partly to see if Glitch was a workable mobile development environment—it passed with flying colours! The demo is a simple hello world implemented using the new ASGI 3.0 specification, running on the daphne reference server. Click the “via” link for my accompanying thread on Twitter, which includes a short screencast (also recorded on my phone) showing Glitch in action.
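
For reference, a hello world in the ASGI 3.0 style looks roughly like the sketch below. This is not the code from the Glitch demo; `app` and `call_once` are names chosen for illustration, and `call_once` is a hypothetical helper that exercises the application without any server.

```python
import asyncio


async def app(scope, receive, send):
    """ASGI 3.0: one async callable that takes scope, receive and send."""
    if scope["type"] != "http":
        return  # this sketch only answers plain HTTP requests
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain; charset=utf-8")],
    })
    await send({"type": "http.response.body", "body": b"Hello, world!"})


def call_once(asgi_app, path="/"):
    """Send a single fake GET request to the app and collect its messages."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    asyncio.run(asgi_app(scope, receive, send))
    return sent
```

An ASGI server is pointed at the `app` callable; `call_once(app)` is only a way to check the response messages locally.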

# 26th April 2019, 5:06 am / glitch, projects, asgi

Story Structure 104: The Juicy Details. Dan Harmon (Community, Rick and Morty) wrote a fascinating series of essays on story structure for his Channel 101 film festival project. It’s worth reading the whole series, but this chapter is where things get really detailed.

# 25th April 2019, 1:17 pm / writing, screen-writing

Language support on Glitch: a list (via) This is really useful: it’s essentially “Glitch: the missing manual” for running languages other than JavaScript. The Glitch community forums are a gold mine of useful information like this.

# 23rd April 2019, 4:28 pm / glitch

The Behavioral Change Stairway Model. BCSM is the FBI’s model for crisis negotiation, but it looks like it could be a useful negotiation framework for all kinds of other conflict mediation as well.

# 19th April 2019, 5:46 pm / communication, management

In Kākāpō breeding season news…. I posted on MetaFilter about this year’s record-breaking Kākāpō breeding season.

# 19th April 2019, 3:11 am / metafilter, kakapo, conservation

Exploring Neural Networks with Activation Atlases. Another promising attempt at visualizing what’s going on inside a neural network.

# 19th April 2019, 2:24 am / machine-learning, visualization

Using the HTML lang attribute (via) TIL the HTML lang attribute is used by screen readers to understand how to provide the correct accent and pronunciation.

# 18th April 2019, 9:09 pm / accessibility, html, l10n, screen-readers

How Zoom’s web client avoids using WebRTC (via) It turns out video conferencing app Zoom uses their own WebAssembly compiled video and audio codecs and transmits H264 over WebSockets.

# 18th April 2019, 6:20 pm / websockets, webassembly

An Intro to Threading in Python (via) Real Python consistently produces really comprehensive, high quality articles and tutorials. This is an excellent introduction to threading in Python, covering threads, locks, queues, ThreadPoolExecutor and more.
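
A small sketch of two of the pieces the tutorial covers, a lock and ThreadPoolExecutor, working together. The example is mine rather than taken from the article, and `Counter` and `run_workers` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Counter:
    """A shared counter whose updates are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is not atomic, so guard it with the lock.
        with self._lock:
            self.value += 1


def run_workers(n_workers=4, increments_per_worker=1000):
    """Increment one shared counter from several pool threads."""
    counter = Counter()

    def work(worker_id):
        for _ in range(increments_per_worker):
            counter.increment()

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() forces the map so any exception raised in a worker surfaces here.
        list(pool.map(work, range(n_workers)))
    return counter.value
```

With the lock in place the final count is always `n_workers * increments_per_worker`.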

# 18th April 2019, 5:24 am / python, threads

Pyodide: Bringing the scientific Python stack to the browser (via) More fun with WebAssembly: Pyodide attempts (and mostly succeeds) to bring the full Python data stack to the browser: CPython, NumPy, Pandas, Scipy, and Matplotlib. Also includes interesting bridge tools for e.g. driving a canvas element from Python. Really interesting project from the Firefox Data Platform team.

# 17th April 2019, 4:23 am / mozilla, pandas, python, scipy, webassembly, numpy, pyodide

Wasmer: a Python library for executing WebAssembly binaries. This is a really interesting new tool: “pip install wasmer” and now you can load code that has been compiled to WebAssembly and call those functions directly from Python. It’s built on top of the wasmer universal WebAssembly runtime, written over just the past year in Rust by a team led by Syrus Akbary, the author of the Graphene GraphQL library for Python.

# 16th April 2019, 6:04 pm / python, rust, webassembly

ripgrep is faster than {grep, ag, git grep, ucg, pt, sift} (via) Andrew Gallant's post from September 2016 introducing ripgrep, the command-line grep tool he wrote using Rust (on top of the Rust regular expression library also written by Andrew). ripgrep has a beautifully designed CLI and is crazy fast, and this post describes how it gets its performance in a huge amount of detail, right down to comparing the different algorithmic approaches used by other similar tools.

I recently learned that ripgrep ships as part of VS Code, which is why VS Code's search-across-project feature is so fast. In fact, if you dig around in the OS X package you can find the rg binary already installed on your Mac:

find /Applications/Visual* | grep bin/rg

# 16th April 2019, 5:52 pm / cli, rust, ripgrep, vs-code, andrew-gallant

Datasette: ?_where=sql-fragment parameter for table views. I just shipped a tiny but really useful new feature to Datasette master: you can now add ?_where=sql-fragment on to the URL of any table view to inject additional SQL directly into the underlying WHERE clause. This tiny feature actually has some really interesting applications: I created this because I wanted to be able to run more complex custom SQL queries without losing access to the conveniences of Datasette’s table view, in particular the built-in faceting support. The feature actually fits in well with Datasette’s philosophy of allowing arbitrary SQL to be executed against a read-only database: you can turn this ability off using the allow_sql config flag.
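
Because the fragment travels in the query string it needs to be URL-encoded. A minimal sketch of building such a URL follows; `table_url_with_where` is a hypothetical helper, and the host, database, table and column names are made up for illustration.

```python
from urllib.parse import urlencode


def table_url_with_where(table_url, sql_fragment):
    """Append a URL-encoded ?_where= parameter to a table view URL."""
    return table_url + "?" + urlencode({"_where": sql_fragment})


# Hypothetical table URL and column name, for illustration only.
example = table_url_with_where("https://example.com/mydb/mytable", "year > 2010")
```

Here `example` ends with `?_where=year+%3E+2010`, the encoded form of `year > 2010`.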

# 13th April 2019, 2 am / sql, datasette

How to Create an Index in Django Without Downtime (via) Excellent advanced tutorial on Django migrations, which uses a desire to create indexes in PostgreSQL without locking the table (with CREATE INDEX CONCURRENTLY) to explain the SeparateDatabaseAndState and atomic features of Django’s migration framework.

# 11th April 2019, 3:06 pm / django, migrations, postgresql, zero-downtime

Using 6 Page and 2 Page Documents To Make Organizational Decisions (via) I’ve been thinking a lot recently about the challenges of efficiently getting to consensus within a larger organization spread across multiple locations and time zones. This model described by Ian Nowland based on his experience at AWS seems very promising. The goal is to achieve a decision or “disagree and commit” consensus using a max 6 page document and a one hour meeting. The first fifteen minutes of the meeting are dedicated to silently reading the document—if you’ve read it already you are given the option of arriving fifteen minutes late.

# 11th April 2019, 3:46 am / aws, process, management

Ministry of Silly Runtimes: Vintage Python on Cloud Run (via) Cloud Run is an exciting new hosting service from Google that lets you define a container using a Dockerfile and then run that container in a “scale to zero” environment, so you only pay for time spent serving traffic. It’s similar to the now-deprecated Zeit Now 1.0 which inspired me to create Datasette. Here Dustin Ingram demonstrates how powerful Docker can be as the underlying abstraction by deploying a web app using a 25 year old version of Python 1.x.

# 9th April 2019, 5:33 pm / cloud, python, zeit-now, docker, datasette, cloudrun, dustin-ingram

Generator Tricks for Systems Programmers (via) David Beazley’s definitive generators tutorial from 2008, updated for Python 3.7 in October 2018.
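
The core idea is chaining generators into a processing pipeline, so each stage pulls one item at a time from the stage before it. The sketch below is my own illustration in that spirit, not code from the tutorial; the function names and the sample log lines are invented.

```python
def matching(lines, substring):
    """Yield only the lines that contain the given substring."""
    for line in lines:
        if substring in line:
            yield line


def last_field_as_int(lines):
    """Yield the final whitespace-separated field of each line, if numeric."""
    for line in lines:
        parts = line.split()
        if parts and parts[-1].isdigit():
            yield int(parts[-1])


def total_bytes(lines, substring):
    """Sum the numeric last field across all matching lines."""
    return sum(last_field_as_int(matching(lines, substring)))


# Invented sample data: method, path, status, bytes sent.
SAMPLE_LOG = [
    "GET /index.html 200 512",
    "GET /missing 404 -",
    "GET /about.html 200 1024",
]
```

Nothing is read until `sum()` starts consuming, so the same pipeline works on a file object or any other iterable of lines without loading it all into memory.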

# 9th April 2019, 5:13 pm / david-beazley, generators, python

What is a Self-XSS scam? Facebook links to this page from a console.log message that they display in the browser devtools console, specifically warning that “If someone told you to copy-paste something here to enable a Facebook feature or hack someone’s account, it is a scam and will give them access to your Facebook account.”

# 8th April 2019, 6:01 pm / facebook, security, xss

Colm MacCárthaigh tells the inside story of how AWS responded to Heartbleed. The Heartbleed SSL vulnerability came out five years ago. In this Twitter thread Colm, who was Amazon’s principal engineer for Elastic Load Balancer at the time, describes how the AWS team responded to something that “was scarier than any bug I’d ever seen”. It’s a cracking story.

# 7th April 2019, 8:32 pm / aws, security

tsv-utils (via) Powerful collection of CLI tools for processing TSV files, written in D for performance and released by eBay. Includes a csv2tsv conversion tool. You can download an archive of pre-built binaries for Linux and OS X from their releases page: worked fine on my Mac.

# 7th April 2019, 8:29 pm / cli, csv

csv-diff 0.3.1 (via) I released a minor update to my csv-diff CLI tool today which does a better job of displaying a human-readable representation of rows that have been added or removed from a file—previously they were represented as an ugly JSON dump. My script monitoring changes to the official list of trees in San Francisco has been running for a month now and has captured 23 commits!

# 7th April 2019, 8:03 pm / cli, csv, diff, projects

The problem with laziness: minimising performance issues caused by Django’s implicit database queries (via) The ability to accidentally execute further database queries by traversing objects from a Django template is a common source of unexpected performance regressions. django-zen-queries is a neat new library which provides a context manager for disabling database queries during a render (or elsewhere), forcing queries to be explicitly executed in view functions.

# 3rd April 2019, 3:49 pm / django

zson (via) “ZSON is a PostgreSQL extension for transparent JSONB compression. Compression is based on a shared dictionary of strings most frequently used in specific JSONB documents [...] In some cases ZSON can save half of your disk space and give you about 10% more TPS.”

# 2nd April 2019, 9:26 pm / json, postgresql
