Simon Willison’s Weblog


Filters: Type: blogmark ×

Subsume JSON a.k.a. JSON ⊂ ECMAScript (via) TIL that JSON isn’t a subset of ECMAScript after all! “In ES2018, ECMAScript string literals couldn’t contain unescaped U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR characters, because they are considered to be line terminators even in that context.” # 15th August 2019, 10:30 am

OPP (Other People’s Problems) (via) Camille Fournier provides a comprehensive guide to picking your battles: in a large organization how can you navigate the enormous array of problems you can see that you’d like to fix, especially when so many of those problems aren’t directly in your area of control? # 7th August 2019, 1:58 pm

Optimizing for the mobile web: Moving from Angular to Preact. Grubhub reduced their mobile web load times from 9-11s to 3-4s by replacing Angular with Preact (and replacing other libraries such as lodash with native JavaScript code). The conversion took 6 months and involved running Angular and Preact simultaneously during the transition—not a huge additional overhead as Preact itself is only 4KB. They used TypeScript throughout and credit it with providing a great deal of confidence and productivity to the overall refactoring. # 5th August 2019, 12:26 pm

Working with many-to-many relationships in sqlite-utils (via) I just released sqlite-utils 1.9 with syntactic sugar support for creating many-to-many relationships for records stored in SQLite databases. # 4th August 2019, 3:57 am

Logs vs. metrics: a false dichotomy (via) Nick Stenning discusses the differences between logs and metrics: most notably that metrics can be derived from logs but logs cannot be reconstituted starting with time-series metrics. # 3rd August 2019, 4:46 pm

PyPI now supports uploading via API token (via) All of my open source Python libraries are set up to automatically deploy new tagged releases as PyPI packages using Circle CI or Travis, but I’ve always get a bit uncomfortable about sharing my PyPI password with those CI platforms to get this to work. PyPI just added scopes authentication tokens, which means I can issue a token that’s only allowed to upload a specific project and see an audit log of when that token was last used. # 1st August 2019, 4:03 pm

Repository driven development (via) I’m already a big fan of keeping documentation and code in the same repo so you can update them both from within the same code review, but this takes it even further: in repository driven development every aspect of the code and configuration needed to define, document, test and ship a service live in the service repository—all the way down to the configurations for reporting dashboards. This sounds like heaven. # 24th July 2019, 8:41 am

Using memory-profiler to debug excessive memory usage in healthkit-to-sqlite. This morning I figured out how to use the memory-profiler module (and mprof command line tool) to debug memory usage of Python processes. I added the details, including screenshots, to this GitHub issue. It helped me knock down RAM usage for my healthkit-to-sqlite from 2.5GB to just 80MB by making smarter usage of the ElementTree pull parser. # 24th July 2019, 8:25 am

Targeted diagnostic logging in production (via) Will Sargent defines diagnostic logging as “debug logging statements with an audience”, and proposes controlling this style if logging via a feature flat system to allow detailed logging to be turned on in production against a selected subset if users in order to help debug difficult problems. Lots of great background material in the topic of observability here too. # 24th July 2019, 5:44 am

When a rewrite isn’t: rebuilding Slack on the desktop. Slack appear to have pulled off the almost impossible: finishing a complete, incremental rewrite of their core product. They moved from jQuery to React over the course of two years, constantly shipping new features as they went along. The biggest gain was in rewriting their code to support multiple workspaces, which means desktop client users no longer have to run a separate copy of Electron for every workspace they are signed into. # 22nd July 2019, 6:30 pm

healthkit-to-sqlite. Ever since I got an Apple Watch I’ve been itching to get my hands on the step tracking and health data that it’s been collecting for me. I know it’s there in a SQLite database on my wrist, but I couldn’t figure out how to get it! A few days ago I stumbled across the “Export Health Data” button in the iOS Health app, and it turns out it creates a zip file containing XML with a full dump of the data collected by Apple Health. healthkit-to-sqlite is the tool I’ve built that can read that export and use it to create a SQLite database ready to be queried and explored with Datasette. It’s a pretty basic implementation but it’s already giving me access to over 3 million rows of data. Lots of potential here for interesting work with personal analytics. # 22nd July 2019, 3:34 am

Unlocking the Department of State’s foreign military training data for good this time (via) I’m so excited about this: Security Force Monitor used Datasette to publish a 200,000 row database of training engagements between the US military and foreign military units, based on their own massive efforts to clean up the official data (from thousands of PDF files). This is pretty much my dream use-case for Datasette, and their future goals are inspiring: “Our hope is that when the next report arrives in a short few months, we will be able to turn it into machine readable data and pass it around the sector in minutes, rather than months.” # 20th July 2019, 4:42 am

Details of the Cloudflare outage on July 2, 2019 (via) Best retrospective I’ve read in a long time. The outage was caused by a backtracking regex rule that was added to the Web Application Firewall project, which rolls out globally and skips most of Cloudflare’s regular graduar rollout process (delightfully animal themed, named DOG for the dogfooding PoP that their employees use, PIG for the Guinea Pig PoPs reserved for free customers, then Canary for the final step) so that they can deploy counter-measures to newly discovered vulnerabilities as quickly as possible—but the real value in the retro is that it provides an extremely deep insight into how Cloudflare organize, test and manage their changes. Really interesting stuff. # 12th July 2019, 5:36 pm

datasette-cors (via) My other Datasette ASGI plugin: this one wraps my asgi-cors project and lets you configure CORS access from a list of domains (or a set of domain wildcards) so you can make JavaScript calls to a Datasette instance from a specific set of other hosts. # 8th July 2019, 4:30 am

datasette-auth-github (via) My first big ASGI plugin for Datasette: datasette-auth-github adds the ability to require users to authenticate against the GitHub OAuth API. You can whitelist specific users, or you can restrict access to members of specific GitHub organizations or teams. While it’s structured as a Datasette plugin it also includes ASGI middleware which can be applied to any ASGI application. # 8th July 2019, 4:28 am

Datasette 0.29 (via) I shipped Datasette 0.29! • ASGI all the way down! Plus a new asgi_wrapper plugin hook letting plugins do all kinds of powerful new things • New mechanism for secret plugin configuration options • Facet by date • ?_through= for joins through m2m tables. Much more. # 8th July 2019, 4:26 am

How FZF and ripgrep improved my workflow (via) I’m already a keen user of ripgrep (a crazy-fast grep alternative) but fzf was new to me: it’s a CLI utility that lets you pipe in a list of strings, then gives you a typeahead search interface to search and select a string before returning the selected string to stdout when you hit enter. This means you can pipe it together with other tools to add a dynamic selection step, which has all kinds of delightful combinations. “vi $(find . | fzf)” for example opens vi against the file you selected. # 5th July 2019, 5:51 pm

PugSQL. Interesting new twist on a definitely-not-an-ORM library for Python. With PugSQL you define SQL queries in files, give them names and then load them into a module which allows you to execute them as Python methods with keyword arguments. You can mark statements as only returning a single row (or a single scalar value) with a comment at the top of their file. # 3rd July 2019, 6:19 pm

Choose Boring Technology (via) The definitive write-up of Dan McKinley’s presentation on why you should mostly use “boring” technology rather than going after the latest shiniest stack components. There’s so much accumulated wisdom in here. I particularly like how Dan owns up to having introduced Scala and MongoDB at Etsy before eventually helping remove them and go back to something less exciting and far more predictable. Also neat: the site is generated using Dan’s better-keynote-export tool which helps turn Keynote presentations into a flat web page with notes and images. # 3rd July 2019, 6:09 pm

db-to-sqlite 1.0 release. I’ve released version 1.0 of my db-to-sqlite tool, which lets you create a SQLite database copy of any database supported by SQLAlchemy (I’ve tested it against MySQL and PostgreSQL). The tool has a bunch of new features: you can use --redact to redact specific columns, specify --table multiple times to copy a subset of tables, and the --all option now efficiently adds all foreign keys at the end of the import. The project now has unit tests which run against MySQL and PostgreSQL in Travis CI. Also included in the README: a shell one-liner for creating a local SQLite copy of a remote Heroku Postgres database based on extracting the connection string from a Heroku config environment variable. # 1st July 2019, 1:35 am

Feature Toggles (aka Feature Flags). I’m a huge fan of feature flags as a way of managing feature releases and keeping incomplete code in master as opposed to maintaining long-running branches. Recently I’ve found myself pointing people to this essay by Pete Hodgson—it’s a great overview of feature flags (here called toggles) and I particularly like how it splits them into four categories: Release Toggles, Experiment Toggles, Ops Toggles and Permissioning Toggles. # 25th June 2019, 9:13 pm

json-flatten. A little Python library I wrote that attempts to flatten a JSON object into a set of key/value pairs suitable for transmitting in a query string or using to construct an HTML form. I first wrote this back in 2015 as a Gist—I’ve reconstructed the Gist commit history in a new repository and shipped it to PyPI. # 22nd June 2019, 4:51 am

Announcing Envoy Mobile. This is a fascinating development: Lyft’s Envoy proxy / service mesh has been widely adopted across the industry as a server-side component for adding smart routing and observability to the network calls made between services in microservice architectures. “The reality is that three 9s at the server-side edge is meaningless if the user of a mobile application is only able to complete the desired product flows a fraction of the time”—so Lyft are building a C++ embedded library companion to Envoy which is designed to be shipped as part of iOS and Android client applications. “Envoy Mobile in conjunction with Envoy in the data center will provide the ability to reason about the entire distributed system network, not just the server-side portion.” Their decision to release an early working prototype and then conduct ongoing development entirely in the open is interesting too. # 18th June 2019, 6:42 pm

Toward a “Kernel Python” (via) Glyph makes a strong case for releasing a slimmed down “kernel” version of Python with the minimal possible standard library, and argues that the current standard library is proving impossible for a single core team to productively maintain. “If I wanted to update the colorsys module to be more modern—perhaps to have a Color object rather than a collection of free functions, perhaps to support integer color models—I’d likely have to wait 500 days, or more, for a review.” # 15th June 2019, 4 pm

When should you be using Web Workers? 85% of worldwide mobile devices are massively less performant than high end iPhones. Surma argues that we should be making aggressive use of Web Workers to keep as much of our JavaScript as possible off the main UI thread, to avoid freezing up the entire interface. # 15th June 2019, 4:31 am

Convert Locations.kml (pulled from an iPhone backup) to SQLite. I’ve been playing around with data from my iPhone using the iPhone Backup Extractor app and one of the things it exports for you is a Locations.kml file full of location history data. I wrote a tiny script using Python’s ElementTree XMLPullParser to efficiently iterate through the Placemarks and yield them as dictionaries, which I then batch-inserted into sqlite-utils to create a SQLite database. # 14th June 2019, 12:45 am

paginate-json (via) I released a fun tiny utility: paginate-json, which knows how to paginate through JSON APIs that use the HTTP Link header for pagination. I built it so I could pull data from the GitHub API and pipe it directly into SQLite via sqlite-utils. # 12th June 2019, 3:22 pm

Serverless Microservice Patterns for AWS (via) A handy collection of 19 architectural patterns for AWS Lambda collected by Jeremy Daly. # 12th June 2019, 12:13 am

datasette-render-binary (via) Yet another tiny Datasette plugin. This one attempts to render binary data in a slightly more readable fashion—it shows ASCII characters as they are, and shows all other data as monospace octets. Useful as a tool for exploring new unfamiliar databases as it makes it easier to spot if a binary column may contain a decipherable binary format. # 9th June 2019, 4:22 pm

datasette-bplist (via) It turns out an OS X laptop is positively crammed with SQLite databases, and many of them contain values that are data structures encoded using Apple’s binary plist format. datasette-bplist is my new plugin to help explore those files: it provides a display hook for rendering their contents, and a custom bplist_to_json() SQL function which can be used to extract and query information that is embedded in those values. The README includes tips on how to pull interesting EXIF data out of the SQLite database that sits behind Apple Photos. # 9th June 2019, 1:26 am