Simon Willison’s Weblog

Items in May

Filters: Month: May ×


Practical campaign security is a wood chipper for your hopes and dreams. It sits at the intersection of 19 kinds of status quo, each more odious than the last. You have to accept the fact that computers are broken, software is terrible, campaign finance is evil, the political parties are inept, the DCCC exists, politics is full of parasites, tech companies are run by arrogant man-children, and so on.

Maciej Cegłowski # 30th May 2019, 12:03 pm

Los Angeles Weedmaps analysis (via) Ben Welsh at the LA Times published this Jupyter notebook showing the full working behind a story they published about LA’s black market weed dispensaries. I picked up several useful tricks from it—including how to load points into a geopandas GeoDataFrame (in epsg:4326 aka WGS 84) and how to then join that against the LA Times neighborhoods GeoJSON boundaries file. # 30th May 2019, 4:35 am

Building a stateless API proxy (via) This is a really clever idea. The GitHub API is infuriatingly coarsely grained with its permissions: you often end up having to create a token with way more permissions than you actually need for your project. Thea Flowers proposes running your own proxy in front of their API that adds more finely grained permissions, based on custom encrypted proxy API tokens that use JWT to encode the original API key along with the permissions you want to grant to that particular token (as a list of regular expressions matching paths on the underlying API). # 30th May 2019, 4:28 am

datasette-jq (via) I released another tiny Datasette plugin: datasette-jq registers a single custom SQL function, jq(), which lets you execute the jq expression language against a JSON column (or literal value) to filter and transform the JSON data. The README includes a link to a live demo—it’s a neat way to play with the jq micro-language. # 30th May 2019, 1:52 am

Falsehoods Programmers Believe About Search (via) These are great. “When you find the boolean operator ‘OR’, you always know it doesn’t mean Oregon”. # 29th May 2019, 8:09 pm

gls: Goroutine local storage (via) Go doesn’t provide a mechanism for having “goroutine local” variables (like threadlocals in Python but for goroutines), and the structure of the language makes it really hard to get something working. JT Olio figured out a truly legendary hack: Go’s introspection lets you see the current stack, so he figured out a way to encode a base-16 identifer tag into the call order of 16 special nested functions. I particularly like the “What are people saying?” section of the README: “Wow, that’s horrifying.”—“This is the most terrible thing I have seen in a very long time.”—“Where is it getting a context from? Is this serializing all the requests? What the heck is the client being bound to? What are these tags? Why does he need callers? Oh god no. No no no.” # 28th May 2019, 11:13 pm

Zdog (via) Well this is absolutely delightful: Zdog is a pseudo-3D engine for canvas and SVG that outputs 3D models rendered as super-stylish flat shapes. It’s hard to describe with words—go play with the demos! # 28th May 2019, 9:59 pm

Using dependabot to bump Django on my blog from 2.2 to 2.2.1 (via) GitHub recently acquired dependabot and made it free, and I decided to try it out on my blog. It’s a really neat piece of automation: it scans your requirements.txt (plus a number of other packaging definitions across several different languages), checks for updates to your dependencies and opens pull requests against any that it finds. Combine it with a CI service such as Circle CI and your tests will run automatically against the pull request, letting you know if it’s safe to merge. dependabot constantly rebases other changes against the pull request to try and ensure it will merge as cleanly as possible. # 27th May 2019, 1:24 am

sqlite-utils 1.0. I just released sqlite-utils 1.0, with a couple of handy new features over 0.14: it can now automatically add columns to a database table if you attempt to insert data which doesn’t quite fit (using alter=True in the Python API or the --alter option to the “sqlite-utils insert” command). It also has the ability to output nested JSON column values on the command-line using the new --json-cols option. This is the first project I’ve marked as a 1.0 release in a very long time—I’ll be sticking to semver for this project from now on, bumping the major version only in the case of a backwards incompatible change. # 25th May 2019, 1:20 am

WebAssembly at eBay: A Real-World Use Case (via) eBay used WebAssembly to run a C++ barcode reading library inside a web worker, passing images from the camera in order to provide a barcode scanning interface as part of their mobile web “add listing” page (a feature that had already proved successful in their native mobile apps). This is a great write-up, with lots of detail about how they compiled the library. They ended up running three barcode solutions in parallel web workers—two using WebAssembly, one in pure JavaScript—because their testing showed that racing between three implementations greatly increased the chance of a match due to how the different libraries handled poor quality or out-of-focus images. # 22nd May 2019, 8:30 pm

Terrarium by Fastly Labs. Fastly have been investing heavily in WebAssembly, which makes sense as it provides an excellent option for a sandboxed environment for executing server-side code at the edge of their CDN offering. Terrarium is their “playground for experimenting with edge-side WebAssembly”—it lets you write a program in Rust, C, TypeScript or Wat (WebAssembly text format), compile it to WebAssembly and deploy it to a URL with a single button-click. It’s just a demo for the moment so deployments only persist for 15 minutes, but it’s a fascinating sandbox to play around with. # 21st May 2019, 8:51 pm

Monaco Editor. VS Code is MIT licensed and built on top of Electron. I thought “huh, I wonder if I could run the editor component embedded in a web app”—and it turns out Microsoft have already extracted out the code editor component into an open source JavaScript package called Monaco. Looks very slick, though sadly it’s not supported in mobile browsers. # 21st May 2019, 8:47 pm

Public Data Release of Stack Overflow’s 2019 Developer Survey. Here’s the Stack Overflow announcement of their developer survey public data release, which discusses the Glitch partnership and mentions Datasette. # 21st May 2019, 6:51 pm

Discover Insights in Developer Survey Results. Stack Overflow partnered with Glitch and used Datasette to host the full data set from Stack Overflow’s 2019 Developer Survey! # 21st May 2019, 6:50 pm

Datasette 0.28—and why master should always be releasable

It’s been quite a while since the last substantial release of Datasette. Datasette 0.27 came out all the way back in January.

[... 1326 words]

django-lifecycle (via) Interesting alternative to Django signals by Robert Singer. It provides a model mixin class which over-rides the Django ORM’s save() method, tracking which model attributes have been changed. Then it lets you add methods to your model with a @hook annotation allowing you to specify things like “run this method before saving if the status changed” or “run this after an object has been deleted”. # 15th May 2019, 11:34 pm

Why I (Still) Love Tech: In Defense of a Difficult Industry (via) If you only read one longform piece this week, make it this one. Utterly delightful prose and a bunch of different messages that resonated with me deeply. # 15th May 2019, 3:45 pm

Imagine if you were really into the group Swervedriver in the mid-’90s but by 2019 someone was on CNBC telling you that Swervedriver represented, I don’t know, 10 percent of global economic growth, outpacing returns in oil and lumber. That’s the tech industry.

Paul Ford # 15th May 2019, 3:44 pm

quicktype code generator for Python. Really interesting tool: give it an example JSON document and it will code-generate the equivalent set of Python classes (with type annotations) instantly in your browser. It also accepts input in JSON Schema or TypeScript and can generate code in 18 different languages. # 14th May 2019, 11:35 pm

Amazon’s Away Teams laid bare: How AWS’s hivemind of engineers develop and maintain their internal tech (via) Some interesting insights into how Amazon structure their engineering organization to maximize team productivity in a service-oriented environment. Two things that stood out to me: each service is owned by a “home team”, but sometimes features that are needed by other teams can be built by forming an “away team” to build out that functionality. Secondly, Amazon has a concept of “bar raisers” who are engineers across the organization who help approve key design and architectural decisions. It’s possible to go against the recommendation of a bar raiser but “such a move is noted and made visible to higher levels of management”. # 14th May 2019, 6:32 pm

... the overall conclusion I reach is that we have so much to gain from making Django async-capable that it is worth the large amount of work it will take. I also believe, crucially, that we can undertake this change in an iterative, community-driven way that does not rely solely on one or two long-time contributors burning themselves out.

Andrew Godwin # 10th May 2019, 2 am

We don’t like limits on discrimination and lending, so we’re gonna use machine learning, which is a form of money laundering for bias, a way to blame mathematical algorithms for desires to simply avoid rules that everybody else has to play by in this industry.

Maciej Ceglowski # 8th May 2019, 11:11 pm

asgi-cors (via) I’ve been trying out the new ASGI 3.0 spec and I just released my first piece of ASGI middleware: asgi-cors, which lets you wrap an ASGI application with Access-Control-Allow-Origin CORS headers (either “*” or dynamic headers based on an origin whitelist). # 7th May 2019, 12:12 am

Want to see what one digital future for newspapers looks like? Look at The Guardian, which isn’t losing money anymore (via) After losing money every single year since 1998, the Guardian just managed to turn a profit! Detailed analysis of how they did it by Joshua Benton. # 2nd May 2019, 5:49 am

A Conspiracy To Kill IE6 (via) Cracking story by Chris Zacharias about how a team of engineers at YouTube back in 2009 took advantage of some exploits in YouTube’s organization structure (left over from their acquisition by Google) to ship a vague IE6 deprecation warning banner on one of the world’s highest traffic websites, inspiring many other similar banners and resulting in a 10% drop in global IE6 traffic. # 1st May 2019, 8:26 pm

JSK Journalism Fellowships names Class of 2019-2020 (and I’m in it!) (via) In personal news... I’ve been accepted for a ten month journalism fellowship at Stanford (starting September)! My work there will involve “Improving the impact of investigative stories by expanding the open-source ecosystem of tools that allows journalists to share the underlying data”. # 1st May 2019, 4:43 pm

SpatiaLite — Datasette documentation. Datasette’s documentation now includes extensive coverage of the SpatiaLite extension for SQLite: how to install it, how to import latitude/longitude points, shapefiles and GeoJSON data into SpatiaLite tables, and how to run SQL queries against it that take advantage of spatial indexes. I’m learning SpatiaLite at the moment and filling out the documentation with each new trick I learn as I go—as Mark Pilgrim once taught me, the best way to learn a new technology is to write about it. # 30th May 2018, 4:34 am

Library of Congress Sustainability of Digital Formats: SQLite. “The Library of Congress Recommended Formats Statement (RFS) includes SQLite as a preferred format for datasets.” # 28th May 2018, 5:19 pm

In one case this winter, miners from China landed their private jet at the local airport, drove a rental car to the visitor center at the Rocky Reach Dam, just north of Wenatchee, and, according to Chelan County PUD officials, politely asked to see the “dam master because we want to buy some electricity.”

Paul Roberts, Seattle Times # 27th May 2018, 4:16 pm

A traditional centralized database only needs to be written to once. A blockchain needs to be written to thousands of times. A traditional centralized database needs to only checks the data once. A blockchain needs to check the data thousands of times. A traditional centralized database needs to transmit the data for storage only once. A blockchain needs to transmit the data thousands of times. The costs of maintaining a blockchain are orders of magnitude higher and the cost needs to be justified by utility. Most applications looking for some of the properties stated earlier like consistency and reliability can get such things for a whole lot cheaper utilizing integrity checks, receipts and backups.

Jimmy Song # 24th May 2018, 2:44 pm