Simon Willison’s Weblog

May 2019

May 1, 2019

JSK Journalism Fellowships names Class of 2019-2020 (and I’m in it!) (via) In personal news... I’ve been accepted for a ten-month journalism fellowship at Stanford (starting in September)! My work there will involve “Improving the impact of investigative stories by expanding the open-source ecosystem of tools that allows journalists to share the underlying data”.

# 4:43 pm / stanford, personal, journalism, datasette, jsk

A Conspiracy To Kill IE6 (via) Cracking story by Chris Zacharias about how a team of engineers at YouTube back in 2009 took advantage of some exploits in YouTube’s organization structure (left over from their acquisition by Google) to ship a vague IE6 deprecation warning banner on one of the world’s highest traffic websites, inspiring many other similar banners and resulting in a 10% drop in global IE6 traffic.

# 8:26 pm / youtube, ie6

May 2, 2019

Want to see what one digital future for newspapers looks like? Look at The Guardian, which isn’t losing money anymore (via) After losing money every single year since 1998, the Guardian just managed to turn a profit! Detailed analysis of how they did it by Joshua Benton.

# 5:49 am / guardian, newspapers

May 7, 2019

asgi-cors (via) I’ve been trying out the new ASGI 3.0 spec and I just released my first piece of ASGI middleware: asgi-cors, which lets you wrap an ASGI application with Access-Control-Allow-Origin CORS headers (either “*” or dynamic headers based on an origin whitelist).
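To make the idea of wrapping an ASGI application concrete, here is a minimal sketch of ASGI 3.0 middleware that adds a single CORS header. It is not the asgi-cors source or its API: `add_cors_header`, `hello_app` and `collect_response` are hypothetical names invented for this illustration, and the sketch only handles the fixed "*" case, not an origin whitelist.

```python
import asyncio


def add_cors_header(app, allow_origin="*"):
    """Wrap an ASGI 3.0 app so every HTTP response carries a CORS header."""

    async def wrapped(scope, receive, send):
        # Only HTTP responses have headers to modify; pass everything else through.
        if scope["type"] != "http":
            await app(scope, receive, send)
            return

        async def send_with_cors(message):
            if message["type"] == "http.response.start":
                # Drop any existing header of the same name, then add ours.
                headers = [
                    (name, value)
                    for name, value in message.get("headers", [])
                    if name.lower() != b"access-control-allow-origin"
                ]
                headers.append(
                    (b"access-control-allow-origin", allow_origin.encode("latin-1"))
                )
                message = {**message, "headers": headers}
            await send(message)

        await app(scope, receive, send_with_cors)

    return wrapped


async def hello_app(scope, receive, send):
    """Hypothetical ASGI app used only to exercise the middleware."""
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello"})


def collect_response(app):
    """Drive the app once with a fake HTTP scope and return the messages it sent."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    asyncio.run(app(scope, receive, send))
    return sent


messages = collect_response(add_cors_header(hello_app))
```

The wrapper intercepts only the `http.response.start` message, because that is where ASGI carries the status code and headers; body messages pass through unchanged.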

# 12:12 am / projects, asgi, security, cors

May 8, 2019

We don't like limits on discrimination and lending, so we're gonna use machine learning, which is a form of money laundering for bias, a way to blame mathematical algorithms for desires to simply avoid rules that everybody else has to play by in this industry.

Maciej Cegłowski

# 11:11 pm / machine-learning, maciejceglowski

May 10, 2019

... the overall conclusion I reach is that we have so much to gain from making Django async-capable that it is worth the large amount of work it will take. I also believe, crucially, that we can undertake this change in an iterative, community-driven way that does not rely solely on one or two long-time contributors burning themselves out.

Andrew Godwin

# 2 am / async, andrew-godwin, django

May 14, 2019

Amazon’s Away Teams laid bare: How AWS’s hivemind of engineers develop and maintain their internal tech (via) Some interesting insights into how Amazon structure their engineering organization to maximize team productivity in a service-oriented environment. Two things that stood out to me: each service is owned by a “home team”, but sometimes features that are needed by other teams can be built by forming an “away team” to build out that functionality. Secondly, Amazon has a concept of “bar raisers” who are engineers across the organization who help approve key design and architectural decisions. It’s possible to go against the recommendation of a bar raiser but “such a move is noted and made visible to higher levels of management”.

# 6:32 pm / amazon, serviceorientedarchitecture, management

quicktype code generator for Python. Really interesting tool: give it an example JSON document and it will code-generate the equivalent set of Python classes (with type annotations) instantly in your browser. It also accepts input in JSON Schema or TypeScript and can generate code in 18 different languages.

# 11:35 pm / typescript, python, json

May 15, 2019

Imagine if you were really into the group Swervedriver in the mid-’90s but by 2019 someone was on CNBC telling you that Swervedriver represented, I don’t know, 10 percent of global economic growth, outpacing returns in oil and lumber. That’s the tech industry.

Paul Ford

# 3:44 pm / paul-ford

Why I (Still) Love Tech: In Defense of a Difficult Industry (via) If you only read one longform piece this week, make it this one. Utterly delightful prose and a bunch of different messages that resonated with me deeply.

# 3:45 pm / paul-ford

django-lifecycle (via) Interesting alternative to Django signals by Robert Singer. It provides a model mixin class which overrides the Django ORM’s save() method, tracking which model attributes have been changed. Then it lets you add methods to your model with a @hook decorator allowing you to specify things like “run this method before saving if the status changed” or “run this after an object has been deleted”.
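The general pattern (track the last saved state, then run decorated methods when a watched attribute differs) can be sketched in plain Python without Django. This is an illustration of the idea only, not django-lifecycle's implementation or its exact API: the `hook` decorator, `LifecycleSketch` and `Article` here are hypothetical, and `save()` merely simulates a database write.

```python
def hook(moment, when=None):
    """Mark a method to run at `moment` ('before_save' or 'after_save'),
    optionally only if the attribute named by `when` has changed."""
    def decorator(method):
        method._hook = (moment, when)
        return method
    return decorator


class LifecycleSketch:
    """Hypothetical stand-in for a model mixin that tracks changed attributes."""

    def __init__(self, **fields):
        self.__dict__.update(fields)
        self._saved_state = dict(fields)

    def _public_fields(self):
        # Treat every attribute without a leading underscore as a model field.
        return {k: v for k, v in self.__dict__.items() if not k.startswith("_")}

    def _run_hooks(self, moment, changed):
        for name in dir(type(self)):
            method = getattr(type(self), name)
            spec = getattr(method, "_hook", None)
            if spec and spec[0] == moment and (spec[1] is None or spec[1] in changed):
                method(self)

    def save(self):
        fields = self._public_fields()
        changed = {k for k, v in fields.items() if self._saved_state.get(k) != v}
        self._run_hooks("before_save", changed)
        # A real model would write to the database here.
        self._saved_state = self._public_fields()
        self._run_hooks("after_save", changed)


class Article(LifecycleSketch):
    def __init__(self, **fields):
        super().__init__(**fields)
        self._events = []

    @hook("before_save", when="status")
    def record_status_change(self):
        self._events.append(f"status changed to {self.status}")


article = Article(title="Hi", status="draft")
article.save()                  # nothing changed, so the hook does not run
article.status = "published"
article.save()                  # status changed, so the hook runs once
```

Compared with signals, the condition lives next to the method it triggers, which is the readability benefit the entry describes.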

# 11:34 pm / django

May 19, 2019

Datasette 0.28—and why master should always be releasable

It’s been quite a while since the last substantial release of Datasette. Datasette 0.27 came out all the way back in January.

[... 1,326 words]

May 21, 2019

Discover Insights in Developer Survey Results. Stack Overflow partnered with Glitch and used Datasette to host the full data set from Stack Overflow’s 2019 Developer Survey!

# 6:50 pm / glitch, stackoverflow, datasette, surveys

Public Data Release of Stack Overflow’s 2019 Developer Survey. Here’s the Stack Overflow announcement of their developer survey public data release, which discusses the Glitch partnership and mentions Datasette.

# 6:51 pm / glitch, stackoverflow, datasette, surveys

Monaco Editor. VS Code is MIT licensed and built on top of Electron. I thought “huh, I wonder if I could run the editor component embedded in a web app”—and it turns out Microsoft have already extracted out the code editor component into an open source JavaScript package called Monaco. Looks very slick, though sadly it’s not supported in mobile browsers.

# 8:47 pm / editor, open-source, microsoft, javascript, electron, vs-code

Terrarium by Fastly Labs. Fastly have been investing heavily in WebAssembly, which makes sense as it provides an excellent option for a sandboxed environment for executing server-side code at the edge of their CDN offering. Terrarium is their “playground for experimenting with edge-side WebAssembly”—it lets you write a program in Rust, C, TypeScript or Wat (WebAssembly text format), compile it to WebAssembly and deploy it to a URL with a single button-click. It’s just a demo for the moment so deployments only persist for 15 minutes, but it’s a fascinating sandbox to play around with.

# 8:51 pm / rust, webassembly, fastly

May 22, 2019

WebAssembly at eBay: A Real-World Use Case (via) eBay used WebAssembly to run a C++ barcode reading library inside a web worker, passing images from the camera in order to provide a barcode scanning interface as part of their mobile web “add listing” page (a feature that had already proved successful in their native mobile apps). This is a great write-up, with lots of detail about how they compiled the library. They ended up running three barcode solutions in parallel web workers—two using WebAssembly, one in pure JavaScript—because their testing showed that racing between three implementations greatly increased the chance of a match due to how the different libraries handled poor quality or out-of-focus images.
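The racing strategy can be sketched in Python, with threads standing in for web workers. This is only an illustration of the pattern, not eBay's code: `race` and the three toy decoders are hypothetical, and real decoders would operate on image data rather than a dict.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def race(decoders, image):
    """Run every decoder on the image concurrently; return the first non-None result."""
    with ThreadPoolExecutor(max_workers=len(decoders)) as pool:
        futures = [pool.submit(decoder, image) for decoder in decoders]
        for future in as_completed(futures):
            result = future.result()
            if result is not None:
                # Leaving the with-block still waits for the slower decoders to finish.
                return result
    return None


def strict_decoder(image):
    """Toy decoder that only succeeds on sharp images."""
    return image["code"] if image["sharp"] else None


def tolerant_decoder(image):
    """Toy decoder that copes with blurry images."""
    return image["code"]


def never_matches(image):
    """Toy decoder that always fails."""
    return None


blurry = {"code": "0123", "sharp": False}
winner = race([strict_decoder, never_matches, tolerant_decoder], blurry)
```

The benefit comes from the decoders failing on different inputs: a blurry frame defeats the strict decoder, but the tolerant one still produces a match.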

# 8:30 pm / webassembly, webworkers

May 25, 2019

sqlite-utils 1.0. I just released sqlite-utils 1.0, with a couple of handy new features over 0.14: it can now automatically add columns to a database table if you attempt to insert data which doesn’t quite fit (using alter=True in the Python API or the --alter option to the “sqlite-utils insert” command). It also has the ability to output nested JSON column values on the command-line using the new --json-cols option. This is the first project I’ve marked as a 1.0 release in a very long time—I’ll be sticking to semver for this project from now on, bumping the major version only in the case of a backwards incompatible change.
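The behaviour behind alter=True can be sketched using only the standard library's sqlite3 module. This is an assumption-laden illustration of the idea, not how sqlite-utils implements it: `insert_with_alter` is a hypothetical helper, and unlike a real tool it adds new columns without declaring a type.

```python
import sqlite3


def insert_with_alter(conn, table, record):
    """Insert a dict into `table`, first adding any columns the table lacks."""
    # Column names are in the second field of each PRAGMA table_info row.
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    for column in record:
        if column not in existing:
            # No type is declared here; this sketch skips type inference.
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}"')
    columns = ", ".join(f'"{name}"' for name in record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
        list(record.values()),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT)")
insert_with_alter(conn, "dogs", {"id": 1, "name": "Cleo"})
insert_with_alter(conn, "dogs", {"id": 2, "name": "Pancakes", "age": 4})

column_names = [row[1] for row in conn.execute('PRAGMA table_info("dogs")')]
rows = conn.execute("SELECT id, name, age FROM dogs ORDER BY id").fetchall()
```

Rows inserted before the column existed read back as NULL for it, which is why the first record has no age.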

# 1:20 am / projects, versioning, sqlite, sqlite-utils, semver

May 27, 2019

Using dependabot to bump Django on my blog from 2.2 to 2.2.1 (via) GitHub recently acquired dependabot and made it free, and I decided to try it out on my blog. It’s a really neat piece of automation: it scans your requirements.txt (plus a number of other packaging definitions across several different languages), checks for updates to your dependencies and opens pull requests for any that it finds. Combine it with a CI service such as Circle CI and your tests will run automatically against the pull request, letting you know if it’s safe to merge. dependabot also keeps rebasing its pull request as other changes land, to try and ensure it will merge as cleanly as possible.

# 1:24 am / django, github

May 28, 2019

Zdog (via) Well this is absolutely delightful: Zdog is a pseudo-3D engine for canvas and SVG that outputs 3D models rendered as super-stylish flat shapes. It’s hard to describe with words—go play with the demos!

# 9:59 pm / 3d, canvas

gls: Goroutine local storage (via) Go doesn’t provide a mechanism for having “goroutine local” variables (like threadlocals in Python but for goroutines), and the structure of the language makes it really hard to get something working. JT Olio figured out a truly legendary hack: Go’s introspection lets you see the current stack, so he figured out a way to encode a base-16 identifier tag into the call order of 16 special nested functions. I particularly like the “What are people saying?” section of the README: “Wow, that’s horrifying.” “This is the most terrible thing I have seen in a very long time.” “Where is it getting a context from? Is this serializing all the requests? What the heck is the client being bound to? What are these tags? Why does he need callers? Oh god no. No no no.”
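The stack-encoding trick can be imitated in Python, purely as an illustration. This is a loose analogue and not a port of gls: every name below is hypothetical, and Python code that actually needs this should use threading.local or contextvars. Sixteen marker functions, one per hex digit, are called in the order of the tag's digits, and the tag is recovered by reading function names back off the stack.

```python
import sys

HEX_DIGITS = "0123456789abcdef"
_MARKERS = {}  # hex digit -> marker function for that digit


def _descend(digits, fn):
    """Call the marker for the next digit, or fn() once every digit is on the stack."""
    if not digits:
        return fn()
    return _MARKERS[digits[0]](digits[1:], fn)


# Build 16 distinct functions, _mark_0 through _mark_f, so that each hex digit
# appears under its own function name in the call stack.
for _digit in HEX_DIGITS:
    _namespace = {"_descend": _descend}
    exec(
        f"def _mark_{_digit}(digits, fn):\n"
        f"    return _descend(digits, fn)\n",
        _namespace,
    )
    _MARKERS[_digit] = _namespace[f"_mark_{_digit}"]


def run_with_tag(tag, fn):
    """Run fn() with the non-negative integer `tag` encoded in the call stack."""
    return _descend(format(tag, "x"), fn)


def current_tag():
    """Recover the innermost tag by walking the stack and reading marker names."""
    digits = []
    frame = sys._getframe(1)
    while frame is not None:
        name = frame.f_code.co_name
        if name.startswith("_mark_"):
            digits.append(name[len("_mark_"):])
        elif name == "run_with_tag":
            break  # stop at the innermost tagged call
        frame = frame.f_back
    if not digits:
        return None
    # Frames are visited innermost first, so the digits arrive in reverse order.
    return int("".join(reversed(digits)), 16)


def handler():
    """Hypothetical function that looks up its tag without being passed it."""
    return current_tag()


tag_seen = run_with_tag(42, handler)
```

The tag survives any number of intermediate calls because it lives in the stack itself rather than in an argument.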

# 11:13 pm / go, hacks

May 29, 2019

Falsehoods Programmers Believe About Search (via) These are great. “When you find the boolean operator ‘OR’, you always know it doesn’t mean Oregon”.

# 8:09 pm / search

May 30, 2019

datasette-jq (via) I released another tiny Datasette plugin: datasette-jq registers a single custom SQL function, jq(), which lets you execute the jq expression language against a JSON column (or literal value) to filter and transform the JSON data. The README includes a link to a live demo—it’s a neat way to play with the jq micro-language.
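The underlying mechanism, registering a Python function so SQL can call it, can be sketched with the standard library alone. This is not the plugin's code: `json_pick` is a hypothetical toy that follows a dotted path and stands in for a real jq engine, and it is registered on a plain sqlite3 connection rather than through Datasette's plugin hooks.

```python
import json
import sqlite3


def json_pick(document, dotted_path):
    """Toy stand-in for jq: follow a dotted path such as 'owner.name' into JSON."""
    value = json.loads(document)
    for key in dotted_path.split("."):
        # Numeric path segments index into lists; everything else is a dict key.
        value = value[int(key)] if isinstance(value, list) else value[key]
    return json.dumps(value)


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the name json_pick, taking 2 arguments.
conn.create_function("json_pick", 2, json_pick)

conn.execute("CREATE TABLE repos (info TEXT)")
conn.execute(
    "INSERT INTO repos VALUES (?)",
    [json.dumps({"owner": {"name": "simonw"}, "tags": ["sqlite", "json"]})],
)

owner_rows = conn.execute("SELECT json_pick(info, 'owner.name') FROM repos").fetchall()
tag_rows = conn.execute("SELECT json_pick(info, 'tags.1') FROM repos").fetchall()
```

Results come back as JSON text, so the string value is returned with its quotes intact.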

# 1:52 am / projects, datasette, jq

Building a stateless API proxy (via) This is a really clever idea. The GitHub API is infuriatingly coarsely grained with its permissions: you often end up having to create a token with way more permissions than you actually need for your project. Thea Flowers proposes running your own proxy in front of their API that adds more finely grained permissions, based on custom encrypted proxy API tokens that use JWT to encode the original API key along with the permissions you want to grant to that particular token (as a list of regular expressions matching paths on the underlying API).
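The permission check at the heart of this design is small enough to sketch. The snippet assumes the proxy token has already been decrypted and verified, and it skips the JWT handling entirely: the `claims` dict, its `allowed_paths` key and the `is_allowed` helper are hypothetical names, not taken from the original article.

```python
import re


def is_allowed(claims, path):
    """Return True if `path` fully matches any regex granted to this token."""
    return any(re.fullmatch(pattern, path) for pattern in claims["allowed_paths"])


# Hypothetical decoded token payload: read access to one repository's issues.
claims = {
    "allowed_paths": [
        r"/repos/example-org/example-repo/issues(/\d+)?",
    ],
}

can_read_issue = is_allowed(claims, "/repos/example-org/example-repo/issues/12")
can_list_collaborators = is_allowed(claims, "/repos/example-org/example-repo/collaborators")
```

Using `re.fullmatch` rather than `re.search` matters here: a pattern must cover the whole path, so a granted prefix cannot be smuggled inside a longer, unrelated URL.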

# 4:28 am / encryption, proxy, security, apis, github, jwt

Los Angeles Weedmaps analysis (via) Ben Welsh at the LA Times published this Jupyter notebook showing the full working behind a story they published about LA’s black market weed dispensaries. I picked up several useful tricks from it—including how to load points into a geopandas GeoDataFrame (in epsg:4326 aka WGS 84) and how to then join that against the LA Times neighborhoods GeoJSON boundaries file.

# 4:35 am / jupyter, data-journalism, latimes, pandas, gis, ben-welsh

Practical campaign security is a wood chipper for your hopes and dreams. It sits at the intersection of 19 kinds of status quo, each more odious than the last. You have to accept the fact that computers are broken, software is terrible, campaign finance is evil, the political parties are inept, the DCCC exists, politics is full of parasites, tech companies are run by arrogant man-children, and so on.

Maciej Cegłowski

# 12:03 pm / maciejceglowski, politics, security
