Simon Willison’s Weblog

Subscribe

Items in Apr, 2021

Filters: Year: 2021 × Month: Apr × Sorted by date


A CSV export, JSON import workflow for bulk updating our data

I just added missing counties to around 1200 of our locations using a combination of tricks, and I thought they’d make a good blog post.

[... 1429 words]

Over the past several months, everyone in the industry who provides any kind of free CPU resources has been dealing with a massive outbreak of abuse for cryptocurrency mining. The industry has been setting up informal working groups to pool knowledge of mitigations, communicate when our platforms are being leveraged against one another, and cumulatively wasting thousands of hours of engineering time implementing measures to deal with this abuse, and responding as attackers find new ways to circumvent them.

Drew DeVault, SourceHut # 26th April 2021, 11:52 pm

Weeknotes: Vaccinate The States, and how I learned that returning dozens of MB of JSON works just fine these days

On Friday VaccinateCA grew in scope, a lot: we launched a new website called Vaccinate The States. Patrick McKenzie wrote more about the project here—the short version is that we’re building the most comprehensive possible dataset of vaccine availability in the USA, using a combination of data collation, online research and continuing to make a huge number of phone calls.

[... 1109 words]

Weeknotes: The Aftermath

Some tweets that effectively illustrate my week:

[... 208 words]

country-coder (via) Given a latitude and longitude, how can you tell what country that point sits within? One way is to do a point-in-polygon lookup against a set of country polygons, but this can be tricky: some countries such as New Zealand have extremely complex outlines, even though for this use-case you don’t need the exact shape of the coastline. country-coder solves this with a custom designed 595KB GeoJSON file with detailed land borders but loosely defined ocean borders. It also comes with a wrapper JavaScript library that provides an API for resolving points, plus useful properties on each country with details like telepohen calling codes and emoji flags. # 18th April 2021, 7:37 pm

Why you shouldn’t use ENV variables for secret data (via) I do this all the time, but this article provides a good set of reasons that secrets in environment variables are a bad pattern—even when you know there’s no multi-user access to the host you are deploying to. The biggest problem is that they often get captured by error handling scripts, which may not have the right code in place to redact them. This article suggests using Docker secrets instead, but I’d love to see a comprehensive write-up of other recommended patterns for this that go beyond applications running in Docker. # 14th April 2021, 6:22 pm

Porting VaccinateCA to Django

As I mentioned back in February, I’ve been working with the VaccinateCA project to try to bring the pandemic to an end a little earlier by helping gather as accurate a model as possible of where the Covid vaccine is available in California and how people can get it.

[... 2157 words]

In general, relying only on natural keys is a nightmare. Double nightmare if it’s PII. Natural keys only work if you are flawlessly omniscient about the domain. And you aren’t.

Jacques Chester # 11th April 2021, 10:48 pm

Behind GitHub’s new authentication token formats (via) This is a really smart design. GitHub’s new tokens use a type prefix of “ghp_” or “gho_” or a few others depending on the type of token, to help support mechanisms that scan for accidental token publication. A further twist is that the last six characters of the tokens are a checksum, which means token scanners can reliably distinguish a real token from a coincidental string without needing to check back with the GitHub database. “One other neat thing about _ is it will reliably select the whole token when you double click on it”—what a useful detail! # 5th April 2021, 9:28 pm

Render single selected county on a map (via) Another experiment at the intersection of Datasette and Observable notebooks. This one imports a full Datasette table (3,200 US counties) using streaming CSV and loads that into Observable’s new Search and Table filter widgets. Once you select a single county a second Datasette SQL query (this time retuning JSON) fetches a GeoJSON representation of that county which is then rendered as SVG using D3. # 5th April 2021, 4:48 am

Spatialite Speed Test. Part of an excellent series of posts about SpatiaLite from 2012—here John C. Zastrow reports on running polygon intersection queries against a 1.9GB database file in 40 seconds without an index and 0.186 seconds using the SpatialIndex virtual table mechanism. # 4th April 2021, 4:28 pm

Animated choropleth of vaccinations by US county

Last week I mentioned that I’ve recently started scraping and storing the CDC’s per-county vaccination numbers in my cdc-vaccination-history GitHub repository. This week I used an Observable notebook and d3’s TopoJSON support to render those numbers on an animated choropleth map.

[... 1138 words]

If you measure things by foot traffic we [the SFO Museum] are one of the busiest museums in the world. If that is the case we are also one of the busiest museums in the world that no one knows about. Nothing in modern life really prepares you for the idea that a museum should be part of an airport. San Francisco, as I’ve mentioned, is funny that way.

Aaron Straup Cope # 1st April 2021, 10:40 pm

VIAL: Preparing for some collaborative testing

With the Airtable limits fast approaching, I’m going to start leaning heavily on people to help verify that VIAL can do the jobs that it needs to do.

[... 465 words]