Simon Willison’s Weblog

Subscribe

Items in Mar, 2021

Filters: Year: 2021 × Month: Mar × Sorted by date


This teaches us that—when it’s a big enough deal—Amazon will lie to us. And coming from the company that runs the production infrastructure for our companies, stores our data, and has been granted an outsized position of trust based upon having earned it over 15 years, this is a nightmare.

Corey Quinn # 31st March 2021, 4:47 pm

ifconfig.co (via) I really like this: “curl ifconfig.co” gives you your IP address as plain text, “curl ifconfig.co/city” tells you your city according to MaxMind GeoLite2, “curl ifconfig.co/json” gives you all sorts of useful extra data. Suggested rate limit is one per minute, but the code is open source Go that you can run yourself. # 30th March 2021, 7:53 pm

Hello, HPy (via) HPy provides a new way to write C extensions for Python in a way that is compatible with multiple Python implementations at once, including PyPy. # 29th March 2021, 2:40 pm

Weeknotes: SpatiaLite 5, Datasette on Azure, more CDC vaccination history

This week I got SpatiaLite 5 working in the Datasette Docker image, improved the CDC vaccination history git scraper, figured out Datasette on Azure and we closed on a new home!

[... 986 words]

sqlite-plus (via) Anton Zhiyanov bundled together a bunch of useful SQLite C extensions for things like statistical functions, unicode string normalization and handling CSV files as virtual tables. The GitHub Actions workflow here is a particularly useful example of compiling SQLite extensions for three different platforms. # 25th March 2021, 9:13 pm

Why all my servers have an 8GB empty file (via) This is such a good idea: keep a 8GB empty spacer.img file on any servers you operate, purely so that if the server runs out of space you can delete the file and get some breathing room for getting everything else working again. I’ve had servers run out of space in the past and it’s an absolute pain to sort out—this trick would have really helped. # 25th March 2021, 8:35 pm

Homebrew Python Is Not For You. If you’ve been running into frustrations with your Homebrew Python environments breaking over the past few months (the dreaded “Reason: image not found” error) Justin Mayer has a good explanation. Python in a Homebrew is designed to work as a dependency for their other packages, and recent policy changes that they made to support smoother upgrades have had catastrophic problems effects on those of us who try to use it for development environments. # 25th March 2021, 3:14 pm

Understanding JSON Schema (via) Useful, comprehensive short book guide to JSON Schema, which finally helped me feel like I fully understand the specification. # 24th March 2021, 2:57 am

The Airtable formulas at the heart of everything

While working on building a Counties.json API endpoint for VIAL I realized I wasn’t entirely sure how the “Total reports” and “Yeses” numbers in this piece of JSON were calculated:

[... 323 words]

A Complete Guide To Accessible Front-End Components. I’m so excited about this article: it brings together an absolute wealth of resources on accessible front-end components, including many existing component implementations that are accessible out of the box. Date pickers, autocomplete widgets, modals, menus—all sorts of things that I’ve been dragging my heels on implementing because I didn’t fully understand their accessibility implications. # 23rd March 2021, 1:06 am

The Accountability Project Datasettes. The Accountability Project “curates, standardizes and indexes public data to give journalists, researchers and others a simple way to search across otherwise siloed records”—they have a wide range of useful data, and they’ve started experimenting with Datasette to provide SQL access to a subset of the information that they have collected. # 22nd March 2021, 12:07 am

Weeknotes: django-sql-dashboard widgets

A few small releases this week, for django-sql-dashboard, datasette-auth-passwords and datasette-publish-vercel.

[... 1025 words]

GitHub, by default, writes five replicas of each repository across our three data centers to protect against failures at the server, rack, network, and data center levels. When we need to update Git references, we briefly take a lock across all of the replicas in all of our data centers, and release the lock when our three-phase-commit (3PC) protocol reports success.

Scott Arbeit # 21st March 2021, 12:57 am

How we found and fixed a rare race condition in our session handling. GitHub had a terrifying bug this month where a user reported suddenly being signed in as another user. This is a particularly great example of a security incident report, explaining how GitHub identified the underlying bug, what caused it and the steps they are taking to ensure bugs like that never happen in the future. The root cause was a convoluted sequence of events which could cause a Ruby Hash to be accidentally shared between two requests, caused as a result of a new background thread that was introduced as a performance optimization. # 18th March 2021, 11:06 pm

When you have to mock a collaborator, avoid using the Mock object directly. Either use mock.create_autospec() or mock.patch(autospec=True) if at all possible. Autospeccing from the real collaborator means that if the collaborator’s interface changes, your tests will fail. Manually speccing or not speccing at all means that changes in the collaborator’s interface will not break your tests that use the collaborator: you could have 100% test coverage and your library would fall over when used!

Thea Flowers # 17th March 2021, 4:44 pm

logpaste (via) Useful example of how to use the Litestream SQLite replication tool in a Dockerized application: S3 credentials are passed to the container on startup, it then attempts to restore the SQLite database from S3 and starts a Litestream process in the same container to periodically synchronize changes back up to the S3 bucket. # 17th March 2021, 3:48 pm

VIAL is now live, plus django-sql-dashboard

Our new Django backend has now officially graduated from preview mode! We’ve been running it to collect caller reports for Oregon for over a week now, and today we finally turned off the old Heroku app and promoted https://vial.calltheshots.us/ to be the place that our caller app writes to.

[... 672 words]

sqlite-uuid (via) Another Python package that wraps a SQLite module written in C: this one provides access to UUID functions as SQLite functions. # 15th March 2021, 2:55 am

sqlite-spellfix (via) I really like this pattern: “pip install sqlite-spellfix” gets you a Python module which includes a compiled (on your system when pip install ran) copy of the SQLite spellfix1 module, plus a utility variable containing its path so you can easily load it into a SQLite connection. # 15th March 2021, 2:52 am

Weeknotes: tableau-to-sqlite, django-sql-dashboard

This week I started a limited production run of my new backend for Vaccinate CA calling, built a tableau-to-sqlite import tool and started working on a subset of Datasette for PostgreSQL and Django called django-sql-dashboard.

[... 792 words]

APIs for importing locations

An important aspect of the new backend is the ability to import new locations.

[... 254 words]

New call queue ready to test. Also geography.

I just shipped the first working preview version of the new /api/requestCall API [#54]—the API that our caller app uses to get the next location that the caller should be contacting (and lock it so that other users don’t call it at the same time).

[... 697 words]

Weeknotes: Datasette and Git scraping at NICAR, VaccinateCA

This week I virtually attended the NICAR data journalism conference and made a ton of progress on the Django backend for VaccinateCA (see last week).

[... 773 words]

The simplest possible call queue

Today I’ve been working on the queue calling mechanism for the new Django backend.

[... 471 words]

The SOC2 Starting Seven (via) "So, you plan to sell your startup’s product to big companies one day. Congratu-dolences! [...] Here’s how we’ll try to help: with Seven Things you can do now that will simplify SOC2 for you down the road while making your life, or at least your security posture, materially better in the immediacy. # 5th March 2021, 7:50 pm

Git scraping, the five minute lightning talk

I prepared a lightning talk about Git scraping for the NICAR 2021 data journalism conference. In the talk I explain the idea of running scheduled scrapers in GitHub Actions, show some examples and then live code a new scraper for the CDC’s vaccination data using the GitHub web interface. Here’s the video.

[... 289 words]

google-cloud-4-words. This is really useful: every Google Cloud service (all 250 of them) with a four word description explaining what it does. I’d love to see the same thing for AWS. UPDATE: Turns out I had—I can’t link to other posts from blogmark descriptions yet, so search “aws explained” on this site to find it. # 4th March 2021, 12:40 am

Replaying logs to exercise the new API

22 days ago n1mmy pushed a change to help.vaccinate which logged full details of inoming Netlify function API traffic to an Airtable database.

[... 542 words]

API ready for testing, first video status update

To celebrate GitHub’s new support for drag-and-dropping mp4 files into Markdown, I decided to switch things around a little and include a video update with today’s blog entry.

[... 656 words]

How I cut GTA Online loading times by 70% (via) Incredible debugging war story: t0st was fed up of waiting six minutes (!) for GTA Online to load on their PC, so they used a host of devious debugging tricks to try and figure out what was going on. It turned out the game was loading a 10MB JSON file detailing all of the available in-game purchases, but inefficient JSON parsing meant it was pegging an entire CPU for 4 minutes mainly running the strlen() C function. Despite not having access to source code or debugging symbols t0st figured out the problem and managed to inject a custom DLL that hooked some internal functions and dropped load times down from 6m down to to 1m50s! # 1st March 2021, 7:12 pm