Simon Willison’s Weblog

Subscribe

March 2021

March 1, 2021

Drawing the rest of the owl

My ambitious goal for the day was to get the new Django/PostgreSQL preview into a state where we could start sending example API requests to it from the help.vaccinate app.

[... 371 words]

How I cut GTA Online loading times by 70% (via) Incredible debugging war story: t0st was fed up of waiting six minutes (!) for GTA Online to load on their PC, so they used a host of devious debugging tricks to try and figure out what was going on. It turned out the game was loading a 10MB JSON file detailing all of the available in-game purchases, but inefficient JSON parsing meant it was pegging an entire CPU for 4 minutes mainly running the strlen() C function. Despite not having access to source code or debugging symbols t0st figured out the problem and managed to inject a custom DLL that hooked some internal functions and dropped load times down from 6m down to to 1m50s!

# 7:12 pm / profiling, hacking, debugger

March 2, 2021

API ready for testing, first video status update

To celebrate GitHub’s new support for drag-and-dropping mp4 files into Markdown, I decided to switch things around a little and include a video update with today’s blog entry.

[... 656 words]

March 3, 2021

Replaying logs to exercise the new API

22 days ago n1mmy pushed a change to help.vaccinate which logged full details of inoming Netlify function API traffic to an Airtable database.

[... 542 words]

March 4, 2021

google-cloud-4-words. This is really useful: every Google Cloud service (all 250 of them) with a four word description explaining what it does. I’d love to see the same thing for AWS. UPDATE: Turns out I had—I can’t link to other posts from blogmark descriptions yet, so search “aws explained” on this site to find it.

# 12:40 am / google

March 5, 2021

Git scraping, the five minute lightning talk

Visit Git scraping, the five minute lightning talk

I prepared a lightning talk about Git scraping for the NICAR 2021 data journalism conference. In the talk I explain the idea of running scheduled scrapers in GitHub Actions, show some examples and then live code a new scraper for the CDC’s vaccination data using the GitHub web interface. Here’s the video.

[... 289 words]

The SOC2 Starting Seven (via) "So, you plan to sell your startup’s product to big companies one day. Congratu-dolences! [...] Here’s how we’ll try to help: with Seven Things you can do now that will simplify SOC2 for you down the road while making your life, or at least your security posture, materially better in the immediacy.

# 7:50 pm / security, startups

March 6, 2021

The simplest possible call queue

Today I’ve been working on the queue calling mechanism for the new Django backend.

[... 471 words]

March 7, 2021

Weeknotes: Datasette and Git scraping at NICAR, VaccinateCA

This week I virtually attended the NICAR data journalism conference and made a ton of progress on the Django backend for VaccinateCA (see last week).

[... 773 words]

New call queue ready to test. Also geography.

I just shipped the first working preview version of the new /api/requestCall API [#54]—the API that our caller app uses to get the next location that the caller should be contacting (and lock it so that other users don’t call it at the same time).

[... 697 words]

March 9, 2021

APIs for importing locations

An important aspect of the new backend is the ability to import new locations.

[... 254 words]

March 14, 2021

Weeknotes: tableau-to-sqlite, django-sql-dashboard

Visit Weeknotes: tableau-to-sqlite, django-sql-dashboard

This week I started a limited production run of my new backend for Vaccinate CA calling, built a tableau-to-sqlite import tool and started working on a subset of Datasette for PostgreSQL and Django called django-sql-dashboard.

[... 792 words]

March 15, 2021

sqlite-spellfix (via) I really like this pattern: “pip install sqlite-spellfix” gets you a Python module which includes a compiled (on your system when pip install ran) copy of the SQLite spellfix1 module, plus a utility variable containing its path so you can easily load it into a SQLite connection.

# 2:52 am / sqlite, python

sqlite-uuid (via) Another Python package that wraps a SQLite module written in C: this one provides access to UUID functions as SQLite functions.

# 2:55 am / sqlite, python

VIAL is now live, plus django-sql-dashboard

Our new Django backend has now officially graduated from preview mode! We’ve been running it to collect caller reports for Oregon for over a week now, and today we finally turned off the old Heroku app and promoted https://vial.calltheshots.us/ to be the place that our caller app writes to.

[... 672 words]

March 17, 2021

logpaste (via) Useful example of how to use the Litestream SQLite replication tool in a Dockerized application: S3 credentials are passed to the container on startup, it then attempts to restore the SQLite database from S3 and starts a Litestream process in the same container to periodically synchronize changes back up to the S3 bucket.

# 3:48 pm / docker, litestream, sqlite, s3, replication

When you have to mock a collaborator, avoid using the Mock object directly. Either use mock.create_autospec() or mock.patch(autospec=True) if at all possible. Autospeccing from the real collaborator means that if the collaborator's interface changes, your tests will fail. Manually speccing or not speccing at all means that changes in the collaborator's interface will not break your tests that use the collaborator: you could have 100% test coverage and your library would fall over when used!

Thea Flowers

# 4:44 pm / mocking, testing, python

March 18, 2021

How we found and fixed a rare race condition in our session handling. GitHub had a terrifying bug this month where a user reported suddenly being signed in as another user. This is a particularly great example of a security incident report, explaining how GitHub identified the underlying bug, what caused it and the steps they are taking to ensure bugs like that never happen in the future. The root cause was a convoluted sequence of events which could cause a Ruby Hash to be accidentally shared between two requests, caused as a result of a new background thread that was introduced as a performance optimization.

# 11:06 pm / threads, security, github

March 21, 2021

GitHub, by default, writes five replicas of each repository across our three data centers to protect against failures at the server, rack, network, and data center levels. When we need to update Git references, we briefly take a lock across all of the replicas in all of our data centers, and release the lock when our three-phase-commit (3PC) protocol reports success.

Scott Arbeit

# 12:57 am / github

Weeknotes: django-sql-dashboard widgets

Visit Weeknotes: django-sql-dashboard widgets

A few small releases this week, for django-sql-dashboard, datasette-auth-passwords and datasette-publish-vercel.

[... 1,025 words]

March 22, 2021

The Accountability Project Datasettes. The Accountability Project “curates, standardizes and indexes public data to give journalists, researchers and others a simple way to search across otherwise siloed records”—they have a wide range of useful data, and they’ve started experimenting with Datasette to provide SQL access to a subset of the information that they have collected.

# 12:07 am / data-journalism, datasette

March 23, 2021

A Complete Guide To Accessible Front-End Components. I’m so excited about this article: it brings together an absolute wealth of resources on accessible front-end components, including many existing component implementations that are accessible out of the box. Date pickers, autocomplete widgets, modals, menus—all sorts of things that I’ve been dragging my heels on implementing because I didn’t fully understand their accessibility implications.

# 1:06 am / accessibility, javascript

The Airtable formulas at the heart of everything

While working on building a Counties.json API endpoint for VIAL I realized I wasn’t entirely sure how the “Total reports” and “Yeses” numbers in this piece of JSON were calculated:

[... 323 words]

March 24, 2021

Understanding JSON Schema (via) Useful, comprehensive short book guide to JSON Schema, which finally helped me feel like I fully understand the specification.

# 2:57 am / jsonschema, json

March 25, 2021

Homebrew Python Is Not For You. If you’ve been running into frustrations with your Homebrew Python environments breaking over the past few months (the dreaded “Reason: image not found” error) Justin Mayer has a good explanation. Python in a Homebrew is designed to work as a dependency for their other packages, and recent policy changes that they made to support smoother upgrades have had catastrophic problems effects on those of us who try to use it for development environments.

# 3:14 pm / homebrew, python

Why all my servers have an 8GB empty file (via) This is such a good idea: keep a 8GB empty spacer.img file on any servers you operate, purely so that if the server runs out of space you can delete the file and get some breathing room for getting everything else working again. I’ve had servers run out of space in the past and it’s an absolute pain to sort out—this trick would have really helped.

# 8:35 pm / sysadmin

sqlite-plus (via) Anton Zhiyanov bundled together a bunch of useful SQLite C extensions for things like statistical functions, unicode string normalization and handling CSV files as virtual tables. The GitHub Actions workflow here is a particularly useful example of compiling SQLite extensions for three different platforms.

# 9:13 pm / github-actions, sqlite, anton-zhiyanov

March 28, 2021

Weeknotes: SpatiaLite 5, Datasette on Azure, more CDC vaccination history

Visit Weeknotes: SpatiaLite 5, Datasette on Azure, more CDC vaccination history

This week I got SpatiaLite 5 working in the Datasette Docker image, improved the CDC vaccination history git scraper, figured out Datasette on Azure and we closed on a new home!

[... 986 words]

March 29, 2021

Hello, HPy (via) HPy provides a new way to write C extensions for Python in a way that is compatible with multiple Python implementations at once, including PyPy.

# 2:40 pm / python, pypy

March 30, 2021

ifconfig.co (via) I really like this: “curl ifconfig.co” gives you your IP address as plain text, “curl ifconfig.co/city” tells you your city according to MaxMind GeoLite2, “curl ifconfig.co/json” gives you all sorts of useful extra data. Suggested rate limit is one per minute, but the code is open source Go that you can run yourself.

# 7:53 pm / ip, curl, networking

2021 » March

MTWTFSS
1234567
891011121314
15161718192021
22232425262728
293031