Simon Willison’s Weblog

Django admin customization, JSON in our PostgreSQL

Originally posted to my internal blog at VaccinateCA

My progress slowed a bit today as I started digging into some things I’m less familiar with—but I’ve found some tricks that I think will help us out a lot.

Django admin customization

In the past I’ve used the Django admin mostly as a database debugging tool, on the basis that once you get deep into building out an interface that’s more than just a very basic CMS, you’re better off rolling something from scratch.

Today my opinion changed. I think the Django admin may be the solution to a LOT of our problems, with very little extra customization.

I was looking at a simple feature request: show a summary of calls made by a specific reporter. It turns out adding a custom templated block of text to an existing Django admin “change item” page is trivial, using the pattern I wrote up here:

image

The implementation for this is tiny. We can use this pattern to add SO much depth to our admin pages—and Django’s default permission system is robust enough that we can give users access to these pages without them being able to make edits.

I also added a column to the reporters table showing the number of calls each reporter has made, and made that column sortable! Implementation here.

image

Once we’re fully on Django we’ll be able to turn around features like this in very little time. You don’t need to be a Django expert to build these either—a tiny bit of Python and HTML knowledge should be enough to productively modify this interface.

JSON in our PostgreSQL

My main goal for the day was to tighten up the call reports importer script I wrote yesterday.

My big breakthrough on this came after a long and super-valuable conversation with Nicholas Schiefer, who’s been heavily involved in the growth of our Airtable schema ever since the project started.

Our Airtable data is complicated, because the way we write data to it has constantly evolved. Before the launch of help.vaccinateca the data was all entered through a custom Airtable app, and many of the fields we are capturing now weren’t being captured just a few weeks ago.

Since writing a one-off importer that patches over all of these differences in a single go is virtually impossible, we decided to try an alternative track: my importer now saves the entire original Airtable JSON to a PostgreSQL JSON column (using Django 3.1’s brand new JSONField).

I taught the Django Admin to pretty-print the JSON (implementation here):

image

The most obvious value of this is in debugging—it’s much easier to look at a record now and compare it to the Airtable version.

More importantly: if we make a mistake in the importer code today and don’t notice for six months, that’s fine! We can re-backfill against the new lessons we have learned using the Airtable JSON data that we’ve already stored.

The cost? ~20,000 database records with a few extra KB of data stored against them. That’s totally worth it.

Querying JSON

PostgreSQL has a bunch of features for querying into JSON fields which I haven’t really explored before. I decided to try them out.

They’re incredible. Here’s a query that shows the callers who have made the most calls, based on extracting the {"Reported by": {"name": "NAME"}} nested field from that JSON column:

select
    jsonb_extract_path(
        airtable_json, 'Reported by', 'name'
    ) as name,
    count(*) as n
from call_report
group by name
order by n desc;

Even more exciting... here’s a query that counts how many times each key has been used across ALL of the JSON returned from Airtable:

SELECT
    jsonb_object_keys(airtable_json) AS key, count(*)
FROM call_report GROUP BY key;

And here’s what it outputs:

| JSON key | Times used |
| --- | --- |
| Affiliation (from Location) | 21589 |
| airtable_createdTime | 21589 |
| airtable_id | 21589 |
| Appointments by phone? | 466 |
| Appointment scheduling instructions | 2526 |
| auth0_reporter_id | 1583 |
| auth0_reporter_name | 1583 |
| auth0_reporter_roles | 1583 |
| Availability | 21589 |
| County (from Location) | 21589 |
| Do not call until | 3116 |
| external_reports_base_external_report_id | 3139 |
| Hour | 21589 |
| ID | 21589 |
| Internal Notes | 18258 |
| is_latest_report_for_location | 21589 |
| is_pending_review | 18 |
| Location | 21589 |
| location_id | 21589 |
| location_latest_eva_report_time | 13443 |
| location_latest_report_id | 21589 |
| location_latest_report_time | 21354 |
| Location Type (from Location) | 21588 |
| Name (from Location) | 21589 |
| Notes | 3359 |
| Number of Reports (from Location) | 21589 |
| parent_eva_report | 958 |
| parent_external_report | 2410 |
| Phone | 82 |
| Reported by | 21589 |
| report_id | 21589 |
| Report Type | 21589 |
| soft-dropped-column: Vaccines available? | 15948 |
| time | 21589 |
| tmp_eva_flips | 21589 |
| Vaccine demand | 733 |
| Vaccine demand notes | 4 |

This is amazingly useful data for the importer script that I’m writing!
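The same per-key tally is also easy to reproduce in plain Python over exported records, which is handy when iterating on importer logic offline. A sketch:

```python
from collections import Counter


def count_json_keys(records):
    # Tally how many records use each top-level key, mirroring what
    # jsonb_object_keys(...) + GROUP BY does in PostgreSQL
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts
```

Feed it a list of Airtable record dicts and it returns a Counter mapping each key to the number of records that include it.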

The query also executes in about 800ms against the cheapest PostgreSQL database server that Heroku offer—a full table scan across all 22,000 imported records.

Based on how powerful this is, I’m now thinking that we should go all-in on JSON in our database. Imagine if every scraper we were running dumped its full scraped JSON data into PostgreSQL—we could join arbitrary scraped data against our other tables to figure out if there are any new locations.

My biggest concern about replacing Airtable is that we’ll lose the amazing flexibility it’s given us. I think JSON columns can help bridge that gap.

Call targeting: the most interesting problem

Another topic that came out of my conversation with Nicholas: I hadn’t appreciated quite how ingenious our call targeting mechanism is. This is SUCH a smart use of Airtable!

image

The short version: call targeting (the logic that decides which number a volunteer should be asked to call) is powered by Airtable views, with really clever application of Airtable’s filters to help build up the call lists.

We need to maintain our ability to smartly target where the calls go, and ideally make it even better. This is going to be a really fun problem to solve!