Importing data from Airtable into Django, plus a search engine for all our code
24th February 2021
Originally posted to my internal blog at VaccinateCA
I made a bunch of progress on the Django backend prototype-that-soon-won’t-be-a-prototype today.
Importing data from Airtable
My goal for the day was to get some real data into the prototype, imported from Airtable. I’ve now done that with the two most important tables: Locations (aka places that people can go to get the vaccine) and CallReports, created by volunteers making phone calls.
I achieved this by writing two new Django management commands:
-
./manage.py import_airtable_locations
(code, test) populates the Django Locations model by pulling data from the Locations table in Airtable. -
./manage.py import_airtable_reports
(code, test) populates the Django CallReports model using data from the Reports table in Airtable.
Both of these importers follow the same pattern: you can point them directly at a Locations.json file you’ve already downloaded:
./manage.py import_airtable_locations \
--json-file=Locations.json
Or you can use a GitHub personal access token to have them load the data directly from our airtable-data-backup repository, provided you have access to that.
./manage.py import_airtable_locations \
--github-token=xxxx
I built the GitHub token mechanism to make it easy to run this command on a server, without having to mess around uplooading JSON files first. Since the prototype is running on Heroku I can pull a fresh import into it directly by running the following command:
% heroku run bash -a vaccinateca-preview
Running bash on ⬢ vaccinateca-preview... up, run.3026 (Hobby)
~ $ vaccinate/manage.py import_airtable_locations --github-token xxx
Skipping rec0xZ5EaKnnynfDa [name=CVS Pharmacy® & Drug Store at 25272 Marguerite Pkwy, Mission Viejo, CA 92692], reason=No latitude
Skipping rec7nHXCuSYRR61V0 [name=None], reason=No name
Skipping rec8Xk6kn4SvAKeEm [name=None], reason=No name
Skipping recCYZZRJCRlXykun [name=None], reason=No name
Skipping recJt0iQbqmglF0XL [name=Dignity Health (Woodland)], reason=No county
The command outputs a line for each item it fails to import. This turns out to be an extremely useful way to spot invalid data—locations with no name, latitude or county for example.
The script successfully imported 7038 locations, skipping just 17.
My call report importer needs a lot more work. It imported 21,029 call records, but skipped 962. More details in the GitHub issue thread.
Django Admin customization
I also made some tweaks to the Django admin. Here’s a screenshot of the call records list as it stands now.
And here’s the locations table:
SSO with Auth0
This isn’t fully working yet, but it’s nearly there. I’ve integrated Auth0 SSO with the Django app, with the goal being that any staff member can use Auth0 to sign in to the Django Admin panel and view and modify the data there.
You can try it out on the https://vaccinateca-preview.herokuapp.com/ page—once it’s fully working, you’ll be able to sign in and then visit https://vaccinateca-preview.herokuapp.com/admin/ to interact with the data.
Here’s the ongoing issue thread. I ended up creating a brand new role in Auth0 called “Vaccinate CA Staff”—I’ve figured out how to access those roles from the Django app when a user signs in, so I can then use membership of that role to control access to the admin panel.
Even more ambitious goals
Since the Django backend work is going at a very healthy pace, I’m extending my ambitions for it a bit.
The goal of the app is to replace Airtable as the point of truth for the data collected by our calling volunteers AND as the data source behind the public-facing https://www.vaccinateca.com/ site.
Once I’ve made some improvements to the schema informed by the data importing project, I think the next step will be to spin up just enough of an API endpoint that the calling app can start writing to Django in parallel to writing to Airtable.
This can be done inside the Netlify function used by the app. I’d like that function to continue writing to Airtable but also to write to the Django/PostgreSQL stack, wrapped in an error handler so failures there are ignored.
Then we can run the Django app as a silent partner to Airtable for a few days and compare the results gathered by the two, to gain confidence before switching over from one to the other.
Getting that running will be my goal for next week. If all goes well we may find we can make the full switch to the Django backend within a couple of weeks.
Regular expression code search across all of our repos
I love code search. I particularly like being able to search code with regular expressions, and then share links to those searches with other engineers.
The best code search tool I’ve ever used is ripgrep. A few months ago I built a simple web wrapper around ripgrep called datasette-ripgrep. This evening I deployed a copy of it against the source code from nine of our code repos:
- CAVaccineInventory/airtable-sql-science
- CAVaccineInventory/gists
- CAVaccineInventory/airtable-export
- CAVaccineInventory/data-engineering
- CAVaccineInventory/help.vaccinate
- CAVaccineInventory/airtableApps
- CAVaccineInventory/django.vaccinate
- CAVaccineInventory/site
- CAVaccineInventory/vaccinebot
You can access the search engine here (nicer URL coming soon):
https://vaccinateca-ripgrep-j7hipcg4aq-uc.a.run.app/
You’ll need to sign in with GitHub—I’m protecting the site using datasette-auth-github configured to only allow in members of the CAVaccineInventor
GitHub organization.
Once you’ve signed in, try this example search:
That runs a search for fetch\(|fetchJsonFromEndpoint
across all of our *.js
files, which shows us everywhere we are making an HTTP reuest using either fetch()
or our own fetchJsonFromEndpoint()
function.
The repo for the search engine is here.
More recent articles
- Visualizing local election results with Datasette, Observable and MapLibre GL - 9th November 2024
- Project: VERDAD - tracking misinformation in radio broadcasts using Gemini 1.5 - 7th November 2024
- Claude 3.5 Haiku - 4th November 2024