Simon Willison’s Weblog

Weeknotes: tableau-to-sqlite, django-sql-dashboard

This week I started a limited production run of my new backend for Vaccinate CA calling, built a tableau-to-sqlite import tool and started working on a subset of Datasette for PostgreSQL and Django called django-sql-dashboard.

Vaccinate CA backend progress

My key project at the moment is buiding out a new Django-powered backend for the Vaccinate CA call reporting application—where real human beings constantly call pharmacies and medical sites around California to build a comprehensive guide to where the Covid vaccine is available.

As of this week, the new backend is running for a subset of the overall call volume. It’s exciting! It’s also a reminder that the single hardest piece of logic in any crowdsourcing-style application is the logic that gives a human being their next task. I’m continuing to evolve that logic, which is somewhat harder when the system I’m modifying is actively being used.

tableau-to-sqlite

The Vaccinate CA project is constantly on the lookout for new sources of data that might indicate locations that have the vaccine. Some of this data is locked up in Tableau dashboards, which are notoriously tricky to scrape.

When faced with problems like this, I frequently turn to GitHub code search: I’ll find a unique looking token in the data I’m trying to wrangle and run searches to see if anyone on GitHub has written code to handle it.

In doing so, I came across Tableau Scraper—an open source Python library by Bertrand Martel which does a fantastic job of turning a Tableau dashboard into a Pandas DataFrame.

Writing a Pandas DataFrame to a SQLite database is a one-liner: df.to_sql("table-name", sqlite3.connect(db_path)). So I spun up a quick command-line wrapper around the TableuaScraper class called tableau-to-sqlite which lets you do the following:

% tableau-to-sqlite tableau.db https://results.mo.gov/t/COVID19/views/VaccinationsDashboard/Vaccinations

Considering how much valuable data is trapped in government Tableau dashboards I’m really excited to point this tool at more sources. The README includes tips on combining this with sqlite-utils to get a CSV or JSON export which can then be tracked using Git scraping.

django-sql-dashboard

I’m continuing to ponder the idea of getting Datasette to talk to PostgreSQL in addition to SQLite, but in the meantime I have a growing Django application that runs against PostgreSQL and a desire to build some quick dashboards against it.

One of Datasette’s key features is the ability to bookmark a read-only SQL query and share that link with other people. It’s SQL injection attacks repurposed as a feature, and it’s proved to be incredibly useful over the past few years.

Here’s an example from earlier this week where I wanted to see how many GitHub issues I had opened and then closed within 60 seconds. The answer is 17!

django-sql-dashboard is my highly experimental exploration of what that idea looks like against a PostgreSQL database, wrapped inside a Django application

The key idea is to support executing read-only PostgreSQL statements with a strict timelimit (set using PostgreSQL’s statement_timeout setting, described here). Users can execute SQL directly, bookmark and share queries and save them to a database table in order to construct persistent dashboards.

It’s very early days for the project yet, and I’m still not 100% convinced it’s a good idea, but early signs are very promising.

A fun feature is that it lets you have more than one SQL query on the same page. Here’s what it looks like running against my blog’s database, showing a count query and the months in which I wrote the most blog entries:

Two SQL queries are shown: select count() from blog_entry; and select to_char(date_trunc('month', created), 'Mon YYYY') as month, count() from blog_entry group by month order by count(*) desc

Releases this week

TIL this week