Simon Willison’s Weblog

Items tagged datajournalism in 2020

Filters: Year: 2020 × datajournalism ×


nyt-2020-election-scraper. Brilliant application of git scraping by Alex Gaynor and a growing team of contributors. Takes a JSON snapshot of the NYT’s latest election poll figures every five minutes, then runs a Python script to iterate through the history and build an HTML page showing the trends, including what percentage of the remaining votes each candidate needs to win each state. This is the perfect case study in why it can be useful to take a “snapshot if the world right now” data source and turn it into a git revision history over time. # 6th November 2020, 2:24 pm

selenium-wire. Really useful scraping tool: enhances the Python Selenium bindings to run against a proxy which then allows Python scraping code to look at captured requests—great for if a site you are working with triggers Ajax requests and you want to extract data from the raw JSON that came back. # 2nd November 2020, 6:58 pm

sba-loans-covid-19-datasette (via) The treasury department released a bunch of data on the Covid-19 SBA Paycheck Protection Program Loan recipients today—I’ve loaded the most interesting data (the $150,000+ loans) into a Datasette instance. # 7th July 2020, 2:42 am

Data Journalism Academy (via) MaryJo Webster is the data editor for the Star Tribune in Minneapolis, and a 2019 Pulitzer nominee. She’s has a huge amount of experience teaching data journalism and has just released her accumulated teaching materials in the form of the Data Journalism Academy. # 11th May 2020, 4:45 am

Weeknotes: Covid-19, First Python Notebook, more Dogsheep, Tailscale

My covid-19.datasettes.com project publishes information on COVID-19 cases around the world. The project started out using data from Johns Hopkins CSSE, but last week the New York Times started publishing high quality USA county- and state-level daily numbers to their own repository. Here’s the change that added the NY Times data.

[... 993 words]

Weeknotes: datasette-ics, datasette-upload-csvs, datasette-configure-fts, asgi-csrf

I’ve been preparing for the NICAR 2020 Data Journalism conference this week which has lead me into a flurry of activity across a plethora of different projects and plugins.

[... 834 words]

Tracking FARA by deploying a data API using GitHub Actions and Cloud Run

I’m using the combination of GitHub Actions and Google Cloud Run to retrieve data from the U.S. Department of Justice FARA website and deploy it as a queryable API using Datasette.

[... 1599 words]