Simon Willison’s Weblog

Subscribe

Items tagged datajournalism in 2021

Filters: Year: 2021 × datajournalism × Sorted by date


git-history: a tool for analyzing scraped data collected using Git and SQLite

I described Git scraping last year: a technique for writing scrapers where you periodically snapshot a source of data to a Git repository in order to record changes to that source over time.

[... 2002 words]

Weeknotes: sqlite-transform 1.1, Datasette 0.58.1, datasette-graphql 1.5

Work on Project Pelican inspires new features and improvements across a number of different projects.

[... 1419 words]

The Accountability Project Datasettes. The Accountability Project “curates, standardizes and indexes public data to give journalists, researchers and others a simple way to search across otherwise siloed records”—they have a wide range of useful data, and they’ve started experimenting with Datasette to provide SQL access to a subset of the information that they have collected. # 22nd March 2021, 12:07 am

Weeknotes: Datasette and Git scraping at NICAR, VaccinateCA

This week I virtually attended the NICAR data journalism conference and made a ton of progress on the Django backend for VaccinateCA (see last week).

[... 773 words]

Git scraping, the five minute lightning talk

I prepared a lightning talk about Git scraping for the NICAR 2021 data journalism conference. In the talk I explain the idea of running scheduled scrapers in GitHub Actions, show some examples and then live code a new scraper for the CDC’s vaccination data using the GitHub web interface. Here’s the video.

[... 289 words]