Simon Willison’s Weblog

Changelogs to help understand the fires in the North Bay

10th October 2017

The situation in the counties north of San Francisco is horrifying right now. I’ve repurposed some of the tools I built for the Irma Response project last month to collect and track data that might be of use to anyone trying to understand what’s happening up there. I’m sharing these now in the hope that they prove useful.

I’m scraping a number of sources relevant to the crisis and making the data available in a repository on GitHub. Because it’s a git repository, changes to those sources are tracked automatically. The value I’m providing here isn’t so much the data itself as the history of the data. If you need to see what has changed and when, my repository’s commit log should have the answers for you. Or maybe you’ll just want to occasionally hit refresh on this history of changes to srcity.org/610/Emergency-Information to see when they edited the information.
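
To make the idea concrete, here is a minimal sketch of the scrape-then-commit pattern. It is not the code from north_bay.py: the function names, the JSON layout and the file names are all assumptions chosen for illustration. The point it tries to show is that writing each snapshot in a stable format means git only records a commit when the source really changed.

```python
import json
import subprocess
from pathlib import Path


def serialize(records):
    """Render a list of dicts as stable, line-oriented JSON.

    Sorting both the records and their keys is an assumption made here so
    that a source merely reordering its items does not look like a change.
    """
    ordered = sorted(records, key=lambda r: json.dumps(r, sort_keys=True))
    return json.dumps(ordered, indent=2, sort_keys=True) + "\n"


def write_snapshot(path, records):
    """Write the snapshot file; return True only if its content changed."""
    path = Path(path)
    new_text = serialize(records)
    old_text = path.read_text(encoding="utf-8") if path.exists() else None
    if new_text == old_text:
        return False
    path.write_text(new_text, encoding="utf-8")
    return True


def commit_snapshot(repo_dir, filename, message):
    """Commit one file using plain git commands.

    Assumes repo_dir is already a git repository with user details configured.
    """
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
```

A scraper run on a schedule could call `write_snapshot("shelters.json", records)` and then call `commit_snapshot(".", "shelters.json", "Shelter list changed")` only when it returns `True` (both the file name and the message are hypothetical). The commit timestamps then answer the "what changed and when" question.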

The sources I’m tracking right now are:

  1. The Santa Rosa Fire Department’s Emergency Information page. This is being maintained by hand so it’s not a great source of structured data, but it has key details like the location and availability of shelters and it’s useful to know what was changed and when. History of changes to that page.
  2. PG&E power outages. This is probably the highest-quality dataset, with the neatest commit messages. The commit history shows exactly when new outages were reported and how many customers were affected.
  3. Road Conditions in the County of Sonoma. If you want to understand how far the fire has spread, this is a useful source of data as it shows which roads have been closed due to fire or other reasons. History of changes.
  4. California Highway Patrol Incidents, extracted from a KML feed on quickmap.dot.ca.gov. Since these cover the whole state of California there’s a lot of stuff in here that isn’t directly relevant to the North Bay, but the incidents that mention fire still help tell the story of what’s been happening. History of changes.
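
As an illustration of the last source in the list, here is a rough sketch of pulling incidents out of a KML document and keeping only the ones that mention fire. The structure of the real quickmap.dot.ca.gov feed is not described here, so this assumes standard KML 2.2 `Placemark` elements with a `name`, a `description` and a `Point`. The function names and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard KML 2.2 namespace; assumed to be what the feed uses.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def parse_incidents(kml_text):
    """Return one dict per <Placemark> found in a KML document."""
    root = ET.fromstring(kml_text)
    incidents = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        description = placemark.findtext(
            "kml:description", default="", namespaces=KML_NS
        )
        coords = placemark.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        )
        # KML stores coordinates as "longitude,latitude[,altitude]".
        parts = coords.strip().split(",")
        if len(parts) >= 2:
            lon, lat = float(parts[0]), float(parts[1])
        else:
            lon, lat = None, None
        incidents.append(
            {
                "name": name.strip(),
                "description": description.strip(),
                "lon": lon,
                "lat": lat,
            }
        )
    return incidents


def mentions_fire(incident):
    """Crude substring filter: true if the name or description says 'fire'."""
    text = f"{incident['name']} {incident['description']}".lower()
    return "fire" in text
```

Filtering is then a one-liner such as `[i for i in parse_incidents(kml_text) if mentions_fire(i)]`. A plain substring match will also catch unrelated words that contain "fire", so treat the result as a starting point for reading rather than a clean dataset.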

The code for the scrapers can be found in north_bay.py. Please leave comments, feedback or suggestions on other useful potential sources of data in this GitHub issue.

This is Changelogs to help understand the fires in the North Bay by Simon Willison, posted on 10th October 2017.

Part of series Git scraping

  1. Scraping hurricane Irma - Sept. 10, 2017, 6:21 a.m.
  2. Changelogs to help understand the fires in the North Bay - Oct. 10, 2017, 6:48 a.m.
  3. Generating a commit log for San Francisco's official list of trees - March 13, 2019, 2:49 p.m.
  4. Tracking PG&E outages by scraping to a git repo - Oct. 10, 2019, 11:32 p.m.
  5. Git scraping: track changes over time by scraping to a Git repository - Oct. 9, 2020, 6:27 p.m.
  6. Git scraping, the five minute lightning talk - March 5, 2021, 12:44 a.m.
  7. git-history: a tool for analyzing scraped data collected using Git and SQLite - Dec. 7, 2021, 10:32 p.m.
