Simon Willison’s Weblog

Subscribe
Atom feed for git-scraping

34 posts tagged “git-scraping”

Git scraping is a technique where data is scraped from an external source into a Git repository in order to record changes to that data over time.

2019

Tracking PG&E outages by scraping to a git repo

Visit Tracking PG&E outages by scraping to a git repo

PG&E have cut off power to several million people in northern California, supposedly as a precaution against wildfires.

[... 868 words]

Generating a commit log for San Francisco’s official list of trees

Visit Generating a commit log for San Francisco's official list of trees

San Francisco has a neat open data portal (as do an increasingly large number of cities these days). For a few years my favourite file on there has been Street Tree List, a list of all 190,000 trees in the city maintained by the Department of Public Works.

[... 1,051 words]

2017

Changelogs to help understand the fires in the North Bay

The situation in the counties north of San Francisco is horrifying right now. I’ve repurposed some of the tools I built to for the Irma Response project last month to collect and track some data that might be of use to anyone trying to understand what’s happening up there. I’m sharing these now in the hope that they might prove useful.

[... 383 words]

Scraping hurricane Irma

The Irma Response project is a team of volunteers working together to make information available during and after the storm. There is a huge amount of information out there, on many different websites. The Irma API is an attempt to gather key information in one place, verify it and publish it in a reuseable way. It currently powers the irmashelters.org website.

[... 438 words]