Simon Willison’s Weblog

3 items tagged “scraping”

Using “import refs” to iteratively import data into Django

I’ve been writing a few scripts to backfill my blog with content I originally posted elsewhere. So far I’ve imported answers I posted on Quora (background), answers I posted on Ask MetaFilter and content I recovered from the Internet Archive.

[... 560 words]

YQL—converting the web to JSON with mock SQL. YQL just got a whole lot more interesting to me—I had no idea they were exposing an HTML and RSS scraping tool over a JSONP API in addition to all of the Yahoo! web service methods. # 13th December 2008, 9:39 am

Data Scraping Wikipedia with Google Spreadsheets. I hadn’t played with =importHTML in Google spreadsheets, which lets you suck in data from an HTML table or list somewhere on the web. This tutorial takes it further, bringing Wikipedia, Yahoo! Pipes and KML into the mix. # 16th October 2008, 2:37 pm
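
For a rough idea of what pulling a table out of a page involves, here is a minimal Python sketch of the same idea using only the standard library. It is not how Google Spreadsheets implements =importHTML; the names `TableExtractor` and `import_html_table` are made up for illustration, the 1-based `index` argument is an assumption chosen to resemble the spreadsheet function, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of each <table> as lists of cell text (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index=1):
    """Return the rows of the index-th table in html (1-based, an assumption)."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[index - 1]


# Made-up sample markup, standing in for a page fetched from the web.
SAMPLE = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30,000</td></tr>
</table>
"""

rows = import_html_table(SAMPLE)
for row in rows:
    print(row)
```

Real pages are messier than this sample, so a dedicated HTML parsing library is usually a better fit for anything beyond a quick experiment.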