36 posts tagged “scraping”
2008
lxml: an underappreciated web scraping library. I just wish I could get the wretched thing to install on OS X Leopard without resorting to MacPorts.
Data Scraping Wikipedia with Google Spreadsheets. I hadn’t played with =importHTML in Google Spreadsheets, which lets you suck in data from an HTML table or list somewhere on the web. This tutorial takes it further, bringing Wikipedia, Yahoo! Pipes and KML into the mix.
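For a rough feel of what pulling a table out of a page involves, here is a minimal Python sketch using only the standard library. It is not how Google implements =importHTML; the names `TableExtractor` and `import_html_table` are made up for illustration, it works on an HTML string you have already fetched, and it assumes the tables are not nested.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a document as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index=1):
    """Return the index-th table (counting from 1) as rows of strings.

    Hypothetical helper; assumes well-formed, non-nested tables.
    """
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[index - 1]
```

Feed it the text of a page and a table number and you get back a list of rows, each a list of cell strings, which is roughly the shape a spreadsheet range would have.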
PDFMiner. Useful-looking PDF parsing library in Python. It can produce an XML representation of the text and style information in a PDF document.
2007
/trunk/jl/scraper. journa-list.com is open source, and the screen scrapers are written in Python.
2005
scrape.py. A clever Python screen-scraping module, with similarities to WWW::Mechanize.
2004
WWW::Odeon. A simple API for screen-scraping the www.odeon.co.uk website.