Simon Willison’s Weblog

Items tagged dom, xml


Crowbar. A headless Gecko/XULRunner application that exposes a web service API for screen scraping with a real browser DOM. Pass it the URL of a page and the URL of a screen-scraping JavaScript file (a bit like a Greasemonkey user script) and get back RDF/XML. # 24th January 2009, 11:52 pm

Atom Models. Building Python classes that act as utility wrappers around data stored in an lxml DOM object. # 7th August 2007, 4:02 pm
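
To give a rough idea of the pattern, here is a minimal sketch of wrapper classes over a parsed Atom document. It is not the code from the linked post: the `Feed` and `Entry` class names and the sample feed are hypothetical, and it uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml, whose `etree` module offers a largely compatible API.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)

ATOM_NS = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM_NS}


class Entry:
    """Hypothetical wrapper around a single <entry> element."""

    def __init__(self, element):
        self._el = element  # the data stays in the underlying tree

    @property
    def title(self):
        return self._el.findtext("atom:title", namespaces=NS)

    @title.setter
    def title(self, value):
        node = self._el.find("atom:title", NS)
        if node is None:
            # Create the child element if the entry has no title yet
            node = ET.SubElement(self._el, f"{{{ATOM_NS}}}title")
        node.text = value


class Feed:
    """Hypothetical wrapper around the root <feed> element."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    @property
    def title(self):
        return self._root.findtext("atom:title", namespaces=NS)

    @property
    def entries(self):
        # Wrappers are built on demand; each one points at a live element
        return [Entry(el) for el in self._root.findall("atom:entry", NS)]


SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""

feed = Feed(SAMPLE)
titles = [entry.title for entry in feed.entries]
```

Because each wrapper only holds a reference to its element, writing through a property (for example `feed.entries[0].title = "Renamed"`) changes the underlying tree, so the document can be serialised again without a separate export step.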