Simon Willison’s Weblog

Crowbar. A headless Gecko/XULRunner application that exposes a web service API for screen scraping using a real browser DOM: just pass it the URL of a page and the URL of a scraper script written in JavaScript (a bit like a Greasemonkey user script) and get back RDF/XML.
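
Here's a rough sketch of what calling a service like this might look like from Python. The endpoint address and the form field names (`url` and `scraper`) are assumptions for illustration, not confirmed details of Crowbar's API, so check the project's own documentation before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a Crowbar instance is listening locally. The host, port and
# path here are hypothetical placeholders.
CROWBAR_ENDPOINT = "http://127.0.0.1:10000/"


def build_crowbar_request(page_url, scraper_url, endpoint=CROWBAR_ENDPOINT):
    """Build a POST request carrying the page URL and the scraper script URL."""
    # Assumption: the form field names "url" and "scraper" are illustrative.
    body = urlencode({"url": page_url, "scraper": scraper_url}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def scrape(page_url, scraper_url, endpoint=CROWBAR_ENDPOINT, timeout=30):
    """Send the request and return the response body (expected to be RDF/XML)."""
    request = build_crowbar_request(page_url, scraper_url, endpoint)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a service running, `scrape("http://example.com/", "http://example.com/scraper.js")` would return the RDF/XML as a string, ready to hand to an RDF parser.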

Posted 24th January 2009 at 11:52 pm

Tags: crowbar, dom, gecko, greasemonkey, mozilla, rdf, screenscraping, webservice, xml, xulrunner