Simon Willison’s Weblog

PDFMiner. Useful-looking PDF parsing library in Python. It can produce an XML representation of the text and style information in a PDF document.
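To give a rough idea of what that XML might be good for, here is a minimal sketch that groups characters into runs sharing the same font and size, using only the standard library. The sample document is hand-written for illustration, and its element and attribute names (`pages`, `page`, `textline`, `text`, `font`, `size`) are assumptions about the general shape of the output, not something taken from PDFMiner's documentation. `styled_runs` is a hypothetical helper, not part of the library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, hand-written for illustration only.
# Real PDFMiner output may use different elements and attributes.
SAMPLE_XML = """<pages>
<page id="1">
<textline>
<text font="Helvetica-Bold" size="14.0">H</text>
<text font="Helvetica-Bold" size="14.0">i</text>
<text font="Helvetica" size="10.0">!</text>
</textline>
</page>
</pages>"""


def styled_runs(xml_text):
    """Group consecutive characters that share a font and size into runs.

    Returns a list of ((font, size), text) tuples in document order.
    """
    runs = []
    root = ET.fromstring(xml_text)
    for ch in root.iter("text"):
        font = ch.get("font")
        if font is None:
            # Skip elements that carry no style information.
            continue
        style = (font, ch.get("size"))
        if runs and runs[-1][0] == style:
            # Same style as the previous character: extend the current run.
            runs[-1] = (style, runs[-1][1] + (ch.text or ""))
        else:
            runs.append((style, ch.text or ""))
    return runs


for (font, size), text in styled_runs(SAMPLE_XML):
    print(font, size, repr(text))
```

With output like this, picking out headings or bold passages becomes a matter of filtering runs by font name or size, assuming the real XML exposes comparable attributes.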