PDFMiner. Useful looking PDF parsing library in Python—can produce an XML representation of the text and style information in a PDF document.
Recent articles
- What's new in the world of LLMs, for NICAR 2025 - 8th March 2025
- I built an automaton called Squadron - 4th March 2025
- Notes from my Accessibility and Gen AI podcast appearance - 2nd March 2025