<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: pdfminer</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/pdfminer.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2008-08-03T15:29:40+00:00</updated><author><name>Simon Willison</name></author><entry><title>PDFMiner</title><link href="https://simonwillison.net/2008/Aug/3/pdfminer/#atom-tag" rel="alternate"/><published>2008-08-03T15:29:40+00:00</published><updated>2008-08-03T15:29:40+00:00</updated><id>https://simonwillison.net/2008/Aug/3/pdfminer/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.unixuser.org/~euske/python/pdfminer/index.html"&gt;PDFMiner&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Useful-looking PDF parsing library in Python. It can produce an XML representation of the text and style information in a PDF document.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pdf"&gt;pdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pdfminer"&gt;pdfminer&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scraping"&gt;scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xml"&gt;xml&lt;/a&gt;&lt;/p&gt;



</summary><category term="pdf"/><category term="pdfminer"/><category term="python"/><category term="scraping"/><category term="xml"/></entry></feed>