Simon Willison’s Weblog

6th November 2019 - Link Blog

Automate the Boring Stuff with Python: Working with PDF and Word Documents. I stumbled across this while trying to extract some data from a PDF file (the kind of file with actual text in it as opposed to dodgy scanned images) and it worked perfectly: PyPDF2.PdfFileReader(open("file.pdf", "rb")).getPage(0).extractText()
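As a minimal sketch, the one-liner above can be unpacked into a small function that closes the file handle when it is done. The function name `extract_page_text` and its `page_number` parameter are illustrative, not from the book chapter. It assumes a PyPDF2 1.x release, the API the one-liner uses; later releases renamed these calls (`PdfReader`, `reader.pages[i]`, `extract_text()`), so adjust the names if you are on a newer version.

```python
from pathlib import Path


def extract_page_text(pdf_path, page_number=0):
    """Return the text of one page of a text-based (non-scanned) PDF.

    Hypothetical helper: the name and signature are illustrative only.
    """
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such PDF: {path}")

    # Assumption: PyPDF2 1.x, which provides PdfFileReader, getPage and
    # extractText as used in the original one-liner.
    import PyPDF2

    # The context manager closes the file once the text has been extracted.
    with path.open("rb") as pdf_file:
        reader = PyPDF2.PdfFileReader(pdf_file)
        return reader.getPage(page_number).extractText()
```

Calling `extract_page_text("file.pdf")` should then return the same string as the one-liner, and passing `page_number=1` reads the second page instead. Scanned PDFs that contain only images will typically come back as an empty or near-empty string, since there is no text layer to extract.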

Posted 6th November 2019 at 4:17 pm

This is a link post by Simon Willison, posted on 6th November 2019.

Tags: pdf, python
