Simon Willison’s Weblog

Subscribe

Wednesday, 6th November 2019

Automate the Boring Stuff with Python: Working with PDF and Word Documents. I stumbled across this while trying to extract some data from a PDF file (the kind of file with actual text in it as opposed to dodgy scanned images) and it worked perfectly: PyPDF2.PdfFileReader(open("file.pdf", "rb")).getPage(0).extractText()

# 4:17 pm / pdf, python

The first ever commit to Sentry (via) This is fascinating: the first 70 lines of code that started the Sentry error tracking project. It’s a straight-forward Django process_exception() middleware method that collects the traceback and the exception class and saves them to a database. The trick of using the md5 hash of the traceback message to de-dupe errors has been there from the start, and remains one of my favourite things about the design of Sentry.

# 11:08 pm / django, sentry

2019 » November

MTWTFSS
    123
45678910
11121314151617
18192021222324
252627282930