Simon Willison’s Weblog

Items tagged datajournalism, tesseract in 2024

Running OCR against PDFs and images directly in your browser

I attended the Story Discovery At Scale data journalism conference at Stanford this week. One of the perennial hot topics at any journalism conference concerns data extraction: how can we best get data out of PDFs and images?

[... 2263 words]
