Simon Willison’s Weblog

Entries tagged ocr in 2024


Running OCR against PDFs and images directly in your browser

I attended the Story Discovery At Scale data journalism conference at Stanford this week. One of the perennial hot topics at any journalism conference concerns data extraction: how can we best get data out of PDFs and images?

[... 2263 words]
