Items tagged weeknotes, aws
Filters: weeknotes × aws × Sorted by date
sqlite-comprehend: run AWS entity extraction against content in a SQLite database
I built a new tool this week: sqlite-comprehend, which passes text from a SQLite database through the AWS Comprehend entity extraction service and stores the returned entities.
[... 1146 words]s3-ocr: Extract text from PDF files stored in an S3 bucket
I’ve released s3-ocr, a new tool that runs Amazon’s Textract OCR text extraction against PDF files in an S3 bucket, then writes the resulting text out to a SQLite database with full-text search configured so you can run searches against the extracted data.
[... 1493 words]Weeknotes: git-history, created for a Git scraping workshop
My main project this week was a 90 minute workshop I delivered about Git scraping at Coda.Br 2021, a Brazilian data journalism conference, on Friday. This inspired the creation of a brand new tool, git-history, plus smaller improvements to a range of other projects.
[... 1239 words]Weeknotes: SpatiaLite 5, Datasette on Azure, more CDC vaccination history
This week I got SpatiaLite 5 working in the Datasette Docker image, improved the CDC vaccination history git scraper, figured out Datasette on Azure and we closed on a new home!
[... 986 words]