Wednesday, 27th July 2022
SQLite Internals: Pages & B-trees (via) Ben Johnson provides a delightfully clear introduction to SQLite internals, describing the binary format used to store rows on disk and how SQLite uses 4KB pages for both row storage and for the b-trees used to look up records.
Fastest way to turn HTML into text in Python (via) A light benchmark of the new-to-me selectolax Python library shows it performing extremely well for tasks such as extracting just the text from an HTML string, after first manipulating the DOM. selectolax is a Python binding over the Modest and Lexbor HTML parsing engines, which are written in no-outside-dependency C.
TIL
Testing things in Fedora using Docker
— I got [a report](https://twitter.com/peterjanes/status/1552407491819884544) of a bug with my [s3-ocr tool](https://simonwillison.net/2022/Jun/30/s3-ocr/) running on Fedora.