Simon Willison’s Weblog

Subscribe

Fastest way to turn HTML into text in Python (via) A light benchmark of the new-to-me selectolax Python library shows it performing extremely well for tasks such as extracting just the text from an HTML string, after first manipulating the DOM. selectolax is a Python binding over the Modest and Lexbor HTML parsing engines, which are written in no-outside-dependency C.