Fastest way to turn HTML into text in Python (via) A light benchmark of the new-to-me selectolax Python library shows it performing extremely well for tasks such as extracting just the text from an HTML string, after first manipulating the DOM. selectolax is a Python binding over the Modest and Lexbor HTML parsing engines, which are written in no-outside-dependency C.
Recent articles
- Teresa T is name of the whale in Pillar Point Harbor near Half Moon Bay - 8th September 2024
- Calling LLMs from client-side JavaScript, converting PDFs to HTML + weeknotes - 6th September 2024
- Building a tool showing how Gemini Pro can return bounding boxes for objects in images - 26th August 2024