Simon Willison’s Weblog

Subscribe

Re-assessing the automatic charset decoding policy in HTTPX (via) Tom Christie ran an analysis of the top 1,000 most accessed websites (according to an older extract from Google’s Ad Planner service) and found that a full 5% of them both omitted a charset parameter and failed to decode as UTF-8. As a result, HTTPX will be depending on the charset-normalizer Python library to handle those cases.

Monthly briefing

Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.

Pay me to send you less!

Sponsor & subscribe