Tim Bray on Unicode
Tim Bray’s ongoing really is one of the best technical blogs out there (if it even is a blog). One of his current topics is Unicode, a subject that pretty much every software developer should try to get under their belt. On the Goodness of Unicode gives a thorough, entertaining overview of the subject (including its importance and why it isn’t as scary as it sounds), while Characters vs. Bytes is the first in a promised three-part essay covering the technical details of modern character processing.