Validation on the fly
Douglas Bowman’s weblog is making very interesting reading at the moment. Douglas is responsible for Wired’s exciting new design, and since the launch he has been updating it with observations and lessons learnt from the new look. On Friday he described how fixing a problem with a design element took less than 60 seconds (thanks to global CSS files), but the post that caught my attention was this one:
However, daily editorial additions continue to allow XHTML validation errors to sneak into the Wired News markup. The most frequent culprits are the ampersands (&) which separate name/value pairs in URL query strings, or which commonly appear in our English language in company names like AT&T or slang acronyms like R&D.
[snip]
Somehow, we have to avoid the constant manual check of pages and retroactive fixes of existing errors. This method is unreliable and time consuming. I’m sure the engineers will be making modifications to our content insertion tool, so that validation errors like naked ampersands can be automatically detected and corrected as they’re entered.
I had the exact same problem with this blog. My solution was to throw every entry through PHP’s XML parser when it is added: if the parser throws an error, a warning message is displayed to encourage me to validate the page and re-check the entry. I imagine Wired’s content management system requires a slightly more elaborate solution than that, but for my small-scale needs it has been working a treat.
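For illustration, here is a rough sketch of that kind of check. It is not the code this blog runs: it is written in Python rather than PHP, using the standard library's expat binding, and the `check_entry` function name and the `<entry>` wrapper element are made up for the example. It only tests well-formedness, not validity against the XHTML DTD, so it catches naked ampersands and unclosed tags but little else.

```python
import xml.parsers.expat


def check_entry(fragment):
    """Return None if the fragment is well-formed XML, else a warning string.

    `check_entry` and the <entry> wrapper are hypothetical names for this sketch.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # Wrap the entry in a single root element so that an entry made of
        # several sibling paragraphs still parses as one document.
        parser.Parse("<entry>" + fragment + "</entry>", True)
    except xml.parsers.expat.ExpatError as err:
        return "Entry is not well-formed XML: %s" % err
    return None


if __name__ == "__main__":
    # A naked ampersand produces a warning; the escaped form passes silently.
    for text in ("<p>News from AT&T</p>", "<p>News from AT&amp;T</p>"):
        warning = check_entry(text)
        print(warning if warning else "OK")
```

The check is deliberately passive: it reports the problem and leaves the fixing to whoever is writing the entry, which is all a one-person blog needs.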