Validation on the fly
Douglas Bowman’s weblog is making very interesting reading at the moment. Douglas is responsible for Wired’s exciting new design, and since the launch he has been posting observations and lessons learnt from the new look. On Friday he described how fixing a problem with a design element took less than 60 seconds (thanks to global CSS files), but the post that caught my attention was this one:
However, daily editorial additions continue to allow XHTML validation errors to sneak into the Wired News markup. The most frequent culprits are the ampersands (&) which separate name/value pairs in URL query strings, or which commonly appear in our English language in company names like AT&T or slang acronyms like R&D.
[snip]
Somehow, we have to avoid the constant manual check of pages and retroactive fixes of existing errors. This method is unreliable and time consuming. I’m sure the engineers will be making modifications to our content insertion tool, so that validation errors like naked ampersands can be automatically detected and corrected as they’re entered.
I had the exact same problem with this blog. My solution was to throw every entry through PHP’s XML parser when it is added. If the parser throws an error, a warning message is displayed to encourage me to validate the page and re-check the entry. I imagine Wired’s content management system requires a slightly more elaborate solution than that, but for my small-scale needs it has been working a treat.
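For anyone wanting to try the same idea, here is a rough sketch of that kind of check. It is written in Python rather than PHP and is not the code this blog runs: the function name check_entry and the wrapping <entry> element are assumptions made for illustration. It only tests well-formedness, so a clean result still does not guarantee valid XHTML.

```python
from xml.parsers import expat


def check_entry(entry_html):
    """Return None if the entry parses as well-formed XML, else a warning string.

    Hypothetical helper for illustration only. Named entities such as
    &nbsp; are reported as errors too, because no DTD is loaded here.
    """
    parser = expat.ParserCreate()
    try:
        # Wrap the fragment in a dummy root element so that entries with
        # several top-level elements can still be parsed as one document.
        parser.Parse("<entry>" + entry_html + "</entry>", True)
    except expat.ExpatError as err:
        return "Entry is not well-formed XML: %s" % err
    return None


if __name__ == "__main__":
    for sample in ("<p>AT&amp;T</p>", "<p>AT&T</p>", "<p>unclosed"):
        warning = check_entry(sample)
        print(repr(sample), "->", warning or "ok")
```

A naked ampersand or an unclosed tag produces a warning, while a correctly escaped entry passes silently, which is all a single-author blog really needs.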
Adrian Holovaty - 21st October 2002 21:11 - #
My blog uses a simple Perl regexp with a negative look-ahead to detect HTML entities and already-corrected ampersands. It has been working well so far.
Let's see how your comment parser handles a regexp:
$entry =~ s/&(?!amp;|#|[\w]+;)/&amp;/g;
Micah - 22nd October 2002 04:30 - #