Safe HTML checker
23rd February 2003
I’ve finally enabled a subset of HTML in my comments. In doing so, I had several requirements that needed to be fulfilled:
- Entered markup must be valid XHTML Strict, to stop comments from breaking validation and to keep things nice and tidy.
- No presentational markup! I want to maintain control over how things look via my stylesheets—comments posted should only be able to use structural HTML elements.
- Attributes should be restricted to those that add semantic meaning. JavaScript event attributes and CSS-related attributes should not be allowed.
- I should retain full control over the tags and attributes allowed in the comments.
- Submitted HTML must be kept free from anything that could pose a security risk, such as javascript: URLs.
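To illustrate the last requirement, here is a minimal sketch of a URL scheme check. It is written in Python rather than the PHP of the actual class, and it is not taken from that class: the name `is_safe_url` and the contents of `ALLOWED_SCHEMES` are assumptions chosen for the example.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of URL schemes; the real checker's rules may differ.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_safe_url(url):
    """Return True only if the URL has an explicitly allowed scheme."""
    # Drop whitespace and control characters, which can be used to
    # disguise a scheme (for example "java\tscript:").
    cleaned = "".join(ch for ch in url if ord(ch) > 0x20)
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        # urlsplit raises ValueError on some malformed URLs; treat as unsafe.
        return False
    # Relative URLs have an empty scheme and are rejected in this sketch.
    return scheme in ALLOWED_SCHEMES
```

Checking against a list of permitted schemes, instead of searching for the string "javascript:", means that unknown or obfuscated schemes are rejected by default.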
The system I have implemented works by running submitted posts through an XML parser, which checks that each element is in my list of allowed elements, is nested correctly (you can’t put a blockquote inside a p, for example) and doesn’t have any illegal attributes. My initial tests have shown it to work pretty well, but if anyone wants to have a go at breaking it, please be my guest.
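The approach described above can be sketched roughly as follows. This is an illustration in Python using the standard library's `xml.etree.ElementTree`, not the PHP code of the actual class. The allowlists (`ALLOWED_ATTRS`, `ALLOWED_CHILDREN`) and the function name `check_comment` are hypothetical, and the rules are far smaller than real XHTML Strict.

```python
import xml.etree.ElementTree as ET

# Hypothetical allowlist: element name -> attributes permitted on it.
ALLOWED_ATTRS = {
    "p": set(),
    "blockquote": {"cite"},
    "a": {"href", "title"},
    "em": set(),
    "strong": set(),
    "code": set(),
}

# Hypothetical nesting rules: parent element -> permitted child elements.
INLINE = {"a", "em", "strong", "code"}
ALLOWED_CHILDREN = {
    "root": {"p", "blockquote"},   # "root" is the wrapper added below
    "blockquote": {"p"},
    "p": INLINE,                   # so a blockquote inside a p is rejected
    "a": {"em", "strong", "code"},
    "em": INLINE,
    "strong": INLINE,
    "code": set(),
}

URL_ATTRS = {"href", "cite"}


def check_comment(markup):
    """Return a list of problems found; an empty list means accepted."""
    # Refuse doctypes, comments, CDATA and processing instructions outright.
    if "<!" in markup or "<?" in markup:
        return ["declarations, comments and processing instructions are not allowed"]
    try:
        # Wrap the fragment so that it parses as a single XML document.
        root = ET.fromstring("<root>" + markup + "</root>")
    except ET.ParseError as exc:
        return ["markup is not well-formed: %s" % exc]
    errors = []
    _check_children(root, errors)
    return errors


def _check_children(parent, errors):
    for child in parent:
        if child.tag not in ALLOWED_ATTRS:
            errors.append("element <%s> is not allowed" % child.tag)
            continue  # do not descend into unknown elements
        if child.tag not in ALLOWED_CHILDREN[parent.tag]:
            errors.append("<%s> may not appear inside <%s>" % (child.tag, parent.tag))
        for name, value in child.attrib.items():
            if name not in ALLOWED_ATTRS[child.tag]:
                errors.append("attribute %s is not allowed on <%s>" % (name, child.tag))
            elif name in URL_ATTRS and not value.lower().startswith(("http://", "https://")):
                errors.append("attribute %s must be an http(s) URL" % name)
        _check_children(child, errors)
```

For example, `check_comment('<p>Hello <em>world</em></p>')` returns an empty list, while markup with an unclosed tag, an `onclick` attribute, or a blockquote nested inside a p produces one or more error messages. Relying on the parser for well-formedness means that anything it cannot parse is rejected before the allowlist checks run.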
The code for the main class is available here: SafeHtmlChecker.class.php