Safe HTML checker
23rd February 2003
I’ve finally enabled a subset of HTML in my comments. In doing so, I had several requirements that needed to be fulfilled:
- Entered markup must be valid XHTML Strict, to stop comments from breaking validation and keep things nice and tidy.
- No presentational markup! I want to maintain control over how things look via my stylesheets—comments posted should only be able to use structural HTML elements.
- Attributes should be restricted to those that add semantic meaning. JavaScript event attributes and CSS-related attributes should not be allowed.
- I should retain full control over the tags and attributes allowed in the comments.
- Submitted HTML must be kept free from anything that could pose a security risk, such as javascript: URLs.
The system I have implemented works by running submitted posts through an XML parser, which checks that each element is in my list of allowed elements, is nested correctly (you can’t put a blockquote inside a p, for example) and doesn’t have any illegal attributes. My initial tests have shown it to work pretty well, but if anyone wants to have a go at breaking it, please be my guest.
The code for the main class is available here: SafeHtmlChecker.class.php
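As a rough illustration of the approach described above, here is a minimal sketch in Python. It is not the PHP class linked above and makes no claim to match it: the element lists, the RULES table, the URL prefix rule and the check_comment function are all hypothetical, chosen only to show the three checks (allowed elements, allowed nesting, allowed attributes) running on top of an XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist, for illustration only.
INLINE = {"a", "em", "strong", "code", "abbr", "cite"}
BLOCK = {"p", "blockquote", "ul", "ol", "pre"}

# tag -> (allowed attributes, elements allowed as direct children)
RULES = {
    "p": (set(), INLINE),
    "blockquote": ({"cite"}, BLOCK),
    "pre": (set(), INLINE),
    "ul": (set(), {"li"}),
    "ol": (set(), {"li"}),
    "li": (set(), INLINE | BLOCK),
    "a": ({"href", "title"}, INLINE - {"a"}),
    "em": (set(), INLINE),
    "strong": (set(), INLINE),
    "code": (set(), INLINE),
    "abbr": ({"title"}, INLINE),
    "cite": (set(), INLINE),
}

# Attributes that hold URLs must start with one of these prefixes, which
# rejects javascript: URLs (and, in this sketch, relative URLs too).
URL_ATTRIBUTES = {"href", "cite"}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")


def check_comment(markup):
    """Return a list of problems; an empty list means the markup passed."""
    try:
        # Wrap the fragment in a dummy root so the parser sees one document.
        root = ET.fromstring("<comment>" + markup + "</comment>")
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]
    errors = []
    _check_children(root, BLOCK, errors)
    return errors


def _check_children(parent, allowed_children, errors):
    for child in parent:
        tag = child.tag
        if tag not in RULES:
            errors.append("element <%s> is not allowed" % tag)
            continue
        if tag not in allowed_children:
            errors.append("<%s> may not appear inside <%s>" % (tag, parent.tag))
        allowed_attrs, grandchildren = RULES[tag]
        for name, value in child.attrib.items():
            if name not in allowed_attrs:
                errors.append("attribute %s is not allowed on <%s>" % (name, tag))
            elif name in URL_ATTRIBUTES and not value.lower().startswith(SAFE_URL_PREFIXES):
                errors.append("unsafe URL in %s on <%s>" % (name, tag))
        _check_children(child, grandchildren, errors)


if __name__ == "__main__":
    print(check_comment("<p>Hello <em>world</em></p>"))
    print(check_comment("<p><blockquote><p>quoted</p></blockquote></p>"))
    print(check_comment('<p><a href="javascript:alert(1)">click</a></p>'))
```

Collecting every problem in a list, rather than stopping at the first one, lets a comment form report all of them at once. The sketch only checks which elements may be direct children of which; it does not enforce the finer points of the XHTML Strict DTD, such as the rule that a blockquote may not contain bare text.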