XSS. Sanitising HTML is an extremely hard problem. The sanitize helper that ships with Rails is completely broken; Jacques Distler provides a better alternative.
Nonsense.
All you have to do is parse the input, and then compare the result to a white-list of elements, attributes (and attribute values). Trouble arises if your parser differs substantially from the error-correcting parsers used by browsers. It may, then, interpret mangled HTML input differently from the way browsers do.
Since you're a Pythonista, just use HTML5lib.
Trouble also arises if you use stupid techniques, like RegExps, to process the input, instead of using a real parser.
But we wouldn't do that, now would we ... ?
Jacques Distler - 12th March 2007 03:12 - #
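A minimal sketch of the parse-then-whitelist approach described above, in Python. It deliberately swaps in the standard library's html.parser instead of html5lib, so it does not reproduce browser error recovery, which is exactly the weakness raised in the thread. The whitelist contents, ALLOWED_SCHEMES, and sanitize() are hypothetical names chosen for illustration.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical whitelist: element name -> allowed attribute names.
ALLOWED = {
    "p": set(),
    "b": set(),
    "i": set(),
    "em": set(),
    "strong": set(),
    "code": set(),
    "a": {"href", "title"},
}
# Attribute values are whitelisted too: href must use one of these schemes.
ALLOWED_SCHEMES = {"http", "https", "mailto"}
# Elements discarded together with their content.
DROP_WITH_CONTENT = {"script", "style"}


def _safe_url(value):
    # Relative URLs have an empty scheme and are rejected on purpose.
    try:
        return urlparse(value.strip()).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:  # e.g. malformed IPv6 host
        return False


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open = []  # stack of whitelisted elements already emitted
        self.skip = 0   # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip += 1
            return
        if self.skip or tag not in ALLOWED:
            return  # drop the tag itself; its text is still escaped and kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED[tag]:
                continue  # e.g. onclick, style
            if name == "href" and not _safe_url(value):
                continue  # e.g. javascript: or data: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip = max(0, self.skip - 1)
        elif not self.skip and tag in self.open:
            # Close everything opened since the matching start tag,
            # so mis-nested input still yields balanced output.
            while True:
                t = self.open.pop()
                self.out.append(f"</{t}>")
                if t == tag:
                    break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))


def sanitize(fragment):
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    while parser.open:  # close anything the input left open
        parser.out.append(f"</{parser.open.pop()}>")
    return "".join(parser.out)
```

Unknown elements are unwrapped rather than deleted, so their text survives as escaped text; dropping them outright would be an equally defensible choice.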
But the error-correcting parser used by the most widely used browser is closed source and undocumented. The situation is better now than it used to be only because Hixie has spent the last two years reverse engineering it. You're still basically at the mercy of IE: if someone discovers a new weird IE behaviour that your sanitisation code doesn't take into account, you've got a new XSS hole. Second-guessing IE (and Opera and so on) is a hard problem!
I need to update my sanitisation filter on this site to use html5lib. At the moment it uses an XML parser (hence the requirement for XHTML) and applies a whitelist.
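The XML-parser-plus-whitelist filter described above could look roughly like the following sketch, which assumes Python's xml.etree.ElementTree (the filter actually running on this site isn't shown). Input that isn't well-formed raises ET.ParseError and can simply be rejected; note that named HTML entities such as &nbsp; are not defined in plain XML, so they fail to parse too. ALLOWED, ALLOWED_URL_PREFIXES and sanitize_xhtml() are hypothetical names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical whitelist: element name -> allowed attribute names.
ALLOWED = {
    "p": set(),
    "em": set(),
    "strong": set(),
    "code": set(),
    "a": {"href", "title"},
}
# Attribute-value whitelist for href: only these absolute URL prefixes.
ALLOWED_URL_PREFIXES = ("http://", "https://", "mailto:")
# Elements removed together with everything inside them.
DROP_WITH_CONTENT = {"script", "style"}


def _emit_children(el, out):
    if el.text:
        out.append(escape(el.text))
    for child in el:
        _emit(child, out)


def _emit(el, out):
    tag = el.tag
    if tag in DROP_WITH_CONTENT:
        pass  # discard the element and its whole subtree
    elif tag in ALLOWED:
        attrs = []
        for name, value in el.attrib.items():
            if name not in ALLOWED[tag]:
                continue  # attribute not whitelisted (onclick, style, ...)
            if name == "href" and not value.strip().lower().startswith(ALLOWED_URL_PREFIXES):
                continue  # javascript:, data:, relative URLs, ...
            attrs.append(f" {name}={quoteattr(value)}")
        out.append(f"<{tag}{''.join(attrs)}>")
        _emit_children(el, out)
        out.append(f"</{tag}>")
    else:
        _emit_children(el, out)  # unknown element: keep its text, drop the tag
    # The tail is text that follows el inside its parent, so it is always kept.
    if el.tail:
        out.append(escape(el.tail))


def sanitize_xhtml(fragment):
    """Return whitelisted markup; raises ET.ParseError if input is not well-formed."""
    # Wrap in a dummy root so fragments with several top-level nodes parse.
    root = ET.fromstring(f"<root>{fragment}</root>")
    out = []
    _emit_children(root, out)
    return "".join(out)
```

Because malformed input never reaches the whitelist step, there is no second-guessing of browser error recovery here; the cost is that users must submit well-formed markup.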
Agreed on sanitizing valid XHTML: it's much easier than trying to validate tag soup. I think the only solution is to require completely valid XHTML input (which is easier to do than you might think) before sanitizing.
Remember that you can run Tidy on the input and force well-formed XML output... Will it change the markup? Possibly, but I'd rather have markup modified by me than by my users...
Love the blog, Simon!
Gustave Stresen-Reuter - 12th March 2007 12:39 - #
If you're willing to be more draconian in your handling of user-input, you can worry less about whether the behaviour of your parser agrees with that of IE. In my branch of Instiki, the sanitization filter operates on the well-formed XHTML(+MathML+SVG) output generated by Maruku. Since it isn't being exposed to arbitrary crap as input, I have a much easier time ensuring that the output of my sanitization filter is
Jacques Distler - 12th March 2007 13:33 - #
Your CSS does something ... interesting ... with ordered lists in comments.
Jacques Distler - 12th March 2007 13:51 - #
I agree that it's a hard problem, and everyone has slightly different criteria and expectations.
You might be interested in my dirt-simple solution:
http://shiflett.org/blog/2007/mar/allowing-html-and-preventing-xss