Living on a knife edge
6th May 2003
In The XHTML 100, Evan Goer describes an experiment in which he checked 119 sites that declare an XHTML doctype for full compliance with the W3C standards. His test consisted of three parts: a validation check on the front page, a check on another “inside” page, and a check to see if the correct Content-Type header (application/xhtml+xml) was served to supporting User Agents (in his case Mozilla 1.3).
I don’t know if my site was included in the experiment, but if it was, it failed the third test as well. I have now implemented Mark Pilgrim’s trivial PHP fix (which serves the correct Content-Type to user agents that include application/xhtml+xml in their HTTP Accept header). This is no small step to take: serving XHTML with the correct Content-Type causes Gecko-based browsers to parse it using a real XML parser, and should it turn out not to be well formed they will refuse to render the site and die with an error message. Since I use Phoenix myself and almost certainly visit this site more than anyone else, I’m hoping I’ll spot and fix any errors before anyone else runs into them. Talk about living on a knife edge!
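The negotiation described above can be sketched roughly as follows. This is a Python illustration of the idea, not Mark Pilgrim’s PHP code: the function name choose_content_type is hypothetical, and the handling of q=0 is an extra assumption beyond a plain “does the header mention it” check.

```python
def choose_content_type(accept_header):
    """Pick a Content-Type based on the request's Accept header value.

    Hypothetical helper: serves application/xhtml+xml only to user agents
    that explicitly list it (with a non-zero q-value), text/html otherwise.
    """
    for media_range in (accept_header or "").split(","):
        parts = [p.strip() for p in media_range.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparseable q-value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"
```

A user agent that never mentions application/xhtml+xml, or sends no Accept header at all, falls through to text/html, so tag-soup browsers keep working.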
I’ve been cautious about recommending XHTML for several months now, and this turn of events has made me even more wary of it as a technology that is ready for mainstream use. Creating valid XHTML documents is extremely difficult: virtually impossible by hand without regular checks with the validator, and hard to achieve using home-grown tools as well. I plan to revise my Validator Web Service code shortly to help run automated validation checks whenever I update, but it’s going to take quite a lot of effort to keep things working as they should.
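A much weaker check than full validation, but one that catches the errors that make a strict XML parser give up, is a well-formedness test run before publishing. The sketch below is an assumption about how such a pre-publish check might look using Python’s standard library; it is not the Validator Web Service code, and check_well_formed is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Return None if markup parses as XML, else a description of the first error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# An unclosed <br> is fine in tag soup but fatal for an XML parser.
problem = check_well_formed("<html><body><p>unclosed<br></p></body></html>")
if problem is not None:
    print("Not well formed:", problem)
```

Note that this only tests well-formedness, not validity against the XHTML DTD, and that this parser does not load the DTD, so named entities such as &nbsp; would be reported as errors here.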
So why bother when HTML 4.01 Strict gives all of the benefits of structural, valid markup with none of the additional hassles imposed by XHTML? Six months ago I would have said that XHTML is vital to support new lightweight devices that can only handle an XML parser, but with mobile phones carrying full tag-soup capable web browsers that’s looking more and more unlikely. The greatest benefit provided by valid XHTML is the increased ability to automate the extraction and processing of content at a later date (see Mark Pilgrim’s acclaimed acronym and citation support for a concrete demonstration of this idea). I’ve been storing my blog entries as XHTML since I started blogging, and I maintain a firm belief that XHTML is an excellent format for storing items of content. Sadly, it just doesn’t seem practical or worthwhile to serve it to browsers.
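As a rough illustration of why well-formed stored content is easy to process later, the sketch below pulls acronym definitions and cited titles out of an entry with an ordinary XML parser. It assumes each stored entry is a fragment wrapped in a single root element that declares the XHTML namespace; the function name and the sample entry are hypothetical, and this is not Mark Pilgrim’s implementation.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def extract_acronyms_and_citations(entry_xhtml):
    """Return (acronym -> expansion mapping, list of cited titles) for one entry."""
    root = ET.fromstring(entry_xhtml)
    acronyms = {}
    for el in root.iter(XHTML_NS + "acronym"):
        title = el.get("title")
        if title:  # skip acronyms that carry no expansion
            acronyms["".join(el.itertext())] = title
    citations = ["".join(el.itertext()) for el in root.iter(XHTML_NS + "cite")]
    return acronyms, citations


# Hypothetical stored entry, wrapped in a namespaced <div>.
entry = (
    '<div xmlns="http://www.w3.org/1999/xhtml"><p>The '
    '<acronym title="World Wide Web Consortium">W3C</acronym> standards, '
    'as tested in <cite>The XHTML 100</cite>.</p></div>'
)
acronyms, citations = extract_acronyms_and_citations(entry)
```

The same extraction from tag soup would need a forgiving HTML parser and more defensive code, which is the practical argument for keeping the stored copy as XHTML even if browsers are served HTML.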
I’m going to keep serving this blog as XHTML as an open experiment in the practicalities and challenges involved in doing so, but from now on my other web projects will target HTML 4.01 Strict.