Living on a knife edge
6th May 2003
In The XHTML 100, Evan Goer describes an experiment in which he checked 119 site claiming to be with an XHTML doctype for full compliance with the W3C standards. His test consisted of three parts—a validation check on the front page, a check on another “inside” page, and a check to see if the correct Content-Type header (
application/xhtml+xml) was served to supporting User Agents (in his case Mozilla 1.3).
I don’t know if my site was included in the experiment, but if it was it failed at the third test as well. I have now implemented Mark Pilgrim’s trivial PHP fix (which serves the correct Content-Type to user agents that include
application/xhtml+xml in their
HTTP-ACCEPT header). This is no small step to take—serving XHTML with the correct Content-Type causes Gecko based browsers to attempt to parse it using a real XML parser, and should it turn out to be well formed they will refuse to render the site and die with an error message. Since I use Phoenix myself and almost certainly visit this site more than anyone else I’m hoping I’ll spot and fix any errors before anyone else runs in to them. Talk about living on a knife edge!
I’ve been cautious about recommending XHTML for several months now, and this turn of events has made me even more wary of it as a technology that is ready for mainstream use. Creating valid XHTML documents is extremely difficult—virtually impossible by hand without regular checks with the validator, and hard to achieve using home grown tools as well. I plan to revise my Validator Web Service code shortly to help run automated validation checks whenver I update, but it’s going to take quite a lot of effort to keep things working as they should.
So why bother when HTML 4.01 Strict gives all of the benefits of structural, valid markup with none of the additional hassles provided by XHTML? 6 months ago I would have said that XHTML is vital to support new light weight devices that can only handle an XML parser, but with mobile phones carrying full tag-soup capable web browsers that’s looking more and more unlikely. The greatest benefit provided by valid XHTML is the increased ability to automate the extraction and processing of content at a later date (see Mark Pilgrim’s acclaimed acronym and citation support for a concrete demonstration of this idea). I’ve been storing my blog entries as XHTML since I started blogging, and I maintain a firm belief that XHTML is an excellent format for storing items of content. Sadly, it just doesn’t seem practical or worthwhile to serve it to browsers.
I’m going to keep serving this blog as XHTML as an open experiment in the practicalities and challenges involved in doing so, but from now on my other web projects will target HTML 4.01 Strict.
More recent articles
- llamafile is the new best way to run a LLM on your own computer - 29th November 2023
- Prompt injection explained, November 2023 edition - 27th November 2023
- I'm on the Newsroom Robots podcast, with thoughts on the OpenAI board - 25th November 2023
- Weeknotes: DevDay, GitHub Universe, OpenAI chaos - 22nd November 2023
- Deciphering clues in a news article to understand how it was reported - 22nd November 2023
- Exploring GPTs: ChatGPT in a trench coat? - 15th November 2023
- Financial sustainability for open source projects at GitHub Universe - 10th November 2023
- ospeak: a CLI tool for speaking text in the terminal via OpenAI - 7th November 2023
- DALL-E 3, GPT4All, PMTiles, sqlite-migrate, datasette-edit-schema - 30th October 2023
- Now add a walrus: Prompt engineering in DALL-E 3 - 26th October 2023