Simon Willison’s Weblog

W3C validator web service

Earlier today I mentioned how useful a web service interface to the new W3C validator would be. Tom Gilder pointed out in the comments that the validator now has an XML interface:

http://validator.w3.org:8001/check?uri=http://simon.incutio.com/&output=xml

I had a play around and the XML interface works pretty well, although it still has a few quirks (hardly surprising for a beta product): the information on whether or not the page is valid is passed back in an HTTP header (X-W3C-Validator-Status), and if the page is unreachable or forbidden the interface returns an XHTML document. Still, it’s enough to play with, and as a demonstration of the flexibility of this new tool I’ve put together an XML-RPC proxy for the service:

Server : scripts.incutio.com
Port   : 80
Path   : /xmlrpc/validator/validate.php
Method : w3c.validate(url)

The web service accepts a URL and returns an XML-RPC struct containing the results of validation. The most important field of the struct is status, which will be set to Valid, Invalid or Failed depending on whether or not the page passed the test (Failed means the validator threw back an XHTML page rather than XML, and can generally be assumed to mean the page has failed validation for some reason). The other fields of the struct contain information returned by the validator, including an array of warnings and an array of error messages if any were returned. The layout of the struct is best understood by comparing it to the XML returned by the standard XML interface.
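As a rough illustration of how a client might call the proxy, here is a minimal Python sketch using the standard library's XML-RPC client. The endpoint URL is assembled from the server, port and path listed above. Only the status field is named in this post; the "errors" and "warnings" keys are assumed names for the two arrays, and validate and summarize are hypothetical helper names.

```python
import xmlrpc.client

# Endpoint assembled from the server, port and path listed in the post.
ENDPOINT = "http://scripts.incutio.com:80/xmlrpc/validator/validate.php"


def validate(url, endpoint=ENDPOINT):
    """Call w3c.validate(url) over XML-RPC and return the result struct as a dict."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    return proxy.w3c.validate(url)


def summarize(result):
    """Turn a result struct into a one-line summary.

    Only "status" is named in the post; "errors" and "warnings" are
    assumed key names for the two arrays it mentions.
    """
    status = result.get("status", "Failed")
    if status == "Valid":
        return "Valid"
    if status == "Invalid":
        errors = result.get("errors", [])
        warnings = result.get("warnings", [])
        return f"Invalid: {len(errors)} error(s), {len(warnings)} warning(s)"
    # Anything else is treated as "Failed": the validator sent XHTML, not XML.
    return "Failed: validator did not return XML"
```

With the service reachable, summarize(validate("http://simon.incutio.com/")) would give a one-line verdict for this site. Keeping the network call separate from the formatting means the summary logic can be tried against hand-written dicts.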

The source code for the web service is available in the following files:

I’ve also coded in a limit of 100 queries per IP per hour, in the unlikely event that the service gets a large amount of traffic. I should stress that this is a beta web service built on top of a beta validator: it may stop working at any time, so it should not be considered suitable for production use. If you want to use it heavily, feel free to download the source and set it up on your own server, but remember that the W3C beta validator may well change its XML output, rendering the web service useless.
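This post doesn't describe how the limit is implemented, so the following is only one possible approach: a fixed-window counter kept in memory, sketched in Python. The class name HourlyLimiter and its parameters are hypothetical; a real deployment would more likely store the counts in a file or database.

```python
import time


class HourlyLimiter:
    """Fixed-window counter: at most `limit` queries per IP per window."""

    def __init__(self, limit=100, window=3600):
        self.limit = limit      # queries allowed per window (100 in the post)
        self.window = window    # window length in seconds (one hour)
        self._counts = {}       # ip -> (window_start, count)

    def allow(self, ip, now=None):
        """Record a query from `ip`; return True if it is within the limit."""
        now = time.time() if now is None else now
        start, count = self._counts.get(ip, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.limit:
            self._counts[ip] = (start, count)
            return False
        self._counts[ip] = (start, count + 1)
        return True
```

A fixed window is the simplest scheme that matches "100 queries / IP / hour"; it lets a client burst across a window boundary, which a sliding window would prevent at the cost of more bookkeeping.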

This is W3C validator web service by Simon Willison, posted on 28th October 2002.

11 comments

  1. Incidentally, I know my blog isn't validating at the moment - I'm leaving it invalid for a while as it's quite handy for demonstrating the web service :)

    Simon Willison - 28th October 2002 15:49 - #

  2. suuuuure.

    rick - 28th October 2002 19:29 - #

  3. Heh, nice :) btw, looking at the source code (where I found the XML output originally) - http://dev.w3.org/cvsweb/validator/httpd/cgi-bin/check?rev=1.274 - you can also set output to "n3" and "earl".

    Tom Gilder - 28th October 2002 21:11 - #

  4. Nice code Simon, I installed a copy on my server but had to specifically put this in a .htaccess file: php_flag allow_call_time_pass_reference On. This is because of line #75 in classes.inc.php. Other than that, anyone know what kind of info is hidden in the 'offset' field?

    michel v - 28th October 2002 21:52 - #

  5. Ah, now that I coded a little command line client for this service, I think I found a bug, either with IXR or with your webservice. Here's what the validator's XML says: <msg> end tag for &#34;link&#34; omitted, but OMITTAG NO was specified</msg> Here's what the webservice returns for this message: <member><name>msg</name><value><string>omitted, but OMITTAG NO was specified</string></value></member> It only returns the part that's after the last double quote :( I'm going to see if it's a problem in validate.php now...

    michel v - 28th October 2002 22:09 - #

  6. OK, I got the error location down to either the SnoopyPlus extension, or Snoopy itself.

    michel v - 28th October 2002 22:18 - #

  7. Damn, I'm stupid, and I shall stop comment-spamming: the error is with the parsing, but not in anything Snoopy. Looks like the data handler never gets a complete string. The string is cut at every double quote. Getting complicated to fix there -_-

    michel v - 28th October 2002 22:47 - #

  8. Ah, the fix is so trivial I didn't think of it. Turns out parsing XML that has &#34; in it triggers some kind of bug in PHP's XML parser. So, here's the simple fix, just add this line after line #79 in classes.inc.php: $xml = str_replace('&#34;', '"', $xml);

    michel v - 28th October 2002 22:59 - #

  9. Yep, the XML output for the validator is useful ( http://valet.webthing.com/ 's original xml output has had some interesting developments on what can be done generating different views. ) I've not checked the snoopy class (I'm not a php kinda guy) but you seem to be calling "fetch", you should probably only do a HEAD request, no point getting the XML document itself, when you could just get the header. In testing the javascript bookmarklet at http://validator.w3.org:8001/favlets.html used the HEAD request to do the same for Mozilla/IE. The lack of header and xhtml doc under Fail conditions has been acknowledged as a bug btw.

    Jim Ley - 29th October 2002 09:42 - #

  10. I grab the full document because if the page is valid it includes information on the doctype / encoding, while if the page is invalid it returns the warnings and errors encoded as XML. You're right though, a lightweight version of the script that just checks for validity would be much better off making just a HEAD request. That said, I would like to see the validity information repeated in the XML document - I have an idea to rewrite the web service to use XSLT to transform the W3C XML document into an XML-RPC (or SOAP) response without any further analysis of the XML required. Doing so would make it much easier to maintain the script when they change their XML output, as well as being a great excuse for me to finally learn some XSLT...

    Simon Willison - 29th October 2002 11:39 - #

  11. Is it just me trying to access the site/service/whatever incorrectly, or is this beta validator web service no longer working? Is there a new address or something?

    Btw here's some simple XSLT that might do the simple job you require:
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml"/>
    <xsl:template match="/">
    <status><xsl:value-of select="//status"/></status>
    </xsl:template>
    </xsl:stylesheet>

    Vasil Rangelov - 23rd July 2006 12:46 - #

Comments are closed.

Previously hosted at http://simon.incutio.com/archive/2002/10/28/w3cValidatorWebService
