W3C validator web service
Earlier today I mentioned how useful a web service interface to the new W3C validator would be. Tom Gilder pointed out in the comments that the validator now has an XML interface:
I had a play around and the XML interface works pretty well, although it still has a few quirks (hardly surprising for a beta product): the information on whether or not the page is valid is passed back in an HTTP header (X-W3C-Validator-Status), and if the page is unreachable or forbidden the interface returns an XHTML document instead of XML. Still, it’s enough to play with, and as a demonstration of the flexibility of this new tool I’ve put together an XML-RPC proxy for the service:
- Server: scripts.incutio.com
- Port: 80
- Path: /xmlrpc/validator/validate.php
- Method: w3c.validate(url)
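As a rough illustration, here is how a client might call this endpoint using Python's standard `xmlrpc.client` module. The server, port, path and method name come from the details above; the function names `build_request` and `validate` are made up for this sketch, and because this is a beta service the network call may well fail, so it is kept inside a function rather than run on import.

```python
import xmlrpc.client

# Endpoint assembled from the server, port and path listed above.
ENDPOINT = "http://scripts.incutio.com:80/xmlrpc/validator/validate.php"


def build_request(url):
    """Return the XML-RPC request body for w3c.validate(url).

    Hypothetical helper: handy for seeing what goes over the wire
    without making a network call.
    """
    return xmlrpc.client.dumps((url,), methodname="w3c.validate")


def validate(url, endpoint=ENDPOINT):
    """Call the remote service and return the result struct as a dict.

    Performs a real network request; the service may be unavailable.
    """
    proxy = xmlrpc.client.ServerProxy(endpoint)
    # Dotted method names resolve through attribute access on the proxy.
    return proxy.w3c.validate(url)
```

Calling `validate("http://example.com/")` would return the struct as a Python dictionary, assuming the service is still reachable.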
The web service accepts a URL and returns an XML-RPC struct containing the results of validation. The most important field of the struct is status, which will be set to Valid, Invalid or Failed depending on whether or not the page passed the test (Failed means the validator threw back an XHTML page rather than XML, and can generally be assumed to mean the page has failed validation for some reason). The other fields of the struct contain information returned by the validator, including an array of warnings and an array of error messages if any were returned. The layout of the struct is best understood by comparing it to the XML returned by the standard XML interface.
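To make the three status values concrete, here is a small sketch of how a client might summarise the returned struct. Only the `status` field is named in this post; the key names `errors` and `warnings` are assumptions for the two arrays, and `summarize` is a hypothetical helper, so check them against the real response before relying on this.

```python
def summarize(result):
    """Turn a result struct (a dict) into a one-line summary.

    `status` is the documented field; `errors` and `warnings`
    are assumed key names for the two arrays.
    """
    status = result.get("status", "Failed")
    errors = result.get("errors", [])
    warnings = result.get("warnings", [])
    if status == "Valid":
        return f"valid ({len(warnings)} warnings)"
    if status == "Invalid":
        return f"invalid ({len(errors)} errors, {len(warnings)} warnings)"
    # Failed: the validator sent back XHTML instead of XML,
    # so there is no structured report to count.
    return "failed (no XML report available)"
```

Treating a missing `status` as Failed is a design choice for this sketch: it keeps the caller from mistaking an incomplete response for a pass.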
The source code for the web service is available in the following files:
- validate.php—the web service, implemented using IXR.
- SnoopyPlus.class.php—an extension of the Snoopy web client class
- classes.inc.php—various support classes, including the main XML parsing class
- W3cValidator.class.php—a class implementing the main logic of the web service (can be reused on its own)
I’ve also coded in a limit of 100 queries per IP per hour, in the unlikely event that the service gets a large amount of traffic. I should stress that this is a beta web service built on top of a beta validator: it may stop working at any time, so it should not be considered suitable for production use. If you want to use it heavily, feel free to download the source and set it up on your own server, but remember that the W3C beta validator may well change its XML output, rendering the web service useless.
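For anyone setting up their own copy, a per-IP hourly limit like the one mentioned above could be sketched as follows. This is not the code used by the service (which is written in PHP); it is a minimal in-memory sliding-window version in Python, and the class name `RateLimiter` and its method are hypothetical.

```python
import time
from collections import defaultdict, deque

LIMIT = 100        # queries allowed per window
WINDOW = 60 * 60   # window length in seconds (one hour)


class RateLimiter:
    def __init__(self, limit=LIMIT, window=WINDOW):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent queries

    def allow(self, ip, now=None):
        """Record a query from `ip`; return True if it is within the limit."""
        now = time.time() if now is None else now
        recent = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

State here lives in process memory, so it resets on restart; a real deployment would more likely keep the counters in a file or database.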