Simon Willison’s Weblog


Saturday, 27th July 2002

Stanford guidelines

Stanford Guidelines for Web Credibility:

DMOZ for Bath

My application to edit the DMOZ University of Bath category has been accepted. Bath’s main site has notoriously bad navigation, so hopefully I’ll be able to use DMOZ to build an alternative. I’m also looking into eventually syndicating the DMOZ category via RDF and replicating it elsewhere. Unfortunately it looks like you have to grab the whole 130MB RDF file to do this, but I’ve seen tools that syndicate smaller portions of DMOZ, so it must be possible to extract only the information you are interested in.
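One way to pull a single category out of a dump that size is to stream it rather than load it whole. The following is only a rough sketch under assumptions: the record layout (an `ExternalPage` element with an `about` attribute and `Title` and `topic` children), the namespace URIs and the category path `Top/Example/University_of_Bath` are all illustrative guesses, and `pages_in_category` is a hypothetical helper name. It matches on local element names so that it does not depend on the exact namespaces.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for the real dump. The layout, namespaces and category
# paths here are assumptions made for illustration only.
SAMPLE_RDF = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf">
  <ExternalPage about="http://www.bath.ac.uk/">
    <d:Title>University of Bath</d:Title>
    <topic>Top/Example/University_of_Bath</topic>
  </ExternalPage>
  <ExternalPage about="http://www.example.org/">
    <d:Title>Something Else</d:Title>
    <topic>Top/Example/Other</topic>
  </ExternalPage>
</RDF>
"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag or attribute."""
    return tag.rsplit("}", 1)[-1]


def pages_in_category(source, prefix):
    """Yield (url, title) for each record whose topic starts with prefix.

    The file is parsed incrementally, so the whole dump never has to be
    held in memory at once.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "ExternalPage":
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
        if fields.get("topic", "").startswith(prefix):
            url = next(
                (v for k, v in elem.attrib.items() if local_name(k) == "about"),
                "",
            )
            yield url, fields.get("Title", "")
        root.clear()  # drop records already handled to keep memory flat
```

Calling `list(pages_in_category(io.BytesIO(SAMPLE_RDF), "Top/Example/University_of_Bath"))` on the sample returns only the Bath record. With a real dump you would pass an open file instead of the `BytesIO` object, and the element names would need checking against the actual file.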

Syndicating the ODP

Having looked at some of the tools for syndicating content from the ODP, I’ve found that the standard method is to fetch and parse the actual HTML pages from the site rather than download the huge RDF files. This would be a lot easier if the site’s pages were valid XHTML, but unfortunately they don’t even have a DOCTYPE. Luckily, the page-link parser I wrote the other day for something else seems to do a pretty good job on the ODP, so I should be able to put together a decent script without too much trouble.
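For illustration, here is roughly what a forgiving link parser can look like. This is not the parser mentioned above, just a minimal sketch using Python's standard `html.parser` module, which copes with markup that is not valid XHTML (no DOCTYPE, uppercase tags, unclosed list items). The names `LinkParser`, `extract_links` and `SAMPLE_PAGE` are hypothetical, and the sample markup is invented rather than copied from the ODP.

```python
from html.parser import HTMLParser

# Invented example of sloppy markup: no DOCTYPE, mixed-case tags,
# unclosed <li> elements and a named anchor that is not a link.
SAMPLE_PAGE = """<html><body>
<ul>
<li><A HREF="http://www.bath.ac.uk/">University of <b>Bath</b></A> - Official site.
<li><a href="http://www.example.org/">Example   Society</a>
<li><a name="top">not a link</a>
</ul>"""


class LinkParser(HTMLParser):
    """Collect (href, link text) pairs from possibly invalid HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the link currently open, if any
        self._text = []    # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html):
    """Return every (href, text) pair found in the given HTML string."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Running `extract_links(SAMPLE_PAGE)` picks up the two real links and skips the named anchor. A syndication script would still need to work out which links on a category page are listings and which are site navigation, which this sketch does not attempt.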