Simon Willison’s Weblog

Generated content observation

Mark Pilgrim is unhappy with XHTML 2.0. Since the rest of the blogging community has already provided mass commentary on his post, I’ll make an observation concerning his further reading feature instead. The first link I saw to Mark’s post (and the one I followed) was on techno weenie, but I was surprised to later notice that techno weenie was not listed in the further reading list. For those who haven’t been paying attention, Mark’s further reading list is automatically generated from referrals, with verification from a clever Python script that checks the source page to make sure there really is a link, extracts a relevant portion of the page and attempts to find a permalink for the entry as well.
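To make the verification step concrete, here is a minimal sketch of the "is there really a link?" check. This is not Mark's actual script, which I haven't reproduced here; the names `LinkCollector` and `page_links_to` are hypothetical, and the sketch assumes the referring page's HTML has already been fetched into a string.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found in the static markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_links_to(html, target_url):
    """Return True if the page source contains an <a href> equal to target_url.

    Hypothetical helper: a real checker would probably also normalise URLs
    (trailing slashes, fragments) before comparing.
    """
    collector = LinkCollector()
    collector.feed(html)
    return target_url in collector.hrefs
```

A referral would only make it into the further reading list if something like `page_links_to(referrer_html, entry_url)` came back true. Extracting an excerpt and hunting for a permalink are separate steps that this sketch leaves out.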

Mark’s system usually has a very high success rate, so why had it not picked up on the entry on techno weenie? On closer inspection of the entry in question, I noticed that the only link back to Mark’s entry was the “source” link at the bottom of the quote. Suddenly it clicked: could Rick be using a DOM script to extract the cite attribute of the blockquote and display it on the page? I checked the code and, sure enough, Rick was using the script I published back in December. Mark’s script looks for a “real” link in the page, and since the link was generated on the client side from the DOM, it failed to find one and assumed that the referral was invalid.
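The failure is easy to reproduce. In the sketch below, the sample markup and the `example.org` URL are made up for illustration, and `find_reference` is a hypothetical helper rather than anything from Mark's script. It parses only the static HTML, as a server-side checker would, and reports whether the target URL shows up as a real link, only as a blockquote `cite` attribute, or not at all.

```python
from html.parser import HTMLParser

# Hypothetical entry: the URL appears only in the blockquote's cite attribute.
# A client-side DOM script would turn it into a visible "source" link, but
# that link never exists in the markup the server sends.
TARGET_URL = "http://example.org/entry"
SAMPLE_PAGE = """
<div class="entry">
  <blockquote cite="http://example.org/entry">
    <p>A quoted passage.</p>
  </blockquote>
</div>
"""


class ReferenceCollector(HTMLParser):
    """Record anchor hrefs and blockquote cite URLs separately."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.cites = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.hrefs.append(attributes["href"])
        elif tag == "blockquote" and attributes.get("cite"):
            self.cites.append(attributes["cite"])


def find_reference(html, target_url):
    """Return "link", "cite", or None depending on where target_url appears."""
    collector = ReferenceCollector()
    collector.feed(html)
    if target_url in collector.hrefs:
        return "link"
    if target_url in collector.cites:
        return "cite"
    return None


print(find_reference(SAMPLE_PAGE, TARGET_URL))  # prints "cite"
```

A checker that only looks at anchors sees no link in this page. One that also treated `cite` attributes as evidence of a reference would have caught the techno weenie entry, though whether that counts as a "real" link is a judgement call.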

There isn’t really a moral to this story; I just thought it was interesting.

Also spotted in Mark’s further reading list: Tom Gilder has finally set himself up a blog. He promises lots of rants. This is a good thing.

This is Generated content observation by Simon Willison, posted on 13th January 2003.
