2 items tagged “robotstxt”
The X-Robots-Tag HTTP header. News to me, but both Google and Yahoo! have supported it since last year. You can add per-page robots exclusion rules in HTTP headers instead of using meta tags, and Google’s version supports unavailable_after which is handy for content with a known limited shelf-life.
9th June 2008, 9:21 am
robots.txt Adventure. Interesting notes from crawling 4.6 million robots.txt, including 69 different ways in which the word “disallow” can be mis-spelled.
22nd September 2007, 12:36 am