
Simon Willison’s Weblog

robots.txt Adventure. Interesting notes from crawling 4.6 million robots.txt files, including 69 different ways in which the word “disallow” can be misspelled.


2 comments

  1. Funny, just yesterday I pieced together a little Django app that handles robots.txt requests and can be managed through the admin interface (currently only with the oldforms-admin/trunk). Thanks for the great link!

    http://code.google.com/p/django-robots/

    Jannis Leidel - 22nd September 2007 09:04 - #

  2. Just to let the people coming from search engines know: I have updated django-robots (0.2) to address the problems Andrew Wooster found in his robots.txt Adventure, e.g. the correct MIME type and HTTP status codes for the resulting robots.txt, Crawl-delay support, Allow and Disallow rules, and automatic support for the Sitemap contrib app.

    Jannis Leidel - 7th October 2007 13:56 - #
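
To make the directives mentioned in the second comment concrete, here is a minimal sketch that reads a small robots.txt using Python's standard-library urllib.robotparser. It is not taken from django-robots: the file contents, the example.com URLs, the ExampleBot user agent and the summarize helper are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the directives named in the comment above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Allow: /public/
Disallow: /private/
Sitemap: http://example.com/sitemap.xml
"""


def summarize(robots_txt, agent="ExampleBot"):
    """Parse robots.txt text and report what a crawler may do.

    `agent` is a made-up user agent name; it falls back to the `*` rules.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {
        "public": parser.can_fetch(agent, "http://example.com/public/page"),
        "private": parser.can_fetch(agent, "http://example.com/private/page"),
        "crawl_delay": parser.crawl_delay(agent),  # Python 3.6+
        "sitemaps": parser.site_maps(),            # Python 3.8+
    }


summary = summarize(ROBOTS_TXT)
print(summary)
```

Note that this parser only recognises correctly spelled directives, so a misspelled "disallow" line like the ones counted in the linked article would most likely be ignored rather than enforced.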

