Simon Willison’s Weblog


4 items tagged “robotstxt”

2020

Weeknotes: cookiecutter templates, better plugin documentation, sqlite-generate

I spent this week spreading myself between a bunch of smaller projects, and finally getting familiar with cookiecutter. I wrote about my datasette-plugin cookiecutter template earlier in the week; here’s what else I’ve been working on.

[... 703 words]

datasette-block-robots. Another little Datasette plugin: this one adds a /robots.txt page containing “Disallow: /” to block all indexing of a Datasette instance by respectable search engine crawlers. I built this in less than ten minutes, from idea to a release on PyPI, thanks to the datasette-plugin cookiecutter template. # 23rd June 2020, 3:28 am
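Datasette plugins like this one register extra URL routes through a plugin hook. Here is a minimal sketch of how such a plugin might look, using Datasette’s register_routes() hook and its Response.text() helper; the actual datasette-block-robots source may differ in its details.

```python
# Minimal sketch of a robots.txt-blocking Datasette plugin (assumed example;
# the real datasette-block-robots code may differ).
from datasette import hookimpl
from datasette.utils.asgi import Response


def robots_txt():
    # Serve a robots.txt asking every crawler to stay away entirely.
    return Response.text("User-agent: *\nDisallow: /")


@hookimpl
def register_routes():
    # Map requests for /robots.txt to the view function above.
    return [(r"^/robots\.txt$", robots_txt)]
```

Because register_routes() just returns (regex, view function) pairs, the whole plugin fits in one small module, which is why a cookiecutter template can take it from idea to PyPI so quickly.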

2008

The X-Robots-Tag HTTP header. News to me, but both Google and Yahoo! have supported it since last year. You can add per-page robots exclusion rules in HTTP headers instead of using meta tags, and Google’s version supports unavailable_after, which is handy for content with a known limited shelf-life. # 9th June 2008, 9:21 am
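As an illustration (my own, not from the linked post), here is how a page with a fixed expiry date might send that header from plain Python, using the standard library’s WSGI reference server:

```python
# Sketch: sending robots directives as an HTTP header instead of a meta tag.
# The page path and expiry date are made up for the example.
from wsgiref.simple_server import make_server


def app(environ, start_response):
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # "noindex" keeps the page out of indexes; Google's unavailable_after
        # extension asks for it to be dropped from results after this date.
        ("X-Robots-Tag", "noindex, unavailable_after: 25 Jun 2010 15:00:00 PST"),
    ]
    start_response("200 OK", headers)
    return [b"<h1>Limited-time content</h1>"]


if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```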

2007

robots.txt Adventure. Interesting notes from crawling 4.6 million robots.txt files, including 69 different ways in which the word “disallow” can be mis-spelled. # 22nd September 2007, 12:36 am
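That long tail of typos matters because strict parsers simply skip directives they don’t recognise. As a quick illustration (mine, not from the linked notes), Python’s urllib.robotparser treats a mis-spelled Disallow line as if it weren’t there:

```python
# A mis-spelled directive is silently ignored, so the "blocked" path
# ends up being treated as fetchable.
from urllib.robotparser import RobotFileParser

correct = ["User-agent: *", "Disallow: /private/"]
misspelled = ["User-agent: *", "Dissalow: /private/"]  # typo in "Disallow"

for name, lines in (("correct", correct), ("misspelled", misspelled)):
    parser = RobotFileParser()
    parser.parse(lines)
    print(name, parser.can_fetch("*", "http://example.com/private/page"))

# correct    -> False (crawler is blocked)
# misspelled -> True  (the typo means the rule never applies)
```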