<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: robots-txt</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/robots-txt.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2024-05-24T06:38:50+00:00</updated><author><name>Simon Willison</name></author><entry><title>Nilay Patel reports a hallucinated ChatGPT summary of his own article</title><link href="https://simonwillison.net/2024/May/24/nilay-patel-hallucinated-chatgpt/#atom-tag" rel="alternate"/><published>2024-05-24T06:38:50+00:00</published><updated>2024-05-24T06:38:50+00:00</updated><id>https://simonwillison.net/2024/May/24/nilay-patel-hallucinated-chatgpt/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.threads.net/@reckless1280/post/C7MeXn6LOt_"&gt;Nilay Patel reports a hallucinated ChatGPT summary of his own article&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here's a ChatGPT bug that's a new twist on the &lt;a href="https://simonwillison.net/2023/Mar/10/chatgpt-internet-access/"&gt;old issue&lt;/a&gt; where it would hallucinate the contents of a web page based on the URL.&lt;/p&gt;
&lt;p&gt;The Verge editor Nilay Patel asked for a summary of one of his own articles, pasting in the URL.&lt;/p&gt;
&lt;p&gt;ChatGPT 4o replied with an entirely invented summary full of hallucinated details.&lt;/p&gt;
&lt;p&gt;It turns out The Verge blocks ChatGPT's browse mode from accessing their site in their &lt;a href="https://www.theverge.com/robots.txt"&gt;robots.txt&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;User-agent: ChatGPT-User
Disallow: /
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Clearly ChatGPT should reply that it is unable to access the provided URL, rather than inventing a response that guesses at the contents!&lt;/p&gt;
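&lt;p&gt;You can see the effect of those two lines using Python's standard library &lt;code&gt;urllib.robotparser&lt;/code&gt; - a quick sketch that parses the same rules without fetching anything:&lt;/p&gt;

```python
from urllib import robotparser

# Parse the same rules The Verge serves, without making any HTTP request.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ChatGPT-User",
    "Disallow: /",
])

# ChatGPT's browsing agent is blocked from every path...
print(rp.can_fetch("ChatGPT-User", "https://www.theverge.com/anything"))  # False
# ...while agents with no matching rule (and no "User-agent: *" entry) are allowed.
print(rp.can_fetch("SomeOtherBot", "https://www.theverge.com/anything"))  # True
```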

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://www.computerworld.com/article/2117752/google-gemini-ai.html"&gt;Gemini is the new Google+&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/robots-txt"&gt;robots-txt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nilay-patel"&gt;nilay-patel&lt;/a&gt;&lt;/p&gt;



</summary><category term="robots-txt"/><category term="ai"/><category term="openai"/><category term="chatgpt"/><category term="llms"/><category term="nilay-patel"/></entry><entry><title>Quoting quora.com/robots.txt</title><link href="https://simonwillison.net/2024/Mar/19/quora-robots/#atom-tag" rel="alternate"/><published>2024-03-19T23:09:31+00:00</published><updated>2024-03-19T23:09:31+00:00</updated><id>https://simonwillison.net/2024/Mar/19/quora-robots/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.quora.com/robots.txt"&gt;&lt;p&gt;People share a lot of sensitive material on Quora - controversial political views, workplace gossip and compensation, and negative opinions held of companies. Over many years, as they change jobs or change their views, it is important that they can delete or anonymize their previously-written answers.&lt;/p&gt;
&lt;p&gt;We opt out of the wayback machine because inclusion would allow people to discover the identity of authors who had written sensitive answers publicly and later had made them anonymous, and because it would prevent authors from being able to remove their content from the internet if they change their mind about publishing it.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.quora.com/robots.txt"&gt;quora.com/robots.txt&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/internet-archive"&gt;internet-archive&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/robots-txt"&gt;robots-txt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;



</summary><category term="internet-archive"/><category term="robots-txt"/><category term="quora"/></entry><entry><title>Weeknotes: cookiecutter templates, better plugin documentation, sqlite-generate</title><link href="https://simonwillison.net/2020/Jun/26/weeknotes-plugins-sqlite-generate/#atom-tag" rel="alternate"/><published>2020-06-26T01:39:50+00:00</published><updated>2020-06-26T01:39:50+00:00</updated><id>https://simonwillison.net/2020/Jun/26/weeknotes-plugins-sqlite-generate/#atom-tag</id><summary type="html">
    &lt;p&gt;I spent this week spreading myself between a bunch of smaller projects, and finally getting familiar with &lt;a href="https://cookiecutter.readthedocs.io/"&gt;cookiecutter&lt;/a&gt;. I wrote about &lt;a href="https://simonwillison.net/2020/Jun/20/cookiecutter-plugins/"&gt;my datasette-plugin cookiecutter template&lt;/a&gt; earlier in the week; here's what else I've been working on.&lt;/p&gt;

&lt;h4 id="sqlite-generate"&gt;sqlite-generate&lt;/h4&gt;

&lt;p&gt;Datasette is supposed to work against any SQLite database you throw at it, no matter how weird the schema or how unwieldy the shape and size of the database.&lt;/p&gt;

&lt;p&gt;I built a new tool called &lt;a href="https://github.com/simonw/sqlite-generate"&gt;sqlite-generate&lt;/a&gt; this week to help me create databases of different shapes. It's a Python command-line tool which uses &lt;a href="https://faker.readthedocs.io/"&gt;Faker&lt;/a&gt; to populate a new database with random data. You run it something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sqlite-generate demo.db \
    --tables=20 \
    --rows=100,500 \
    --columns=5,20 \
    --fks=0,3 \
    --pks=0,2 \
    --fts&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This command creates a database containing 20 tables, each with between 100 and 500 rows and 5-20 columns. Each table will also have between 0 and 3 foreign key columns to other tables, and will feature between 0 and 2 primary key columns. SQLite full-text search will be configured against all of the text columns in the table.&lt;/p&gt;
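&lt;p&gt;The core trick is small. Here's a simplified sketch of the idea using only the standard library, with random strings standing in for the Faker-generated values the real tool uses:&lt;/p&gt;

```python
import random
import sqlite3
import string

# Simplified sketch of what sqlite-generate does under the hood: create a
# table and fill it with random rows. The real tool uses Faker for
# realistic values; random lowercase strings stand in for that here.
rng = random.Random(42)

def random_text(k=8):
    return "".join(rng.choices(string.ascii_lowercase, k=k))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO demo (name, score) VALUES (?, ?)",
    [(random_text(), rng.randint(0, 100)) for _ in range(250)],
)
print(conn.execute("SELECT count(*) FROM demo").fetchone()[0])  # 250
```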

&lt;p&gt;I always try to include a live demo with any of my projects, and &lt;code&gt;sqlite-generate&lt;/code&gt; is no exception. &lt;a href="https://github.com/simonw/sqlite-generate/blob/main/.github/workflows/demo.yml"&gt;This GitHub Action&lt;/a&gt; runs on every push to main and deploys a demo to &lt;a href="https://sqlite-generate-demo.datasette.io/"&gt;https://sqlite-generate-demo.datasette.io/&lt;/a&gt; showing the latest version of the code in action.&lt;/p&gt;

&lt;p&gt;The demo runs my &lt;a href="https://github.com/simonw/datasette-search-all"&gt;datasette-search-all&lt;/a&gt; plugin in order to more easily demonstrate full-text search across all of the text columns in the generated tables. Try searching for &lt;a href="https://sqlite-generate-demo.datasette.io/-/search?q=newspaper"&gt;newspaper&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id="click-app"&gt;click-app cookiecutter template&lt;/h4&gt;

&lt;p&gt;I write quite a lot of &lt;a href="https://click.palletsprojects.com/"&gt;Click&lt;/a&gt; powered command-line tools like this one, so inspired by &lt;a href="https://github.com/simonw/datasette-plugin"&gt;datasette-plugin&lt;/a&gt; I created a new &lt;a href="https://github.com/simonw/click-app"&gt;click-app&lt;/a&gt; cookiecutter template that bakes in my own preferences about how to set up a new Click project (complete with GitHub Actions). &lt;code&gt;sqlite-generate&lt;/code&gt; is the first tool I've built using that template.&lt;/p&gt;

&lt;h4 id="improved-plugin-docs"&gt;Improved Datasette plugin documentation&lt;/h4&gt;

&lt;p&gt;I've split Datasette's plugin documentation into five separate pages, and added a new page to the documentation about patterns for testing plugins.&lt;/p&gt;

&lt;p&gt;The five pages are:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="https://datasette.readthedocs.io/en/latest/plugins.html"&gt;Plugins&lt;/a&gt; describing how to install and configure plugins&lt;/li&gt;&lt;li&gt;&lt;a href="https://datasette.readthedocs.io/en/latest/writing_plugins.html"&gt;Writing plugins&lt;/a&gt; showing how to write one-off plugins, how to use the &lt;code&gt;datasette-plugin&lt;/code&gt; cookiecutter template and how to package templates for release to PyPI&lt;/li&gt;&lt;li&gt;&lt;a href="https://datasette.readthedocs.io/en/latest/plugin_hooks.html"&gt;Plugin hooks&lt;/a&gt; documenting all of the available plugin hooks&lt;/li&gt;&lt;li&gt;&lt;a href="https://datasette.readthedocs.io/en/latest/testing_plugins.html"&gt;Testing plugins&lt;/a&gt; describing my preferred patterns for writing tests for them (using &lt;a href="https://docs.pytest.org/"&gt;pytest&lt;/a&gt; and &lt;a href="https://www.python-httpx.org/"&gt;HTTPX&lt;/a&gt;)&lt;/li&gt;&lt;li&gt;&lt;a href="https://datasette.readthedocs.io/en/latest/internals.html"&gt;Internals for plugins&lt;/a&gt; describing the APIs Datasette makes available for use within plugin hook implementations&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;There's also a &lt;a href="https://datasette.readthedocs.io/en/latest/ecosystem.html#datasette-plugins"&gt;list of available plugins&lt;/a&gt; on the Datasette Ecosystem page of the documentation, though I plan to move those to a separate plugin directory in the future.&lt;/p&gt;

&lt;h4 id="datasette-block-robots"&gt;datasette-block-robots&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://github.com/simonw/datasette-plugin"&gt;datasette-plugin&lt;/a&gt; template practically eliminates the friction involved in starting a new plugin.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sqlite-generate&lt;/code&gt; generates random names for people. I don't particularly want people who search for their own names stumbling across the live demo and being weirded out by their name featured there, so I decided to block it from search engine crawlers using &lt;code&gt;robots.txt&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I wrote a tiny plugin to do this: &lt;a href="https://github.com/simonw/datasette-block-robots"&gt;datasette-block-robots&lt;/a&gt;, which uses the new &lt;a href="https://datasette.readthedocs.io/en/latest/plugin_hooks.html#register-routes"&gt;register_routes() plugin hook&lt;/a&gt; to add a &lt;code&gt;/robots.txt&lt;/code&gt; page.&lt;/p&gt;

&lt;p&gt;It's also a neat example of the &lt;a href="https://github.com/simonw/datasette-block-robots/blob/main/datasette_block_robots/__init__.py"&gt;simplest possible plugin&lt;/a&gt; to use that feature - along with the &lt;a href="https://github.com/simonw/datasette-block-robots/blob/main/tests/test_block_robots.py"&gt;simplest possible unit test&lt;/a&gt; for exercising such a page.&lt;/p&gt;
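&lt;p&gt;The shape of the hook is easy to sketch: &lt;code&gt;register_routes()&lt;/code&gt; returns a list of (regex, view) pairs. In this illustrative stand-alone version, &lt;code&gt;dispatch()&lt;/code&gt; stands in for Datasette's own router and the view returns a plain string where the real plugin returns a Response object:&lt;/p&gt;

```python
import re

# Hedged sketch of the register_routes() idea: the hook returns a list of
# (regex, view) pairs and matching request paths are routed to the view.
def robots_txt():
    return "User-agent: *\nDisallow: /"

ROUTES = [(r"^/robots\.txt$", robots_txt)]

def dispatch(path):
    # Stand-in for Datasette's router: first matching pattern wins.
    for pattern, view in ROUTES:
        if re.match(pattern, path):
            return view()
    return None  # fall through to normal routing

print(dispatch("/robots.txt"))
```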

&lt;h4 id="datasette-saved-queries"&gt;datasette-saved-queries&lt;/h4&gt;

&lt;p&gt;Another new plugin, this time with a bit more substance to it. &lt;a href="https://github.com/simonw/datasette-saved-queries"&gt;datasette-saved-queries&lt;/a&gt; exercises the new &lt;a href="https://datasette.readthedocs.io/en/latest/plugin_hooks.html#canned-queries-datasette-database-actor"&gt;canned_queries()&lt;/a&gt; hook I &lt;a href="https://simonwillison.net/2020/Jun/19/datasette-alphas/"&gt;described last week&lt;/a&gt;. It uses the new &lt;a href="https://datasette.readthedocs.io/en/latest/plugin_hooks.html#startup-datasette"&gt;startup()&lt;/a&gt; hook to create tables on startup (if they are missing), then lets users insert records into those tables to save their own queries. Queries saved in this way are then returned as canned queries for that particular database.&lt;/p&gt;
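&lt;p&gt;Here's a rough sketch of that pattern using plain &lt;code&gt;sqlite3&lt;/code&gt; - a startup step creates the table if it's missing, then the canned-queries step reads saved SQL back out (the table and column names here are illustrative, not necessarily the plugin's actual schema):&lt;/p&gt;

```python
import sqlite3

# Startup step: create the backing table if it does not exist yet.
def startup(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saved_queries (name TEXT PRIMARY KEY, sql TEXT)"
    )

# Canned-queries step: read every saved query back as a name -> SQL mapping.
def canned_queries(conn):
    return {
        name: {"sql": sql}
        for name, sql in conn.execute("SELECT name, sql FROM saved_queries")
    }

conn = sqlite3.connect(":memory:")
startup(conn)
conn.execute(
    "INSERT INTO saved_queries VALUES (?, ?)",
    ("count_rows", "select count(*) from sqlite_master"),
)
print(canned_queries(conn))  # {'count_rows': {'sql': 'select count(*) from sqlite_master'}}
```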

&lt;h4 id="main-not-master"&gt;main, not master&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;main&lt;/code&gt; is a better name for the main GitHub branch than &lt;code&gt;master&lt;/code&gt;, which has unpleasant connotations (it apparently derives from master/slave in BitKeeper). My &lt;code&gt;datasette-plugin&lt;/code&gt; and &lt;code&gt;click-app&lt;/code&gt; cookiecutter templates both include instructions for renaming &lt;code&gt;master&lt;/code&gt; to &lt;code&gt;main&lt;/code&gt; in their READMEs - it's as easy as running &lt;code&gt;git branch -m master main&lt;/code&gt; before running your first push to GitHub.&lt;/p&gt;

&lt;p&gt;I'm working towards &lt;a href="https://github.com/simonw/datasette/issues/849"&gt;making the switch&lt;/a&gt; for Datasette itself.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/git"&gt;git&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/robots-txt"&gt;robots-txt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cookiecutter"&gt;cookiecutter&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="git"/><category term="plugins"/><category term="projects"/><category term="robots-txt"/><category term="sqlite"/><category term="datasette"/><category term="weeknotes"/><category term="cookiecutter"/></entry><entry><title>datasette-block-robots</title><link href="https://simonwillison.net/2020/Jun/23/datasette-block-robots/#atom-tag" rel="alternate"/><published>2020-06-23T03:28:00+00:00</published><updated>2020-06-23T03:28:00+00:00</updated><id>https://simonwillison.net/2020/Jun/23/datasette-block-robots/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-block-robots"&gt;datasette-block-robots&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Another little Datasette plugin: this one adds a &lt;code&gt;/robots.txt&lt;/code&gt; page with &lt;code&gt;Disallow: /&lt;/code&gt; to block all indexing of a Datasette instance by respectable search engine crawlers. I built this in less than ten minutes from idea to deploy to PyPI thanks to the &lt;a href="https://github.com/simonw/datasette-plugin"&gt;datasette-plugin&lt;/a&gt; cookiecutter template.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/crawling"&gt;crawling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/robots-txt"&gt;robots-txt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;



</summary><category term="crawling"/><category term="plugins"/><category term="projects"/><category term="robots-txt"/><category term="seo"/><category term="datasette"/></entry><entry><title>RFC5785: Defining Well-Known Uniform Resource Identifiers</title><link href="https://simonwillison.net/2010/Apr/11/rfc/#atom-tag" rel="alternate"/><published>2010-04-11T19:32:28+00:00</published><updated>2010-04-11T19:32:28+00:00</updated><id>https://simonwillison.net/2010/Apr/11/rfc/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.rfc-editor.org/rfc/rfc5785.txt"&gt;RFC5785: Defining Well-Known Uniform Resource Identifiers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Sounds like a very good idea to me: defining a common prefix of /.well-known/ for well-known URLs (common metadata like robots.txt) and establishing a registry for all such files. OAuth, OpenID and other decentralised identity systems can all benefit from this.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://www.mnot.net/blog/2010/04/07/well-known"&gt;Mark Nottingham&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/oauth"&gt;oauth&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openid"&gt;openid&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rfc"&gt;rfc&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/robots-txt"&gt;robots-txt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/wellknownurls"&gt;wellknownurls&lt;/a&gt;&lt;/p&gt;



</summary><category term="oauth"/><category term="openid"/><category term="rfc"/><category term="robots-txt"/><category term="urls"/><category term="wellknownurls"/></entry><entry><title>The X-Robots-Tag HTTP header</title><link href="https://simonwillison.net/2008/Jun/9/official/#atom-tag" rel="alternate"/><published>2008-06-09T09:21:24+00:00</published><updated>2008-06-09T09:21:24+00:00</updated><id>https://simonwillison.net/2008/Jun/9/official/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html"&gt;The X-Robots-Tag HTTP header&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
News to me, but both Google and Yahoo! have supported it since last year. You can add per-page robots exclusion rules in HTTP headers instead of using meta tags, and Google’s version supports &lt;code&gt;unavailable_after&lt;/code&gt;, which is handy for content with a known limited shelf-life.
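&lt;p&gt;Setting it is just one extra response header - a sketch with an illustrative date (Google documents several accepted date formats for &lt;code&gt;unavailable_after&lt;/code&gt;):&lt;/p&gt;

```python
# Sketch: the same policy you'd put in a robots meta tag, expressed as an
# HTTP response header instead. The date below is illustrative.
deadline = "25 Jun 2010 15:00:00 GMT"
headers = [("X-Robots-Tag", f"unavailable_after: {deadline}")]
print(headers[0])
```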


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http"&gt;http&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/robots-txt"&gt;robots-txt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xrobotstag"&gt;xrobotstag&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/yahoo"&gt;yahoo&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="http"/><category term="robots-txt"/><category term="xrobotstag"/><category term="yahoo"/></entry><entry><title>robots.txt Adventure</title><link href="https://simonwillison.net/2007/Sep/22/nextthingorg/#atom-tag" rel="alternate"/><published>2007-09-22T00:36:17+00:00</published><updated>2007-09-22T00:36:17+00:00</updated><id>https://simonwillison.net/2007/Sep/22/nextthingorg/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.nextthing.org/archives/2007/03/12/robotstxt-adventure"&gt;robots.txt Adventure&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Interesting notes from crawling 4.6 million robots.txt files, including 69 different ways in which the word “disallow” can be misspelled.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/andrew-wooster"&gt;andrew-wooster&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/crawling"&gt;crawling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/robots-txt"&gt;robots-txt&lt;/a&gt;&lt;/p&gt;



</summary><category term="andrew-wooster"/><category term="crawling"/><category term="robots-txt"/></entry><entry><title>New anti-comment-spam measure</title><link href="https://simonwillison.net/2003/Oct/13/linkRedirects/#atom-tag" rel="alternate"/><published>2003-10-13T08:22:09+00:00</published><updated>2003-10-13T08:22:09+00:00</updated><id>https://simonwillison.net/2003/Oct/13/linkRedirects/#atom-tag</id><summary type="html">
    &lt;p&gt;I've added a new anti-comment-spam measure to this site. The majority of comment spam exists for one reason and one reason only: to increase the Google PageRank of the site linked from the spam, and specifically to increase its ranking for the term used in the link. This is why so many comment spams include links like this: &lt;a href="http://jeremy.zawodny.com/blog/archives/001002.html"&gt;Cheap Viagra&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Cut off the PageRank boost and you cut off the advantage of spamming, simple as that. I've altered my comments system to redirect ALL outgoing links through a simple redirect script, and added that script to &lt;a href="/robots.txt"&gt;my robots.txt file&lt;/a&gt;. Links still work fine (even the referral information persists across the redirect) but Google will ignore them completely when calculating PageRank.&lt;/p&gt;

&lt;p&gt;Will this reduce the floods of comment spam my site receives? Probably not; I've added a note about the restriction to my 'add comment' form, but I doubt many spammers bother to read much about the sites they are targeting. What's really needed is for this technique to become widespread by being integrated into existing blogging tools - are you listening, Movable Type hackers?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; Sencer has &lt;a href="http://www.sencer.de/index.php?p=81" title="Google Comment Spammers, Redirects and PR"&gt;pointed out&lt;/a&gt; in the comments that PageRank persists over redirects, and Google appears to ignore robots.txt when used to hide a redirecting page. I've updated my redirection script to use JavaScript to power the redirect (with a link for people with JavaScript disabled) and an extra meta tag to remind Google not to follow the link. This has the unfortunate side effect that referral information no longer persists across the redirect.&lt;/p&gt;
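&lt;p&gt;A rough reconstruction of what such a redirect page might look like - a JavaScript redirect, a plain link fallback, and a robots meta tag (this is illustrative markup, not the site's actual script):&lt;/p&gt;

```python
# Hedged sketch of the redirect page described above: JavaScript performs
# the redirect, a plain link serves browsers with JavaScript disabled, and
# the robots meta tag tells crawlers not to index or follow the page.
def redirect_page(url):
    return f"""<!DOCTYPE html>
<html><head>
<meta name="robots" content="noindex, nofollow">
<script>window.location.href = {url!r};</script>
</head><body>
<a href="{url}">Continue to {url}</a>
</body></html>"""

page = redirect_page("http://example.com/")
print("noindex" in page)  # True
```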
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/robots-txt"&gt;robots-txt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/spam"&gt;spam&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="robots-txt"/><category term="spam"/></entry><entry><title>How the RIAA was hacked</title><link href="https://simonwillison.net/2002/Sep/23/howTheRiaaWasHacked/#atom-tag" rel="alternate"/><published>2002-09-23T19:02:56+00:00</published><updated>2002-09-23T19:02:56+00:00</updated><id>https://simonwillison.net/2002/Sep/23/howTheRiaaWasHacked/#atom-tag</id><summary type="html">
    &lt;p&gt;The Register: &lt;a href="http://www.theregister.co.uk/content/6/27230.html"&gt;Want to know how RIAA.org was hacked?&lt;/a&gt; They had an un-password-protected admin panel listed in their &lt;code&gt;robots.txt&lt;/code&gt; file. Muppets.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/robots-txt"&gt;robots-txt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="robots-txt"/><category term="security"/></entry><entry><title>Stupid Danish newspapers</title><link href="https://simonwillison.net/2002/Jul/5/stupidDanishNewspapers/#atom-tag" rel="alternate"/><published>2002-07-05T17:24:24+00:00</published><updated>2002-07-05T17:24:24+00:00</updated><id>https://simonwillison.net/2002/Jul/5/stupidDanishNewspapers/#atom-tag</id><summary type="html">
    &lt;p&gt;More deep linking stupidity (via &lt;a href="http://scriptingnews.userland.com/backissues/2002/07/05#When:9:03:22AM"&gt;Scripting News&lt;/a&gt;). A judge in Denmark has &lt;a href="http://www.newsbooster.com/?pg=lost&amp;amp;lan=eng"&gt;ruled in favour&lt;/a&gt; of a newspaper that took a search engine to court over "deep linking", despite the search engine's spider following the &lt;code&gt;robots.txt&lt;/code&gt; standard (it seems the newspaper didn't bother to implement a &lt;code&gt;robots.txt&lt;/code&gt; file). Dave Winer summed things up perfectly:&lt;/p&gt;
&lt;blockquote cite="http://www.newsbooster.com/?pg=lost&amp;amp;lan=eng"&gt;&lt;p&gt;BTW, deep linking is an oxymoron. There's only one kind of linking on the Web. Why would you ever point to the home page of a news oriented site.&lt;/p&gt;&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/dave-winer"&gt;dave-winer&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/denmark"&gt;denmark&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/linking"&gt;linking&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/robots-txt"&gt;robots-txt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/stupid"&gt;stupid&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="dave-winer"/><category term="denmark"/><category term="linking"/><category term="robots-txt"/><category term="stupid"/></entry></feed>