The dangers of PageRank
6th February 2004
A well-documented side effect of the weblog format is that it attracts Google PageRank in almost absurd quantities. I’m now the 5th result for simon on Google, and I’ve been the top result for simon willison almost since the day I launched. High rankings, however, are not always a good thing, especially when combined with a comment system. A growing number of bloggers have found themselves in the top position for terms of little or no relevance to the rest of their sites, which in turn can attract truly surreal comments from search engine visitors who may never have encountered a blog before.
I know of a couple of entries on my own blog that are attracting this kind of traffic. The most interesting is probably this entry on artificial diamonds, which has attracted comments from both buyers and sellers of artificial gems. My entry on MSN messenger usability problems from 2002 has drawn a steady stream of hilarious comments, no doubt caused in part by its top rating on Google for msn messenger sucks. Amusingly, for a long time Microsoft’s own search engine was giving my page a high rank for a wide variety of less negative messenger-related terms.
My own experiences of this phenomenon pale into insignificance next to some of the others I’ve seen. The most impressive example has to be Jason Kottke’s brief review of the Matrix Reloaded, which drew over 900 comments from Google strays, developed its own micro-community and resulted in Jason pondering who owns the conversation on my web site? Jason eventually decided to close and archive the thread after the page grew to more than a megabyte in size.
The problem can take on a far more disturbing twist. I won’t link directly to these entries for fear of adding to their predicaments, but searches for crime scene cleanup and suicide chat rooms both return blogs in the first two results. The former thread is mostly crime scene cleanup companies marketing their services, but the latter is quite frankly disturbing. It’s certainly led me to double-check the titles of my entries before posting them.
Thankfully, avoiding this kind of unwanted comment traffic is pretty simple. One way is to disable comments on entries older than a certain age (generally a couple of weeks), although personally I like to see the occasional comment on old entries. A neater solution, proposed by Russell Beattie last year, is to hide the comment form from search engine referrals, ensuring that random strays won’t leave their mark without first understanding the nature of your site.
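The referral-hiding approach could be sketched in a few lines: check the HTTP Referer header against a list of search engine domains and suppress the comment form when it matches. This is a minimal illustration of the idea, not Russell’s actual implementation — the function name and domain list are my own assumptions.

```python
from urllib.parse import urlparse

# Illustrative list of search engine hostnames to match against;
# a real deployment would maintain its own.
SEARCH_ENGINE_DOMAINS = ("google.", "yahoo.", "msn.")

def should_show_comments(referrer):
    """Return False when the visitor arrived via a search engine referral."""
    if not referrer:
        # Direct visits and aggregators send no referrer; keep comments visible.
        return True
    host = urlparse(referrer).netloc.lower()
    return not any(domain in host for domain in SEARCH_ENGINE_DOMAINS)
```

A template would then only render the comment form when `should_show_comments(request_referrer)` is true, so search engine strays see the entry but not an invitation to comment on it.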