
Simon Willison’s Weblog

Google approved PageRank stripping

Blogger are now using the redirect-without-PageRank technique to protect their hosted blogs against comment spam (also used by Movable Type). At the risk of sounding incredibly pleased with myself (which I am), that was my idea! Sweet.

Here’s the real kicker: the URL redirector they are using is hosted on Google’s primary domain. This is great news for people like myself who are running their own redirector, as the problem with having a redirect on your site is that it can be abused to make it look like people are visiting a specific site from a link on your domain, or even worse, to trick people into visiting an unpleasant link (see Slashdot comments, where links are displayed alongside the domain on which the “real” site is hosted). Now I can point my own PageRank stripper at http://www.google.com/url?sa=D&q=URL and let Google handle the redirects for me. Lovely.
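As a rough illustration of the link rewriting described above, here is a minimal Python sketch. It assumes the redirector accepts the `sa=D` and `q` parameters exactly as in the URL pattern quoted in the post, which is not a documented API. The function name `strip_pagerank` is hypothetical. The target URL is percent-encoded so that a query string inside it is not mistaken for redirector parameters.

```python
from urllib.parse import quote

# Assumption: endpoint and parameters are copied from the URL pattern quoted
# in the post; this is not a documented or guaranteed interface.
GOOGLE_REDIRECT = "http://www.google.com/url?sa=D&q="


def strip_pagerank(url: str) -> str:
    """Return a link that reaches `url` by way of the redirector."""
    # safe="" percent-encodes every reserved character, including '&', '='
    # and '?', so the target's own query string stays inside the q parameter.
    return GOOGLE_REDIRECT + quote(url, safe="")


if __name__ == "__main__":
    print(strip_pagerank("http://example.com/page?id=1&lang=en"))
```

A comment template would call `strip_pagerank()` on each commenter-supplied URL before writing it into the `href`.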

Peter van Dijck recently joined the ranks of victims of wiki spam. Let’s start rolling this technique out on Wikis as well. The trade-off in lost PageRank for linked sites is more than worth it.

This is Google approved PageRank stripping by Simon Willison, posted on 11th May 2004.



31 comments

  1. Wow! Congrats on that. If this technique is widely adopted, however, doesn't this ruin the effectiveness of links that Google's own Pagerank technology relies on? If you look at the Google Blog all their links are going through the redirects. Any compliant spider will not follow those links. This is about more than just Pagerank, this is potentially cutting off parts of the web from being indexed.

    Matt - 11th May 2004 08:39 - #

  2. Neato; isn't it validating to see your ideas adopted by very internet savvy folk?

    Matt, isn't cutting weblogs out of the PageRank equation something that many people were asking for when Googlebombing became the rage about a year ago? It makes me wonder if Google will ever drop the redirects at all.

    In the meantime, though, we can have a little fun with them. For example, chain the redirects to jump from site to site. Or visit Google's redirect script without a target link to generate an infinite loop of redirection. I can generate about 6-8k/s of traffic by letting it run continuously.

    Micah - 11th May 2004 11:48 - #

  3. Wow, an open URL redirector - you would think Google would be smarter than that. The spammers will start using it to get past URL-blocking schemes soon like they do with Yahoo's.

    By the way, I just have to vent two bits of frustration with the popularity of this system (with apologies to Simon):

    • Some of them (notably, not Simon's or Google's) make it impossible for me to hover over a poster's name to see what their URL is.
    • They tend to break my Back button. (I swear this one used to, but it doesn't happen today. Perhaps I'm going insane.)

    Michael Moncur - 11th May 2004 11:57 - #

  4. First off Simon, congrats on adoption of the idea. It's always nice to see your ideas implemented, especially by a company like Google.

    Unfortunately, I wish it hadn't. At least not in its current implementation which has two major drawbacks:

    1. The destination is obscured. When I mouse over an author link, I expect to see where I'm going. This breaks usability just as badly as overriding the status bar. I know that this also annoys you.
    2. It obscures the referrer.

    Now, these problems aren't intrinsic to the technique, only the implementation.

    The first problem can be solved by using the URL as the link text like I do on my site or, in an ironic twist, using Javascript to overwrite the status message with the real destination. This last idea, if I remember, caused a bit of controversy on this very site back when you started doing redirects.

    The second problem is also solvable with a little ingenuity. As we talked about in Austin, the redirect script could check for the incoming referrer and handle incoming and outgoing redirects differently. If the referrer is the site with the redirect link (e.g. a comment on this site), it would be redirected out to the destination URL. If the referrer is not local (e.g. from referrer stats on someone else's site) it could redirect to the source. This could be done either by packing both source and destination address in the URL or by maintaining the link between the two in a local database.

    In any case, it's an interesting solution. I just wish that current implementations of it didn't suck so much.

    Jay Allen - 11th May 2004 14:04 - #
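The referrer check Jay describes could be sketched roughly as follows. This is only an illustration of the decision logic, not any existing redirect script: `choose_target` and its parameter names are hypothetical, and it assumes the source and destination addresses have already been recovered (from the URL or a local database, as the comment suggests).

```python
from typing import Optional
from urllib.parse import urlsplit


def choose_target(referrer: Optional[str], local_host: str,
                  source_url: str, destination_url: str) -> str:
    """Decide where a redirect link should send the visitor.

    referrer        -- value of the Referer header, or None if absent
    local_host      -- lowercase hostname of the site that owns the link
    source_url      -- page on the local site that contains the link
    destination_url -- external URL the commenter supplied
    """
    # urlsplit().hostname is lowercased and None when no host is present.
    referrer_host = urlsplit(referrer).hostname if referrer else None
    if referrer_host == local_host:
        # Clicked from our own page: send the visitor out to the destination.
        return destination_url
    # Arrived from elsewhere (e.g. someone's referrer stats): show the source.
    # Assumption: a missing referrer is treated the same way as a foreign one.
    return source_url
```

How to treat a missing referrer is a judgment call; sending those visitors to the destination instead would be an equally defensible choice.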

  5. I worry about scripts. It's trivial to write a script to spider the web for wikis and post links on them. Simon, you wrote (commenting on my blog) "requiring non-registered users to enter a captcha (shudder)".

    Why don't you like that? A captcha is an image with a few letters they have to enter, right? This does stop scripts, right? Seems to me that, if we can stop scripts, we're partly on our way to stopping the problem getting out of hand.

    If a user has to enter the code on an image as well as their name and email, that seems like a fine solution, usability wise?

    Peter - 11th May 2004 14:39 - #

  6. While I don't know Simon's specific reason for disliking captchas, I can relate my reasons for disliking them. There are two reasons:

    1. I am color-blind
    2. The image is a static size

    Now to explain. Many captchas I've seen use some bizarre color scheme (possibly to make it difficult for someone to write a 'bot that could deduce the captcha if it were simply black on white). Unfortunately, these color schemes often render me unable to read some of the letters and numbers used, effectively locking me out of the site.

    The second reason: the image is a static size. My display resolution is 1280 x 1024. Many captchas are tiny, too tiny to read at my resolution. To read them I often need to load them in a graphic viewer and magnify, which at times makes them fuzzy and difficult to read. On at least one site I visited, the captcha was on some type of timer. By the time I had loaded it into a viewer, magnified it and deciphered the letters, its time had expired and a new one had been generated. Needless to say I left that site immediately.

    Ken Power - 11th May 2004 15:14 - #

  7. While a good measure, I don't think that redirects are the only thing Google is doing (what I do think about G is posted here).

    I wonder how much spamming is working on other engines, like Yahoo! for instance, and is it worth spamming for Yahoo!?

    Mike P. - 11th May 2004 15:41 - #

  8. A generic challenge-response using text would be better than a captcha, surely? Asking, for example, the user to enter the third, last and third to last letters of the word "narcissus" - it'd be hard for someone to write a script to deal with stuff like that, and probably not worth the effort.

    This strikes me as more sensible than captcha images, anyway, though possibly easier to work around.

    Andrew Sidwell - 11th May 2004 17:14 - #
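A text challenge of the kind Andrew suggests might look something like the sketch below. All names here (`POSITIONS`, `make_challenge`, `expected_answer`, `check`) are hypothetical, and the sketch assumes the chosen word has at least three letters.

```python
import random

# (name, index) pairs the question can ask about; chosen for illustration.
POSITIONS = [
    ("first", 0), ("second", 1), ("third", 2),
    ("third to last", -3), ("second to last", -2), ("last", -1),
]


def expected_answer(word, indexes):
    """Concatenate the letters of `word` found at the given indexes."""
    return "".join(word[i] for i in indexes)


def make_challenge(word, rng=random):
    """Return (question, answer) asking for three letter positions of `word`."""
    picks = rng.sample(POSITIONS, 3)
    names = [name for name, _ in picks]
    question = (f"Enter the {names[0]}, {names[1]} and {names[2]} "
                f'letters of the word "{word}".')
    return question, expected_answer(word, [index for _, index in picks])


def check(submitted, answer):
    """Compare the visitor's input with the answer, ignoring case and padding."""
    return submitted.strip().lower() == answer.lower()


# The example from the comment: third, last and third to last of "narcissus".
print(expected_answer("narcissus", [2, -1, -3]))  # rss
```

As the comment itself concedes, a determined spammer could script around a fixed question format like this; varying the wording would raise the cost somewhat.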

  9. i'm surprised that google has not defined a profile that would allow specifying links like <a href="..." rel="untrusted">...</a> that the pagerank algorithm could then know to ignore when determining the rank. google certainly has the power to define some defacto standards -- it would be nice if they used that power to define some simple but useful ones.

    jim winstead - 11th May 2004 17:29 - #

  10. The use of the relation attribute is a good idea, however it wouldn't work with other technologies like XFN. Also there are many search engines besides google, and asking them all (both present and future) to implement this standard would be tough.

    As an off topic side note about the XFN relation thing, I wonder why no-one has bothered to create a namespace syntax for the rel attribute. Something as simple as rel="xfn{friend co-worker} robots{untrusted}" would work. I mean, this is supposed to be an Extensible Markup Language, right?

    Stephen - 11th May 2004 18:17 - #

  11. The Moztips wiki uses an ascii art captcha system, generating a string of letters in different ascii-art fonts for the user to identify. It addresses both points that Ken brings up in being both monochrome and large in size.

    Micah - 11th May 2004 19:29 - #

  12. The rel attribute was the first thing I thought of as well. I don't see why it is incompatible with XFN though.

    As for Jay's complaint that the destination is obscured, would this be a legitimate use for the window.status property?

    Jim Dabell - 11th May 2004 20:15 - #

  13. Just another example of Google trying to diminish the blog and remove it from the grand scheme of the web. They seem to always dislike blogs, and want to isolate them into a different category than other websites -- and this is an example. How many web pages are hosted on blog systems, but are more like webpages than blogs? Plenty. This change will kill their earned page rank. Block comment spamming yes, but not all hrefs.

    Jim Jones - 12th May 2004 03:53 - #

  14. Jim Jones:
    Diminish the blog? The issue is that non comment links from within a blogger weblog have super google juice because they come from the blogger.com domain. This is a free service. If it worries anyone get hosted elsewhere.... I don't see in any way how this illustrates that they "dislike blogs"

    Darryl - 12th May 2004 05:01 - #

  15. Simon, just wondering if you could mask the redirect URL with the real URL with javascript using the same technique I used here? Make it easier to see what you might be clicking to.

    Paul - 12th May 2004 11:46 - #

  16. My only concern for the Google URL redirect is that they also use it on their search results page, something that I have also realised Yahoo do. A lot of stat packages (like AWStats) use the search engine href referrer to record the search query strings that people used. With the new URL redirect set up, you know visitors came from Google (or Yahoo) but not what they searched for when your site came up!

    This might seem just a petty thing, but surely Google/Yahoo etc can work out they don't need any URL redirect on their own search results.

    Dave Page - 12th May 2004 19:42 - #

  17. Well, one way to possibly add the referrer is to append it to the Google URL (in your own site code). Replace all URLs with: http://www.google.com/url?sa=D&q=[URL]&referrer=<?php echo $_SERVER['PHP_SELF']; ?> or something like that. That would obscure the target even more, though.

    An idea for a sort of captcha. I forgot where I saw this, it was really clever. You have to type in the word above, and you put random letters and numbers between the letters of the word, and you change them to the same color as the background color (setting the color to transparent doesn't work). There are three problems, though:

    • it would probably be easy to write a code to strip the bad letters out.
    • wouldn't work if you have a background image. Solution: make a solid color box around the word. it might mess up the design, though, depending on the background image.
    • people without CSS would see the characters in the middle. Solution: You could say "type in the word in bold", and put the word in strong tags, since most browsers render them as bold regardless of the stylesheet. But that would make it even easier to write a script.

    Typing the last few letters of a word seems like a good idea, too. Maybe a combination of this idea and the one I mentioned earlier.

    Tom - 13th May 2004 05:19 - #

  18. w00h00! Let's break the web!

    Hendrik Mans - 16th May 2004 17:23 - #

  19. I understand the desire to avoid blog spamming, but doesn't this somehow ruin the whole idea of backlinks? Now, some backlinks will count, and some won't. I get the sinking feeling that this redirect scheme will somehow get abused.

    Brian T - 16th May 2004 22:34 - #

  20. I'm wondering if there could be any negative consequences of a non-Blogger blog utilizing Google's redirect script for their comment URLs. I've considered implementing this myself (not being able to settle on a satisfactory redirect script) but have been hesitant not knowing if Google is monitoring redirect traffic originating from non-Blogger sites.

    andrew - 20th May 2004 23:09 - #

  21. Here are the slight changes you need to make to MoinMoin wiki to use this: RedirectingExternalLinks.

    DougHolton - 7th June 2004 00:07 - #

  22. Yes, but what if a spammer puts the URL directly in his post, like http://oupouaout.org or oupouaout? Can we rewrite it? I have a blog too, and many people spam it! Kendyan

    ken - 25th June 2004 22:55 - #

  23. Simon, you need to escape the url parameter in the rewrite. Some of your current comment links which have query-strings are broken because of this.

    Ben Davenport - 14th July 2004 01:33 - #

  24. Stephen, implementing something like this wouldn't negatively impact on XFN rel attributes in the same link. Any spider specifically looking for XFN links should just ignore rel attributes that aren't part of the XFN specification.

    Phil McCluskey - 26th August 2004 00:41 - #
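Phil's point, that space-separated `rel` values can coexist because each consumer ignores the tokens it doesn't recognise, can be sketched like this. The `untrusted` value is the hypothetical one proposed in comment 9, not a real standard, and `XFN_SUBSET` lists only a few XFN values for illustration.

```python
# A few XFN relationship values, for illustration only (not the full list).
XFN_SUBSET = {"friend", "acquaintance", "contact", "met", "co-worker", "colleague"}

# Hypothetical value proposed in the comments above; not part of any spec.
UNTRUSTED = "untrusted"


def rel_tokens(rel_value):
    """Split a rel attribute into its lowercase, space-separated tokens."""
    return {token.lower() for token in rel_value.split()}


def xfn_relationships(rel_value):
    """What an XFN-aware spider keeps: only the tokens it recognises."""
    return rel_tokens(rel_value) & XFN_SUBSET


def counts_for_ranking(rel_value):
    """What a ranking spider might do: skip links marked as untrusted."""
    return UNTRUSTED not in rel_tokens(rel_value)
```

With `rel="friend co-worker untrusted"`, the XFN consumer still sees both relationships while the ranking consumer skips the link, so neither use interferes with the other.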

  25. just a question.. what does the D mean? url?sa=D i've seen too: url?sa=Q thanx.. andufo

    andufo - 10th September 2004 03:02 - #

  26. Using the above Google redirect will kill the Referer. However, if you drop the sa=D parameter it will not (but will it still kill PageRank?). I investigated the redirection a bit, please read http://www.fr135.de/space/Redirect+and+Citation+Index

    Christian Fries - 19th December 2004 15:45 - #

  27. Simon, you might want to take another look at how your redirects are working out: Google is currently returning a rather odd "200 OK" along with the "Location:" to redirect, and some time since you started using them they've added /url to their robots.txt. Net result: an ill-behaved spider might not realize there was any redirect that should have stripped anything, or might believe that you linked to something it retrieved from Google itself, while a well-behaved spider will think that a page on a site it respects linked to a page on google.com that it's not allowed to look at, but which must contain something useful.

    Search for the names of your commenters on MSN Search, and you'll see your redirect URL featuring rather prominently, and I believe that (short of hand-tuning their algorithm) that's actually a correct search result, though not one that you want to encourage.

    Phil Ringnalda - 18th February 2006 04:03 - #

  28. D'oh, been too long and I forgot that that's exactly how it's supposed to work, but it depends on every search engine knowing that while some URLs hidden behind robots.txt are good results that they can't spider but should still show based on the recommendations of linking pages, other URLs hidden behind robots.txt are things that the linking page wants to disavow.

    Hard for me to believe that I would actually say this of it, but even nofollow strikes me as less likely to break the web.

    Phil Ringnalda - 18th February 2006 04:21 - #

  29. A solution for the problem about invalidating referers and links when using Copy Shortcut and Add to Favorites. The article is based on Simon's article.
    Link: PageRank stripping without invalidating links

    Best regards,
    Lasse

    Lasse Bunk - 14th April 2006 15:38 - #

  30. sooner or later google will close the gates for 301 PR redirect be sure of that ;)

    Lorayne - 15th April 2006 06:58 - #

  31. Now that we have rel="nofollow" this is useless, no? And this redirect service had XSS problems. I am currently testing yet another one, though it's not really practical... More soon...

    h - 6th October 2006 02:32 - #

Comments are closed.

Previously hosted at http://simon.incutio.com/archive/2004/05/11/approved
