Simon Willison’s Weblog

Tweaking Wikipedia

Does anyone know why Wikipedia displays a redirected page at the same URL rather than using a proper HTTP redirect? Case in point: Topics in human-computer interaction actually displays the content from List of human-computer interaction topics (that’s my next exam topic)—the same content appears at two different URLs. Yuck. Here’s a Greasemonkey script to fix it: wikipedia-redirect.user.js.
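The contents of wikipedia-redirect.user.js are not reproduced here, so the following is only a sketch of one way such a script could work. It assumes the page heading holds the canonical article title and that the "Redirected from" notice lives in an element with the id `contentSub`; both DOM details, and the helper names `titleToWikiPath` and `canonicalPath`, are assumptions for illustration.

```javascript
// Hypothetical helpers; not taken from wikipedia-redirect.user.js.

// Convert an article title into its /wiki/ path (spaces become underscores).
function titleToWikiPath(title) {
  return "/wiki/" + encodeURIComponent(title.trim().replace(/ /g, "_"));
}

// Return the path to navigate to, or null if no navigation is needed.
function canonicalPath(currentPath, displayedTitle, wasRedirected) {
  if (!wasRedirected) return null;
  const target = titleToWikiPath(displayedTitle);
  return target === currentPath ? null : target;
}

// Browser-only wiring. The selector and element id below are assumptions.
if (typeof document !== "undefined" && typeof location !== "undefined") {
  const heading = document.querySelector("h1");
  const notice = document.getElementById("contentSub");
  const redirected = !!notice && /Redirected from/.test(notice.textContent);
  const next = heading &&
    canonicalPath(location.pathname, heading.textContent, redirected);
  if (next) location.replace(next);
}
```

Navigating with `location.replace` keeps the redirecting URL out of the browser history, at the cost of a second page load.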

While I’m at it, Wikipedia’s search function is painfully slow. Here’s another user script which changes their search button to search the site using Google instead. It also swaps the positions of the “Search” and “Go” buttons, and makes “Search” the default action for the form: wikipedia-googlesearch.user.js.
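As a rough illustration of the Google swap, a search can be scoped to one site by prefixing the query with Google's `site:` operator. The function name `googleSiteSearchUrl` and the choice of `en.wikipedia.org` as the site are assumptions; the real user script may build its URL differently.

```javascript
// Hypothetical helper: build a Google search URL restricted to a single site.
function googleSiteSearchUrl(query, site) {
  const scoped = "site:" + site + " " + query;
  return "https://www.google.com/search?q=" + encodeURIComponent(scoped);
}

// Example: the URL a rewired "Search" button could send the browser to.
const exampleSearchUrl = googleSiteSearchUrl("killing joke", "en.wikipedia.org");
```

A user script could call this from the search form's submit handler and assign the result to `location.href`.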

This is Tweaking Wikipedia by Simon Willison, posted on 6th June 2005.

15 comments

  1. I'm pretty sure the reason they put "Go" before search is that it's much, much faster, and because they figure you can google if you want search. ;)

    +1 on the Google-fied Wikipedia search, though.

    Jeremy Dunck - 6th June 2005 20:26 - #

  2. You're probably right - I'm just so used to typing the page I want in after the "wiki" bit in the URL (Wikipedia converts spaces to underscores for you with a proper redirect) that I never use the Go button. I'll pop a version that doesn't swap the buttons up in a bit.

    Simon Willison - 6th June 2005 22:16 - #

  3. "Go" is a fall-through, meaning that if there's a node with the exact name you typed in, it'll go there very very quickly. If not, it'll fall through to the search feature.

    So if you type in "killing joke" it'll take you to the Killing Joke page. If you type in "what's this for!?..." it'll search for that phrase (and probably show you the Killing Joke page as a result).

    "Go" knows how to upper-case the first letter of each word, but not subsequent letters. So if you had a node called "DNS Migration" and you typed in "dns migration" you would end up in Search, but if you had a node called "Domain Name Service" and you typed it all in lower case, you'd get there.

    HTH.

    Sam - 6th June 2005 23:09 - #
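The "Go" behaviour Sam describes can be modelled in a few lines. This is a sketch of the description in the comment above, not MediaWiki's actual code; `capitaliseWords`, `goLookup` and the set of titles are hypothetical.

```javascript
// Upper-case the first letter of each word; leave all other letters untouched.
function capitaliseWords(input) {
  return input.replace(/(^|\s)(\S)/g, (match, lead, first) => lead + first.toUpperCase());
}

// Return the matching title for "Go", or null to signal a fall-through to search.
function goLookup(input, titles) {
  const candidate = capitaliseWords(input.trim());
  return titles.has(candidate) ? candidate : null;
}

// Hypothetical set of existing node names.
const knownTitles = new Set(["DNS Migration", "Domain Name Service", "Killing Joke"]);
```

With these titles, `goLookup("domain name service", knownTitles)` finds the page, while `goLookup("dns migration", knownTitles)` yields `null` because the candidate "Dns Migration" does not match "DNS Migration".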

  4. The Wikipedia "redirected from" pages annoy me too, but more in an anal "I want to have my citations completely correct" way than a purist "one resource, one URL" stance...

    James Kew - 7th June 2005 00:33 - #

  5. I think the most frequent reason they end up with redirects is when a page gets moved or renamed. From:

    http://meta.wikimedia.org/wiki/Help:Redirect

    "When a page is renamed/moved with the Renaming (moving) a page function, a redirect is automatically created from the old to the new name, and also one for the corresponding talk page."

    Antony Sargent - 7th June 2005 01:29 - #

  6. Interesting. If this wasn't wikipedia, they would've been penalized by Google for duplicate content.

    soxiam - 7th June 2005 02:07 - #

  7. I don't know if you're aware of the shortcut bookmarks in Firefox?

    Check in the Quick Searches folder under your bookmarks and you'll find a shortcut to Wikipedia (among others). Check the properties of the bookmark for the keyword associated with wikipedia (I have en).

    Then you just type: <keyword> <subject> in the address bar. For me it is: en sweden if I need to look up the subject Sweden. Nifty, huh?

    If the subject isn't found, the search is called automatically...

    Patrick - 7th June 2005 03:50 - #

  8. Could you please serve the .js files as Content-Type: text/plain instead of text/html? Thanks!

    Beat Bolli - 7th June 2005 12:42 - #

  9. Beat: how odd, I have absolutely no idea why they're doing that. If you look at the other files in the directory they all serve as text/plain just fine. Maybe it's the length of the filenames? I tried to fix it with a ForceType text/plain directive in a .htaccess file but that doesn't seem to have worked. I'll take another look at that tomorrow.

    Simon Willison - 7th June 2005 13:33 - #

  10. I'm afraid your script does three naughty things (one logical, two technical):

    Firstly, there's a very good reason redirects work in wikipedia the way they do. Say someone is writing a webpage somewhere about fictional Scotsmen. They want to mention The Simpsons' Groundskeeper Willie, so they link to Wikipedia's [[Groundskeeper Willie]] page. At the moment, for reasons of editorial exigency, that article redirects to the article [[Springfield Elementary School]] (with the "redirected from" subheader). But this redirection (which is formally called a "subtopic redirect") is temporary: at some later juncture, someone will come along and write a full article about Groundskeeper Willie. This they'll put at the [[Groundskeeper Willie]] page, at the above URL. That's the correct, stable URL when referring to that subject. If someone goes to the [[Groundskeeper Willie]] article and gets the URL for [[Springfield Elementary School]] then that's what they'll use as their link, and that link will persist even after Willie gets farmed out to his own article. Granted, most redirects don't work this way - most are for alternate names or spelling and capitalisation variations. Mediawiki doesn't differentiate between the two kinds of redirects, so the two are (from both its and any userscript's perspective) technically indistinguishable. Some future version of Mediawiki (where proper article metadata is proposed) may differentiate the two adequately, meaning mediawiki could do the right thing (show the "canonical" URL for spelling variant redirects, while showing the "correct" URL for "merged-in" subjects like [[Groundskeeper Willie]]). In the meantime, Mediawiki does the right thing, albeit for rather subtle reasons.

    Secondly, your script breaks editing of redirects (this is an issue only for wikipedia editors). When someone wants to edit a redirect (say the Willie one above) they'd go to the page, end up at the Elementary School article, and then hit the link in the "redirected from" line. That takes them back to the #REDIRECT page, which they'd then edit (say adding the full Groundskeeper Willie article mentioned above). As your script effectively zaps the "redirected from" line, this isn't possible (bar manually editing the URL). Current Wikipedia editors won't be bothered by how redirects work (and so I doubt will install your script) but prospective ones will find redirect editing doesn't behave the way it is described in Wikipedia's documentation.

    And thirdly, it seems your script causes two pageloads when viewing a redirected page (one to the [[Springfield Elementary School]] redirected from [[Groundskeeper Willie]], and a second for [[Springfield Elementary School]] without the redirect). Wikipedia is a visitor-funded, volunteer-run project that's always seriously short of server capacity. Causing two page loads (and two page generation cycles) when one would do is wasting a scarce resource. And there are a lot of redirects in wikipedia. If you're so averse to the "redirected from" line, the right thing to do would be to have the greasemonkey script zap the corresponding DIV from the page post-load (although, as I say in my second point, I don't think that removing it, by any means, is at all a wise idea).

    Finlay McWalter - 8th June 2005 01:33 - #
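The single-pageload alternative suggested in the third point could look roughly like this. It hides the notice rather than deleting it, which is a choice made for this sketch, and it assumes the notice sits in an element with the id `contentSub`; the function name `hideRedirectNotice` is hypothetical.

```javascript
// Hide the "Redirected from" notice in the given document, without reloading.
// Returns true if a notice was found and hidden, false otherwise.
function hideRedirectNotice(doc) {
  const notice = doc.getElementById("contentSub"); // element id is an assumption
  if (notice && /Redirected from/.test(notice.textContent)) {
    notice.style.display = "none";
    return true;
  }
  return false;
}

// Browser-only wiring; skipped when no DOM is available.
if (typeof document !== "undefined") {
  hideRedirectNotice(document);
}
```

This avoids the second request, but the objection in the second point still applies: with the notice hidden, the link back to the redirect page is no longer visible.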

  11. "Wikipedia is a visitor-funded, volunteer-run project that's always seriously short of server capacity. Causing two page loads (and two page generation cycles) when one would do is wasting a scarce resource. And there are a lot of redirects in wikipedia."

    But the Wikipedia project is already wasting this scarce resource by not generating proper HTTP redirects. Every time it happens (and you admit it happens a lot), you drive down cache hits, causing higher load on the server and wasting bandwidth. A user script installed by a few people seems insignificant in comparison.

    The ugly URI isn't just an aesthetic problem, it's actively wasting bandwidth and causing higher server load.

    "Mediawiki doesn't differentiate between the two kinds of redirects, so the two are (from both its and any userscript's perspective) technically indistinguishable."

    This is the root cause of the problem. Fix that and you'll save a lot more bandwidth than is being wasted by Simon's script.

    Jim - 17th June 2005 01:05 - #

  12. Speaking of abusive wikipedia tweaks...

    Jeremy Dunck - 17th June 2005 20:01 - #

  13. I filed a bug on this; it's really tough to do right considering all the constraints. First, the backlink needs to be there for Wikipedia editors. But it can't just look at the referer header because then it'd need to do a db hit to see if that page was a redirect. A compromise was to do:

    1. GET /foo (returns redirect)
    2. GET /redirect?old=foo&new=bar (returns redirect)
    3. GET /bar (looks at referer and links to foo)

    but I don't think anybody has implemented it.

    P.S. Why isn't the br tag allowed?

    Aaron Swartz - 24th July 2005 19:57 - #
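The three-step compromise above can be sketched as a pure request handler. Everything here is hypothetical: the `redirects` table, the `handle` function, the 302 status code and the `wiki.example` host are assumptions, and the sketch takes it as given that the browser reports the intermediate `/redirect` URL as the referer, which it does not verify.

```javascript
// Hypothetical redirect table: old title -> new title.
const redirects = new Map([["foo", "bar"]]);

// Model a request for `path` with an optional referer; returns a plain response object.
function handle(path, referer) {
  const url = new URL(path, "http://wiki.example");

  if (url.pathname === "/redirect") {
    // Step 2: the bounce URL records the old name, then redirects to the new one.
    return { status: 302, location: "/" + url.searchParams.get("new") };
  }

  const title = url.pathname.slice(1);
  if (redirects.has(title)) {
    // Step 1: a redirected title is sent to the bounce URL.
    const target = redirects.get(title);
    return {
      status: 302,
      location: "/redirect?old=" + encodeURIComponent(title) +
        "&new=" + encodeURIComponent(target),
    };
  }

  // Step 3: recover the old name from the referer alone, with no database lookup.
  let redirectedFrom = null;
  if (referer) {
    const ref = new URL(referer, "http://wiki.example");
    if (ref.pathname === "/redirect" && ref.searchParams.get("new") === title) {
      redirectedFrom = ref.searchParams.get("old");
    }
  }
  return { status: 200, title, redirectedFrom };
}
```

The point of the middle hop is that the final page can build its backlink to `foo` from the referer string, without checking whether the referring page was a redirect.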

  14. Is this blog powered by WordPress, and if so, where can I get the theme? Thanks.

    Holden - 9th August 2005 02:09 - #

  15. I want to get the script which will activate the search button to search for any entered string in Wikipedia. How can I add a script to my web site that will search for a word in Wikipedia and display the results on my site?

    Deepayan - 10th November 2005 12:26 - #

Comments are closed.

Previously hosted at http://simon.incutio.com/archive/2005/06/06/wikipedia