Simon Willison’s Weblog

Why you should be using disambiguated URLs

Good URLs are important. The best URLs are readable, reliable and hackable.

One of the nice things about Rails, Django and other modern Web frameworks is that most of them encourage smart URL design. Rails has relatively smart defaults and a powerful routing system for custom URLs; Django forces you to think about URL design up front by defining them as regular expressions. Many of the definitive “Web 2.0” sites such as Flickr and del.icio.us also use well-designed URLs. This is a positive trend, and long may it continue.

There’s one aspect of URL design that is often ignored. Good URLs should be unambiguous. By that, I mean that any logical piece of content should have one and only one definitive URL, with any alternatives acting as a permanent redirect.

This rule is frequently broken. Here are some examples:

  • My Flickr photo stream lives at www.flickr.com/photos/simon/ and flickr.com/photos/simon/
  • My del.icio.us account is at del.icio.us/simonw and del.icio.us/simonw/
  • The YDN Python Developer center lives at both http://developer.yahoo.com/python/ and http://developer.yahoo.com/python/index.html
  • The Google AdSense sign-in page is at https://www.google.com/adsense/ and https://google.com/adsense/. If you visit the latter you get a scary certificate warning (they really need to fix that).

In each of the above cases, it’s obvious to regular people that the URLs are the same. Unfortunately, from a technical point of view they are different and could quite feasibly serve up different content. This causes all kinds of problems:

  • Caches (both browser and intermediate proxies) can’t improve performance if you request the same content from a different URL.
  • Browsers can’t show users their visited links.
  • Social link sharing sites such as del.icio.us can’t accurately aggregate links to the same resource.

That last one in particular should catch your attention if you care about effectively promoting your site. Here’s a random example, plucked from today’s del.icio.us popular. convinceme.net is a new online debating site (tag clouds, gradient fills, rounded corners). It’s listed in del.icio.us a total of four times!

  • http://www.convinceme.net/ has 36 saves
  • http://www.convinceme.net/index.php has 148 saves
  • http://convinceme.net/ has 211 saves
  • http://convinceme.net/index.php has 38 saves

Combined, that’s 433 saves; much more impressive, and more likely to end up at the top of a social sharing site.
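To make the arithmetic concrete, here is a minimal Python sketch that collapses the four listed URLs into one and sums their saves. The `canonicalize` helper and the `INDEX_FILENAMES` tuple are illustrative names, not part of any library, and treating the no-www form as canonical is an assumption made only for this example.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of default directory index filenames to strip.
INDEX_FILENAMES = ("index.php", "index.html", "default.aspx")


def canonicalize(url):
    """Collapse www/no-www and index-file variants into a single URL."""
    scheme, host, path, query, fragment = urlsplit(url)
    host = host.lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: the no-www form is the canonical one
    for name in INDEX_FILENAMES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # drop the filename, keep the trailing slash
            break
    return urlunsplit((scheme, host, path or "/", query, fragment))


# Save counts copied from the list above.
saves = {
    "http://www.convinceme.net/": 36,
    "http://www.convinceme.net/index.php": 148,
    "http://convinceme.net/": 211,
    "http://convinceme.net/index.php": 38,
}

totals = Counter()
for url, count in saves.items():
    totals[canonicalize(url)] += count

print(dict(totals))  # {'http://convinceme.net/': 433}
```

A link sharing site can't safely apply this kind of guesswork itself, because the four URLs could legitimately serve different content. Only the site owner knows they are the same resource, which is why the redirect has to happen on the server.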

Here’s a useful rule of thumb:

Links to (and within) your site should never, ever end with index.php / index.html / default.aspx / any default directory index filename.

The whole point of those defaults is that you can link to the containing directory to see their content, resulting in a shorter and prettier URL. If you’re linking to them directly you’re missing out on a golden opportunity to disambiguate your URLs.

Disambiguating your URLs isn’t particularly difficult. The no-www site offers tips on having one domain name redirect to the other, and there are various mod_rewrite techniques for achieving the desired effect as well. If mod_rewrite makes your hair stand on end, remember that if you are using a server-side scripting language such as PHP you can implement rewriting logic in your application code by examining the $_ENV['PATH_INFO'] variable or your platform’s equivalent.
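As one way of doing the rewriting in application code, here is a rough sketch of a WSGI middleware in Python that issues a permanent redirect whenever the request arrives on a non-canonical host or ends in a directory index filename. `CANONICAL_HOST`, `INDEX_FILENAMES`, `canonical_location` and `redirect_middleware` are all hypothetical names chosen for this example, and it assumes the no-www host is the one you want to keep.

```python
from urllib.parse import quote

CANONICAL_HOST = "example.com"  # hypothetical: the one host you want to serve from
INDEX_FILENAMES = ("index.php", "index.html", "default.aspx")


def canonical_location(scheme, host, path):
    """Return the URL to redirect to, or None if the request is already canonical."""
    new_path = path
    for name in INDEX_FILENAMES:
        if new_path.endswith("/" + name):
            new_path = new_path[: -len(name)]  # drop the filename, keep the slash
            break
    if host.lower() == CANONICAL_HOST and new_path == path:
        return None
    return scheme + "://" + CANONICAL_HOST + quote(new_path)


def redirect_middleware(app):
    """Wrap a WSGI app so that alternative URLs answer with a 301."""

    def wrapper(environ, start_response):
        # Note: HTTP_HOST may include a port, which this sketch does not handle.
        location = canonical_location(
            environ.get("wsgi.url_scheme", "http"),
            environ.get("HTTP_HOST", CANONICAL_HOST),
            environ.get("PATH_INFO", "/"),
        )
        if location is None:
            return app(environ, start_response)
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query  # preserve the query string across the redirect
        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Length", "0")],
        )
        return [b""]

    return wrapper
```

The 301 status matters here: it tells browsers, caches and crawlers that the alternative URL is permanently replaced by the canonical one, rather than being a temporary detour.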

Django sites get this behaviour for free, thanks to some default settings and Django’s CommonMiddleware. You can see that in action on this weblog: try here, here and here.
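For reference, the settings involved look roughly like the fragment below. This is a sketch of a settings file for a recent Django release, not this weblog's actual configuration; Django versions from the time of writing used the name `MIDDLEWARE_CLASSES` for the middleware list.

```python
# settings.py (fragment): illustrative only, not this weblog's real settings.

MIDDLEWARE = [
    # CommonMiddleware is what acts on the two settings below.
    "django.middleware.common.CommonMiddleware",
]

# If a URL without a trailing slash matches no URL pattern, redirect to the
# same URL with a slash appended. This is Django's default.
APPEND_SLASH = True

# Set to True to redirect example.com to www.example.com.
# Django's default is False.
PREPEND_WWW = False
```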

This is Why you should be using disambiguated URLs by Simon Willison, posted on 4th February 2007.

