Why you should be using disambiguated URLs
Good URLs are important. The best URLs are readable, reliable and hackable.
One of the nice things about Rails, Django and other modern Web frameworks is that most of them encourage smart URL design. Rails has relatively smart defaults and a powerful routing system for custom URLs; Django forces you to think about URL design up front by defining them as regular expressions. Many of the definitive “Web 2.0” sites such as Flickr and del.icio.us also use well-designed URLs. This is a positive trend, and long may it continue.
There’s one aspect of URL design that is often ignored. Good URLs should be unambiguous. By that, I mean that any logical piece of content should have one and only one definitive URL, with any alternatives acting as a permanent redirect.
This rule is frequently broken. Here are some examples:
- My Flickr photo stream lives at www.flickr.com/photos/simon/ and flickr.com/photos/simon/
- My del.icio.us account is at del.icio.us/simonw and del.icio.us/simonw/
- The YDN Python Developer center lives at both http://developer.yahoo.com/python/ and http://developer.yahoo.com/python/index.html
- The Google AdSense sign-in page is at https://www.google.com/adsense/ and https://google.com/adsense/. If you visit the latter you get a scary certificate warning (they really need to fix that).
In each of the above cases, it’s obvious to regular people that the URLs are the same. Unfortunately, from a technical point of view they are different and could quite feasibly serve up different content. This causes all kinds of problems:
- Caches (both browser and intermediate proxies) can’t improve performance if you request the same content from a different URL.
- Browsers can’t show users their visited links.
- Social link sharing sites such as del.icio.us can’t accurately aggregate links to the same resource.
That last one in particular should catch your attention if you care about effectively promoting your site. Here’s a random example, plucked from today’s del.icio.us popular. convinceme.net is a new online debating site (tag clouds, gradient fills, rounded corners). It’s listed in del.icio.us a total of four times!
- http://www.convinceme.net/ has 36 saves
- http://www.convinceme.net/index.php has 148 saves
- http://convinceme.net/ has 211 saves
- http://convinceme.net/index.php has 38 saves
Combined, that’s 433 saves: much more impressive, and more likely to end up at the top of a social sharing site.
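To make the arithmetic concrete, here is a minimal Python sketch that groups the four save counts above under a single canonical URL. The `canonical` helper and the choice of the no-www form as the canonical one are assumptions made for illustration; this is not how del.icio.us itself treats URLs.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

# Save counts quoted above for the four convinceme.net URLs.
saves = {
    "http://www.convinceme.net/": 36,
    "http://www.convinceme.net/index.php": 148,
    "http://convinceme.net/": 211,
    "http://convinceme.net/index.php": 38,
}

# Assumed list of default directory index filenames.
INDEX_NAMES = ("index.php", "index.html", "default.aspx")


def canonical(url):
    """Collapse www/no-www and index-file variants into one URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Split the path into its directory and its final segment.
    directory, _, filename = parts.path.rpartition("/")
    path = directory + "/" if filename in INDEX_NAMES else parts.path
    return urlunsplit((parts.scheme, host, path or "/", parts.query, ""))


totals = Counter()
for url, count in saves.items():
    totals[canonical(url)] += count

print(dict(totals))
```

All four entries collapse into one key, so the site is counted once with the combined total rather than four times with smaller ones.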
Here’s a useful rule of thumb:
Links to (and within) your site should never, ever end with index.php / index.html / default.aspx / any default directory index filename.
The whole point of those defaults is that you can link to the containing directory to see their content, resulting in a shorter and prettier URL. If you’re linking to them directly you’re missing out on a golden opportunity to disambiguate your URLs.
Disambiguating your URLs isn’t particularly difficult. The no-www site offers tips on having one domain name redirect to the other, and there are various mod_rewrite techniques for achieving the desired effect as well. If mod_rewrite makes your hair stand on end, remember that if you are using a server-side scripting language such as PHP you can implement rewriting logic in your application code by examining the $_ENV['PATH_INFO'] variable or your platform’s equivalent.
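As a rough illustration of doing this in application code, the Python sketch below decides whether a request should be permanently redirected. Everything named here is an assumption for the example: `example.com` as the canonical host, the list of index filenames, and the `redirect_target` function itself. A real application would take the host and path from its framework's request object and send the returned URL in the `Location` header of a 301 response.

```python
# Assumed configuration for the example site.
CANONICAL_HOST = "example.com"
ALTERNATE_HOSTS = {"www.example.com"}
INDEX_NAMES = {"index.php", "index.html", "default.aspx"}


def redirect_target(host, path, query="", scheme="http"):
    """Return the canonical URL to redirect to, or None if the request is already canonical."""
    # Map any known alternate hostname onto the canonical one.
    new_host = CANONICAL_HOST if host.lower() in ALTERNATE_HOSTS else host

    # Drop a trailing directory index filename, keeping the directory itself.
    directory, _, filename = path.rpartition("/")
    new_path = directory + "/" if filename in INDEX_NAMES else path

    if (new_host, new_path) == (host, path):
        return None  # nothing to fix, serve the page normally

    url = f"{scheme}://{new_host}{new_path}"
    return f"{url}?{query}" if query else url


print(redirect_target("www.example.com", "/blog/index.html"))
print(redirect_target("example.com", "/blog/"))
```

The sketch deliberately leaves trailing-slash handling out, since whether `/simonw` and `/simonw/` should collapse depends on how the site routes its URLs.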
Django sites get this behaviour for free, thanks to some default settings and Django’s CommonMiddleware. You can see that in action on this weblog: try here, here and here.
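For reference, the settings that CommonMiddleware consults look roughly like the fragment below. The setting names are Django's own; the values shown are the framework defaults as I understand them, and the `MIDDLEWARE` name is the one used by current Django versions, so treat this as a sketch rather than this weblog's actual configuration.

```python
# settings.py fragment (illustrative, not a complete settings file)

MIDDLEWARE = [
    "django.middleware.common.CommonMiddleware",
]

# When a URL without a trailing slash matches no URL pattern but the
# slash-terminated version does, redirect to the version with the slash.
APPEND_SLASH = True

# When True, redirect requests for a hostname lacking a leading "www."
# to the www form. Left at False here.
PREPEND_WWW = False
```

Note that `PREPEND_WWW` only redirects in one direction (towards the www form); redirecting www to the bare domain needs to be handled elsewhere, for example at the web server.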