
Simon Willison’s Weblog

Quality news site URLs

Nathan Ashby-Kuhlman is devoting this week on his blog to discussion and analysis of news sites’ URLs. Nathan’s five attributes for a good URL are worth repeating here as they succinctly describe my own opinions:

Readability: URLs should be human-readable
Brevity: URLs should be short
Cleanliness: URLs should not contain useless parts
Hierarchy: URLs should be ‘hackable’ and dates should be year/month/day
Permanence: URLs should be permanently unique

I’m something of a fascist when it comes to URLs; they are one of my first considerations when initially designing a site and for personal projects I refuse to compromise on them. Nathan’s series is now on day two, and true to his word he has been dissecting a number of large and not-so-large news sites and grading each one on the quality of their article URLs.

Steven Jarvis plays devil’s advocate in the comments, asking a very interesting question about the value of “hackable” URLs:

I’ve got a devil’s advocate question for you: why do URLs need to be hackable? My wife (who is remarkably non-websavvy) would never in a thousand years think about hacking an URL. I’d say the same is true for at least 90% (and probably much higher than that) of the audience of news websites. *I* like hackable URLs, and I agree in general that they should be hierarchical, if only because I like at least the appearance (such as that given by liberal use of mod_rewrite) of a well-organized site. Isn’t hacking an URL really just a fall-back point when the site’s navigation fails you?

Nathan’s reply on the same page is worth reading. I agree with him; while hackable URLs as an additional navigation aid add very little value to the vast majority of surfers, they are a powerful indicator of the quality of a site’s navigational structure. They also tend to be a by-product of a well-thought-out URL system, meaning that if you are already following best practise for URLs you can often gain hackability at very low extra cost.
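To make “hackable” concrete, here is a minimal Python sketch of what a reader does when they hack a URL: chop off the last path segment and see what is there. The function name `parent_urls` and the example.com address are made up for illustration, and whether each shortened URL actually resolves depends entirely on how the site is built.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return the URLs reached by repeatedly removing the last path segment."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing segment at a time, finishing at the site root.
    for end in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:end])
        if end:
            path += "/"
        # Query string and fragment are discarded on purpose.
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


# Hypothetical article URL using a year/month/day hierarchy.
for candidate in parent_urls("http://example.com/archive/2003/07/30/some-entry"):
    print(candidate)
```

On a site with a hierarchical URL scheme, each of those addresses would ideally be a sensible page in its own right: the day, the month, the year, the archive index and the home page.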

This is Quality news site URLs by Simon Willison, posted on 30th July 2003.


11 comments

  1. An intuitive URL structure might encourage more people to play around with the URLs. I think that if more sites use clean URLs, more people will be modifying them directly. The chicken versus the egg?

    Jan! - 30th July 2003 14:43 - #

  2. The other thing people should think of is that clean URLs also provide search engines with a better way to index things. I've also noticed that Google seems to give the URL some amount of weight in terms of ranking. A clean URL path like /blog/entry/entryname generally ranks higher than blog?entry=1231.

    TNL

    Tristan Louis - 30th July 2003 14:56 - #

  3. Perhaps Google does; but that would probably be also partially down to the determination as to whether the page appeared to be auto-generated and clearly had characters like "?" appearing within the URI string etc.

    Robert Wellock - 30th July 2003 16:19 - #

  4. A URL is like a big "YOU ARE HERE" sign for each page of your site. It should allow people to get a sense of where they are in your site, even if they decide not to use that information for *navigation*.

    Keith - 30th July 2003 18:24 - #

  5. I like the trick with .htaccess that was discussed on evolt a while back. If you use PHP and Apache, it is a super-simple way of cleaning up URLs to look like directories.

    The technique involves making a PHP script to handle a particular type of request, say a search on articles, called search. No extension. Then you use a bit of .htaccess magic to force that file to be interpreted by PHP.

    Then in your script, examine the request URI variable ($_SERVER["REQUEST_URI"] in PHP), parse it, and call the appropriate code. I've used it to handle sectioned content, image galleries, and other fun stuff. What was http://dast.freeshell.org/?section=emacs/elisp became http://dast.freeshell.org/section/emacs/elisp. My next step is to clean up blog URLs and make permalinks with this technique.

    You can also avoid making a bunch of different scripts to handle this stuff by symlinking these scripts to one central script in a UNIX-like OS, anyway. Have fun.

    Dast - 30th July 2003 18:34 - #
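Dast describes this in PHP terms; the parse-and-dispatch step he mentions can be sketched in Python roughly as follows. This is not his script: the handler names and the returned strings are invented for illustration, and a real site would take the request URI from the web server and return proper responses.

```python
from urllib.parse import urlsplit


def show_section(parts):
    # Hypothetical handler: parts is whatever follows /section/ in the path.
    return "section: " + "/".join(parts)


def show_search(parts):
    # Hypothetical handler: parts are the search terms from the path.
    return "search: " + " ".join(parts)


# The first path segment selects the handler, much like one script per request type.
HANDLERS = {"section": show_section, "search": show_search}


def dispatch(request_uri):
    """Split the request URI into path segments and call the matching handler."""
    path = urlsplit(request_uri).path  # ignores any query string
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] not in HANDLERS:
        return "404 Not Found"
    return HANDLERS[segments[0]](segments[1:])


print(dispatch("/section/emacs/elisp"))
```

Keeping every handler in one table is the single-script variant of the idea; splitting them into separate small files is an equally reasonable layout.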

  6. Very, very nice tip, Dast! Gonna implement that on my site... when I get around to it :) And an interesting post / article series as well. I love hacking URLs :)

    Eivind Lie Nitter - 30th July 2003 23:24 - #

  7. Have you tried adding a single quote inside the URL? In most cases you end up with database errors.

    Pepe Peinado - 31st July 2003 08:13 - #
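A likely cause of the errors Pepe describes is a value from the URL being pasted straight into an SQL string. The sketch below, in Python with the standard `sqlite3` module, shows the usual remedy of passing the value as a query parameter instead. The `articles` table, its columns and the `find_article` function are assumptions made up for this example.

```python
import sqlite3


def find_article(conn, slug):
    """Look up an article title by the slug taken from the URL."""
    # The ? placeholder passes slug to the database as data,
    # so a quote character in it cannot change the SQL statement.
    row = conn.execute(
        "SELECT title FROM articles WHERE slug = ?", (slug,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical in-memory database standing in for the site's real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (slug TEXT PRIMARY KEY, title TEXT)")
conn.execute(
    "INSERT INTO articles VALUES (?, ?)",
    ("newsSiteURLs", "Quality news site URLs"),
)

print(find_article(conn, "newsSiteURLs"))
print(find_article(conn, "news'SiteURLs"))  # a stray quote simply matches nothing
```

With this approach a URL containing a quote leads to an ordinary "not found" result rather than a database error.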

  8. Word to the wise: I used to use the in-script scheme mentioned above, but it makes it very easy for you to just keep adding stuff to your scripts until they become unruly. I found using mod_rewrite to point to different small files (one for each piece of functionality) much more manageable.

    Lach - 1st August 2003 01:58 - #

  9. Imo the best thing about good URLs is with auto-complete and history lists. I don't think most users will regularly try and hack URLs, but many a time I've typed a domain into a browser and it's come up with an auto-complete list such as:

    • http://domain.com/productlisting.asp?Id=5
    • http://domain.com/productlisting.asp?Id=45

    ...whereas if it was something sensible like:

    • http://domain.com/products/memory
    • http://domain.com/products/software

    ...then I'd be able to jump straight to the page. Real text in URLs is so much more useful than ID numbers.

    (PS: my blog's URLs aren't as good as I'd like them to be, so the above is slightly hypocritical. I'll work on it when I have some free time ;))

    Tom Gilder - 1st August 2003 02:07 - #

  10. I have a few questions regarding creating permanent URLs.

    • By placing documents in sections like "http://domain.com/products/memory" doesn't it restrict the document from being associated with another section or product in this case?
    • By placing documents in a date hierarchy "http://domain.com/archives/2003/08/10" doesn't it restrict the document to a particular date? What happens if the document is updated with new information? The URL does not then indicate that the content has some new fresh updates to it. It may be considered old information and may be harder to find if searching for documents by hacking the URL.

    Why not give each article/document/blog entry a unique title as the only identifying feature of the URL? Then it allows the site to change its hierarchy and taxonomy without affecting the URL. Documents can be re-classified or updated without creating any confusion by placing any structure or hierarchy in the URL.

    Take Google as an example. When I am looking for information I am never thinking of a particular website or the URL hierarchy that may be used for the document I am interested in. I type some keywords and select a document I am interested in.

    I agree the URL should be readable but it has no need for hierarchy in it. Having a hierarchy in the URL might give me some clues as to how the document was categorised at the time it was published but it may not be relevant anymore if the document has been reclassified or if the document has been updated or amended.

    Keep it simple. Give it a unique title which I think should be a string from 1-50 characters.

    I would love to hear any comments on whether I am on the right track here or not or some good reasons for creating hierarchies and how to manage these hierarchies in URLs.

    David Marsh - 11th August 2003 05:33 - #
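David's proposal of a unique title of at most 50 characters could be sketched like this in Python. The function names, the ASCII-only character rule and the numeric suffix used to resolve clashes are all assumptions for illustration; his comment does not say how duplicates should be handled.

```python
import re

MAX_LENGTH = 50  # upper bound suggested in the comment above


def slugify(title, max_length=MAX_LENGTH):
    """Turn a title into a lowercase, hyphen-separated URL string."""
    # Anything other than ASCII letters and digits becomes a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_length].rstrip("-")


def unique_slug(title, taken):
    """Return a slug for title that is not already in the set taken."""
    base = slugify(title)
    candidate = base
    n = 2
    while candidate in taken:
        # Assumed policy: append -2, -3, ... while staying within the limit.
        suffix = f"-{n}"
        candidate = base[:MAX_LENGTH - len(suffix)].rstrip("-") + suffix
        n += 1
    return candidate


existing = {"quality-news-site-urls"}
print(unique_slug("Quality news site URLs", existing))
```

Once issued, such a string would have to stay fixed even if the title is later edited, otherwise the permanence this scheme is meant to provide is lost.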

  11. Yeah, is it important to make the link with the correct and complete URL? I mean .../index.htm

    Matthias - 16th October 2003 14:07 - #

Comments are closed.

Previously hosted at http://simon.incutio.com/archive/2003/07/30/newsSiteURLs
