Neat tip for clean URLs
Here’s one of the neatest tips for clean URLs I’ve seen yet, from Thijs van der Vossen. He’s come up with a mod_rewrite rule that checks whether the requested path matches an existing file once .html is added to the end, and if so serves that file. I’m posting the full code snippet here because it’s just too good to risk losing to link-rot in the distant future:
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule (.*) $1\.html [L]
It's better to map it the other way and add an [R] on the end. That way the canonical URL is the neater one, and requests for the .html version are redirected, which means only one HTTP object and thus far better cacheability. Better yet, serve a 301 Moved Permanently so that software can pick up the change and point to the new URL.
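A minimal sketch of that reverse mapping, assuming the rules sit in a .htaccess file in the document root. mod_rewrite's [R] flag accepts a status code, so [R=301] sends the permanent redirect directly. The THE_REQUEST check is an assumption of this sketch, not part of the comment: it makes the redirect fire only when the client itself asked for the .html URL, not when the second rule rewrites to it internally.

```apache
RewriteEngine on
RewriteBase /

# 1. The client asked for /foo.html: answer with a permanent redirect to /foo.
#    THE_REQUEST is the original request line ("GET /foo.html HTTP/1.1"),
#    which does not change when rule 2 rewrites internally, so no loop.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?\ ]*)\.html[?\ ]
RewriteRule \.html$ /%1 [R=301,L]

# 2. Serve /foo from foo.html on disk when that file exists (the original rule).
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule (.*) $1.html [L]
```

With both rules in place, a request for /foo.html gets a 301 pointing at /foo, and /foo is then answered from foo.html.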
For what it's worth, I prefer to use a .content extension for things that are presented to the end-user, and an .action extension for things that just process data (along with .style, .script and so on).
Jim Dabell - 6th August 2003 20:37 - #
Thijs van der Vossen - 6th August 2003 22:22 - #
Matt - 6th August 2003 22:41 - #
.../2003/07/clean_url.html from the server filesystem when a request for the URL http://www.vandervossen.net/2003/07/clean_url is made. Adding a [R] to the last rule would redirect clients requesting http://www.vandervossen.net/2003/07/clean_url to http://www.vandervossen.net/2003/07/clean_url.html, but that's exactly what I don't want to do.
Thijs van der Vossen - 6th August 2003 22:58 - #
RewriteEngine On
# / or index.html is requested so call delegator.php with a special parameter
RewriteRule ^$ delegator.php?url=index/&host=%{HTTP_HOST}&from=%{REQUEST_URI}index/ [QSA,L]
RewriteRule ^index\.html$ delegator.php?url=index/&host=%{HTTP_HOST}&from=%{REQUEST_URI} [QSA,L]
# then if we have a request which is neither file nor dir send the request to delegator.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) delegator.php?url=$1&host=%{HTTP_HOST}&from=%{REQUEST_URI} [QSA,L]
This catches any request that is neither an existing file nor an existing directory, once you put it in a directory's .htaccess file. Please see the discussion on the SitePoint forums for more details: http://www.sitepointforums.com/showthread.php?threadid=117103
PS: Simon, the textarea for posting is way too small and no PRE tag is allowed :(
Sasa Velickovic - 6th August 2003 23:05 - #
There are a few issues here. Firstly, there's the thought that a .html extension is not portable when you want to switch to PHP or something. This is not true. File extensions mean nothing in terms of HTTP. You can configure a server to run .html files through PHP easily.
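To illustrate the point, here is a minimal sketch of one way to have Apache run .html files through PHP, assuming PHP is installed as the mod_php Apache module (CGI or FastCGI setups use a different handler):

```apache
# Assumes mod_php is loaded; application/x-httpd-php is the handler it registers.
# Every .html file in this directory and below is now executed by PHP,
# while the URLs people have already linked to stay exactly the same.
<FilesMatch "\.html$">
    SetHandler application/x-httpd-php
</FilesMatch>
```

Older PHP installs typically achieved the same with AddType application/x-httpd-php .html. Either way, static .html files now pass through the PHP interpreter too.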
So now it comes down to aesthetics. A lot of people prefer URLs without extensions as they are "cleaner". That's understandable; I moved to the scheme I described above for similar reasons.
This does not mean you should just serve up the object whenever you think you can find a corresponding file though. When you serve a file through two different addresses, you are actually serving two completely separate objects in terms of HTTP. /foo and /foo.html may be the same file on your server, but they are completely different documents to a web browser, a proxy, or any HTTP client.
The most immediate effect of this is to interfere with caching for no good reason. This bogs down your server with requests and file transfers that are completely unnecessary, and can be very annoying for people with intermittent access to the web, for instance, people who use WWWoffle.
What the documents you refer to advocate, and which I do as well, is designing a good, stable scheme for URLs, and implementing a sensible way of moving to that scheme.
So if you have already published under the .html extensions, you need to construct 301 notices to inform people of their new location, rather than having two separate objects. I don't know if you can do 301s through mod_rewrite or not, but [R] is a decent substitute, certainly a lot better than a duplicate object.
When I referred to mapping it the other way, perhaps I wasn't being clear. What I meant was that in the face of /foo.html, you should provide a 301 notice pointing to /foo. You can then use whatever server mechanism you choose (such as mod_rewrite) to serve it.
MultiViews is a different animal and isn't really designed for this. You can run into trouble if you have different types of files in the same directory. It also reduces the effectiveness of caching, because the response can vary with every distinct Accept header the server sees.
Jim Dabell - 7th August 2003 00:28 - #
Jim, I agree it is not so nice to have the same file available at two different addresses.
I did not write a rule to redirect foo.html to foo because I did not yet have any inbound links, or any visitors, at the time I implemented the rewrite rule.
The best thing to do IMHO is not to redirect, but to respond with a 404 Not Found when someone requests something with a .html extension.
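One way to do that with mod_rewrite, sketched under the assumption of Apache 2.4 or later, where an [R=...] flag with a status code outside the 300-399 range returns that status and drops the substitution. Testing THE_REQUEST (the original request line) means only URLs the client requested with .html are refused, not the internal rewrite to foo.html:

```apache
RewriteEngine on
RewriteBase /

# The client's own request line asked for a .html URL: answer 404 Not Found.
# "-" means "no substitution"; R=404 ends rewriting like [L].
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\.html[?\ ]
RewriteRule \.html$ - [R=404,L]

# Extensionless URLs are still served from the matching .html file.
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule (.*) $1.html [L]
```

On servers without that behaviour, mod_rewrite's [G] flag (410 Gone) is the closest built-in option, though it sends a different status than the 404 suggested here.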
Thijs van der Vossen - 7th August 2003 08:37 - #
Absolutely, if you have not published anything under the .html URL. Of course, it's different if people have bookmarked the page, or linked to it or anything. No sense in breaking things when you can clearly signal the move.
Jim Dabell - 7th August 2003 12:50 - #
Keith - 7th August 2003 14:41 - #
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . dispatcher.php
Keith - 7th August 2003 14:45 - #
Thorn - 7th August 2003 14:58 - #
Thijs van der Vossen - 7th August 2003 15:18 - #
I don't know if it's possible with Apache to have mod_rewrite also apply to the standard HTTP error pages. If so, you might want to take a look at these error pages as the culprit for the redirecting behaviour.
I cannot verify if it eventually times out (our own proxy server times out before that) but when it doesn't, I might be on to something ;-)
Thorn - 7th August 2003 16:04 - #
Thijs van der Vossen - 7th August 2003 17:01 - #
It is an internal redirection problem. This is what I got in the rewrite logs:
217.19.22.138 - - [07/Aug/2003:18:33:09 +0200] [www.vandervossen.net/sid#81256f4][rid#830119c/initial/redir#69] (3) [per-dir /var/www/vandervossen/htdocs/] add path-info postfix: /var/www/vandervossen/htdocs/2003/08/meesterlijk -> /var/www/vandervossen/htdocs/2003/08/meesterlijk/.html.html.html.html.html.html
I don't know why the rewrite module thinks %{REQUEST_FILENAME}.html is a file, but adding RewriteCond %{REQUEST_URI} !/$ appears to fix this problem.
Thijs van der Vossen - 7th August 2003 17:45 - #
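Put together, the original rule with Thijs's extra condition would read roughly as follows. The explanation in the comments is an inference from the log above (REQUEST_FILENAME stopping at .../meesterlijk while the trailing slash is carried along as path info), not something the comment itself states:

```apache
RewriteEngine on
RewriteBase /
# Thijs's fix: leave URLs ending in "/" alone. Judging from the log, for
# /2003/08/meesterlijk/ REQUEST_FILENAME ends at .../meesterlijk, the "/" is
# kept as path info, meesterlijk.html exists, and each pass appends another
# ".html" after the path info, so the rewrite never terminates.
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule (.*) $1\.html [L]
```

Both RewriteCond lines must match (they are ANDed) before the RewriteRule applies, so their order does not matter. This only covers URLs ending in a slash; other path-info cases can still loop.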
I've implemented this solution on my site, but have two problems:
Wayne Burkett - 11th August 2003 07:40 - #
I've been playing around with this technique, and I've run into a wall. Apparently, when you type the following address:
www.example.com/usability/sample_document
... when "usability.shtml" exists, there is no "usability" directory, and "sample_document" would otherwise return a 404 error, the %{REQUEST_FILENAME} variable contains the full path to "usability", so the %{REQUEST_FILENAME}.shtml -f condition evaluates to true instead of false. This sends mod_rewrite into an infinite loop. It never reaches a 404 error!
Any suggestions? Has anyone else run into this problem? Thanks for any help! I wasn't sure where else to post this specific problem. :)
Ben Clark - 6th April 2004 21:09 - #
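A possible answer to Ben's question, sketched under the assumption that the .htaccess sits in the document root: instead of testing %{REQUEST_FILENAME}.html, test the exact file the rewrite would produce. RewriteCond can use $1 from the RewriteRule pattern, because mod_rewrite matches the pattern before it evaluates the conditions.

```apache
RewriteEngine on
RewriteBase /
# $1 is the per-directory path, e.g. "usability/sample_document".
# Only rewrite if DOCUMENT_ROOT/usability/sample_document.html really exists;
# otherwise leave the request alone so Apache returns its normal 404.
# For Ben's setup, change both ".html" suffixes to ".shtml".
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(.*)$ $1.html [L]
```

For /usability/sample_document the condition checks usability/sample_document.html, which doesn't exist, so nothing is rewritten even though usability.html does. For /2003/08/meesterlijk/ it checks 2003/08/meesterlijk/.html, which also fails, so the trailing-slash loop from Thijs's log goes away too. If the rules live in a subdirectory or behind an Alias, %{DOCUMENT_ROOT} no longer lines up with $1 and the path would need adjusting.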