Simon Willison’s Weblog

Entries tagged urls

Filters: Type: entry × urls ×


Recovering missing content from the Internet Archive

When I restored my blog last weekend I used the most recent SQL backup of my blog’s database from back in 2010. I thought it had all of my content from before I started my 7 year hiatus, but in watching the 404 logs I started seeing the occasional hit to something that really should have been there but wasn’t. Turns out the SQL backup I was working from was missing some content.

[... 636 words]

What do Twitter and Gawker think of hash-bangs URLs?

As of December 2013 (and potentially much earlier, I don’t have the exact dates) both Twitter and a Gawker have moved away from hash bang URLs, so my guess is they turned out not to be a good idea.

[... 82 words]

How to find the URL of a page in an iframe?

You can’t, as this would be a security and privacy violation. Imagine an evil website which loads up Google in a full page iframe and then tracks what the unsuspecting user searches for and clicks on.

[... 125 words]

What is the most efficient way to lookup an object (e.g. a user) by only a string?

Yes—an index on a varchar column is exactly how you would implement this.

[... 38 words]

Is there an API that returns metadata for a given URL?

I suggest taking a look at http://embed.ly/—it can take a huge range of URLs and turn them in to JSON metadata. Here’s what it can do with a Wikipedia page: http://embed.ly/docs/explore/obj...—and here’s Google Maps URL (not as useful, but still some interesting metadata extracted) http://embed.ly/docs/explore/obj...

[... 69 words]

How did art.sy get a “.sy” url?

Here’s a generally useful tip: if you’re interested in learning more about ANY top level domain, visit the Wikipedia page for it—which will be http://en.wikipedia.org/wiki/.sy in this case (just add the domain, complete with its dot prefix, directly after en.wikipedia.org/wiki/ ).

[... 105 words]

Which sites have the best URL design?

GitHub’s URL design is fantastic—it’s a virtually flawless mapping of Git semantics to URL space. Their basic URL structure is excellent, but they also have a bunch of neat URL hacks going on. Here are a few of my favourites:

[... 97 words]

When referring to our web site in publications (or Twitter or Facebook), when is it important to provide the full URL—http://www.mywebsite.com and when should you provide just the mywebsite.com?

You have no control over how other publications refer to your site—if you’re lucky, they might spell it correctly and check the link works before publishing (but I wouldn’t bet on it). What you DO have control over is making sure you compensate for any mistakes they make.

[... 166 words]

How did slashes become the standard path separators for URLs?

I’m going to take an educated guess and say it’s because of unix file system conventions. Early web servers mapped the URL to a path on disk inside the document root—this is still how most static sites work today.

[... 57 words]

How do you find the new URL of a Tumblr that has moved?

One trick that might work is to look up the old tumble in the Google cache or on archive.org, then copy and paste a unique search phrase from that page and run a Google search for:

[... 72 words]

Is there a way of tracking shortened URLs with Twitter streaming API?

Think about it like this: the whole point of the Twitter streaming API is to get you the tweets as soon after they are posted as possible. If the API were to provide access to the lengthened URLs, it would have to delay emitting a Tweet on to the stream until a resolver had gone through each shortened URL in the tweet and checked to find what it redirects to. This would mean that the speed with which the streaming API could deal out tweets would be dependent on the speed of the third party servers that serve up the redirects. I doubt Twitter would ever want to implement this.

[... 159 words]

Could browsers be made to scroll down (e.g. by 67%) if you add #67% to a URL?

I’d say no.

[... 89 words]

Is it a good idea to allocate URLs such as quora.com/username to users?

There’s an interesting discussion about this issue on this question: How do sites prevent vanity URLs from colliding with future features ?

[... 42 words]

Is there any consensus yet on link rel=shorturl vs rev=canonical?

It’s pretty clear from the answers that rev=canonical v.s. rel=canonical is way too confusing—so it’s down to rel=shortlink v.s. rel=shorturl.

[... 38 words]

Why doesn’t Facebook use nicer URLs?

Just noticed this link: http://www.facebook.com/notes/fa...—so it looks like things are beginning to improve.

[... 28 words]

Why don’t more websites use alternative domains?

Because regular human beings don’t understand them, and expect everything to be a .com. Here’s an interesting post from 2007 on why Topix.net spent $1,000,000 buying the .com domain: http://www.skrenta.com/2007/03/k...

[... 45 words]

How do sites prevent vanity URLs from colliding with future features ?

For wildlifenearyou.com and djangopeople.net I used the same trick as described by others in this list—an enormous blacklist of everything I could possibly want to use for a future feature.

[... 157 words]

rev=canonical bookmarklet and designing shorter URLs

I’ve watched the proliferation of URL shortening services over the past year with a certain amount of dismay. I care about the health of the web and try to ensure that URLs I am responsible will last for as long as possible, and I think it’s very unlikely that all of these new services will still be around in twenty years time. Last month I suggested that the Internet Archive start mirroring redirect databases, and last week I was pleased to hear that Archiveteam, a different organisation, had already started crawling.

[... 920 words]

Why you should be using disambiguated URLs

Good URLs are important. The best URLs are readable, reliable and hackable.

[... 553 words]