Simon Willison's Weblog: urls

New Django {% querystring %} template tag

2024-08-13T18:03:49+00:00

Django 5.1 came out last week and includes a neat new template tag which solves a problem I've faced a bunch of times in the past.

{% querystring color="red" size="S" %}

Adds ?color=red&size=S to the current URL - keeping any other existing parameters and replacing the current value for color or size if it's already set.

{% querystring color=None %}

Removes the ?color= parameter if it is currently set.

If the value passed is a list it will append ?color=red&color=blue for as many items as exist in the list.

You can access values in variables and you can also assign the result to a new template variable rather than outputting it directly to the page:

{% querystring page=page.next_page_number as next_page %}
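To make those rules concrete, here is a rough Python sketch of the same merge logic using only the standard library. It is not Django's implementation: update_querystring is a hypothetical helper name, and details such as what gets returned when no parameters remain are assumptions.

```python
from urllib.parse import parse_qs, urlencode


def update_querystring(current: str, **changes) -> str:
    """Merge keyword changes into an existing query string (hypothetical helper)."""
    # parse_qs returns {"key": ["value", ...]} and preserves insertion order
    params = parse_qs(current.lstrip("?"), keep_blank_values=True)
    for key, value in changes.items():
        if value is None:
            # None removes the parameter if it is currently set
            params.pop(key, None)
        elif isinstance(value, (list, tuple)):
            # A list becomes one key=value pair per item
            params[key] = [str(item) for item in value]
        else:
            # Anything else replaces the current value
            params[key] = [str(value)]
    encoded = urlencode(params, doseq=True)
    return f"?{encoded}" if encoded else ""


print(update_querystring("?page=2&color=blue", color="red", size="S"))
print(update_querystring("?page=2&color=blue", color=None))
print(update_querystring("", color=["red", "blue"]))
```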

Other things that caught my eye in Django 5.1:

PostgreSQL connection pools.
The new LoginRequiredMiddleware for making every page in an application require login.
The SQLite database backend now accepts init_command for setting things like PRAGMA cache_size=2000 on new connections.
SQLite can also be passed "transaction_mode": "IMMEDIATE" to configure the behaviour of transactions.
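As a sketch of how the two SQLite options might look in a settings file: both go in the OPTIONS dictionary of the database configuration. The NAME value here is a placeholder, not something from the release notes.

```python
# settings.py fragment (sketch). "db.sqlite3" is a placeholder path.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "db.sqlite3",
        "OPTIONS": {
            # Executed on each new connection
            "init_command": "PRAGMA cache_size=2000;",
            # Controls how transactions are started
            "transaction_mode": "IMMEDIATE",
        },
    }
}
```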

Tags: urls, sqlite, postgresql, django

trurl manipulates URLs

2023-04-04T22:08:13+00:00

Brand new command-line tool from curl creator Daniel Stenberg: The tr stands for translate or transpose, and the tool provides various mechanisms for normalizing URLs, adding query strings, changing the path or hostname and other similar modifications. I’ve tried designing APIs for this kind of thing in the past - Datasette includes some clumsily named functions such as path_with_removed_args() - and it’s a deceptively deep set of problems.

Tags: urls, curl, daniel-stenberg

Why I invented "dash encoding", a new encoding scheme for URL paths

2022-03-05T21:50:38+00:00

Datasette now includes its own custom string encoding scheme, which I've called dash encoding. I really didn't want to have to invent something new here, but unfortunately I think this is the best solution to my very particular problem. Some notes on how dash encoding works and why I created it.

Update 18th March 2022: This turned out not to be the right idea for my project after all! I ended up settling on a Tilde encoding scheme instead.

Table names and rows in URLs

I've put a lot of thought into the design of Datasette's URLs.

Datasette exposes relational database tables as both web pages and a JSON API.

Consider a database in a SQLite file called legislators.db, containing a table called legislator_terms (example from this tutorial). The URL path to the web interface for that table will be:

/legislators/legislator_terms

And the JSON API will be here:

/legislators/legislator_terms.json

(Worth noting that Datasette supports other formats here too - CSV by default, and plugins can add more formats such as GeoJSON or Atom or iCal.)

Datasette also provides pages (and APIs) for individual rows, identified by their primary key:

For tables with compound primary keys, these pages can include the primary key values separated by commas:

/fixtures/compound_three_primary_keys/a,a,a

This is all pretty straightforward so far. But now we get to the challenge: what if a table's name or a row's primary key contains a forward slash or a period character?

This could break the URL scheme!

SQLite table names are allowed to contain almost any character, and Datasette is designed to work with any existing SQLite database - so I can't guarantee that a table with one of those characters won't need to be handled.

Consider a database with two tables - one called legislator_terms and another called legislator_terms/1 - given the URL /legislators/legislator_terms/1 it's no longer clear if it refers to the table with that name or the row with primary key 1 in the other table!

A similar problem exists for table names such as legislators.csv - which end in a format extension. Or primary key string values that end in .json.

Why URL encoding doesn't work here

Up until now, Datasette has solved this problem using URL percent encoding. This provides a standard mechanism for encoding "special" characters in URLs.

legislator_terms/1 encodes to legislator_terms%2F1

This should be enough to solve the problem. The URL to that weirdly named table can now be:

/legislators/legislator_terms%2F1

When routing the URL, the application can take this into account and identify that this is a table named legislator_terms/1, as opposed to a request for the row with ID 1 in the legislator_terms table.

There are two remaining problems.

Firstly, the "." character is ignored by URL encoding, so we still can't tell the difference between /db/table.json and a table called table.json. I worked around this issue in Datasette by supporting an optional alternative ?_format=json parameter, but it's messy and confusing.
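Both behaviours are easy to check with Python's standard library. This small sketch uses urllib.parse.quote with safe="" so that the forward slash is escaped too:

```python
from urllib.parse import quote, unquote

# The slash is escaped when no characters are marked as safe
encoded = quote("legislator_terms/1", safe="")
print(encoded)  # legislator_terms%2F1

# The period is never escaped, so the format ambiguity remains
print(quote("table.json", safe=""))  # table.json

# Any layer that decodes the path brings the slash straight back
print(unquote(encoded))  # legislator_terms/1
```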

Much more seriously, it turns out there are numerous common pieces of web infrastructure that "helpfully" decode escaped characters in URLs before passing them on to the underlying web application!

I first encountered this in the ASGI standard itself, which decoded characters in the path field before they were passed to the rest of the application. I submitted a PR adding raw_path to ASGI precisely to work around this problem for Datasette.
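A minimal sketch of how an application might prefer raw_path when it is available. The helper name undecoded_path is hypothetical, and this is not Datasette's actual code; it only assumes that the ASGI scope carries path as a decoded string and, optionally, raw_path as bytes.

```python
def undecoded_path(scope: dict) -> str:
    """Return the path as received, falling back to the decoded path (sketch)."""
    raw_path = scope.get("raw_path")
    if raw_path is not None:
        # raw_path is a byte string; latin-1 maps every byte to a character
        return raw_path.decode("latin-1")
    # Servers that do not provide raw_path only offer the decoded version
    return scope["path"]


scope = {
    "type": "http",
    "path": "/legislators/legislator_terms/1",
    "raw_path": b"/legislators/legislator_terms%2F1",
}
print(undecoded_path(scope))
```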

Over time though, the problem kept cropping up. Datasette aims to run on as many hosting platforms as possible. I've seen URL escaping applied at a higher level enough times now to be very suspicious of any load balancer or proxy or other web server mechanism that might end up executing between Datasette and the rest of the web.

Update: Flask core maintainer David Lord confirms on Twitter that this is a long-standing known problem:

This behavior in Apache/nginx/etc is why WSGI/ASGI can't specify "literal URL the user typed in", because anything in front of the app might modify slashes or anything else. So all the spec can provide is "decoded URL".

So, I need a way of encoding a table name that might include / and . characters in a way that will survive some other layer of the stack decoding URL encoded strings in the URL path before Datasette gets to see them!

Introducing dash encoding

That's where dash encoding comes in. I tried to design the fastest, simplest encoding mechanism I could that would solve this very specific problem.

Loose requirements:

Reversible - it's crucial that any possible value survives a round-trip through the encoding
Avoid changing the string at all if possible. Otherwise I could use something like base64, but I wanted to keep the name in the URL as close to readable as possible
Survive interference by proxies and load balancers that might try to be helpful
Fast to apply the transformation
As simple as possible
Easy to implement, including in languages other than Python

Dash encoding consists of three simple steps:

Replace all single hyphen characters - with two hyphens --
Replace any forward slash / character with hyphen forward slash -/
Replace any period character . with hyphen period -.

To reverse the encoding, run those steps backwards.

Here's the Python implementation of this encoding scheme:

def dash_encode(s: str) -> str:
    "Returns dash-encoded string - for example ``/foo/bar`` -> ``-/foo-/bar``"
    return s.replace("-", "--").replace(".", "-.").replace("/", "-/")

def dash_decode(s: str) -> str:
    "Decodes a dash-encoded string, so ``-/foo-/bar`` -> ``/foo/bar``"
    return s.replace("-/", "/").replace("-.", ".").replace("--", "-")

And the pytest tests for it:

@pytest.mark.parametrize(
    "original,expected",
    (
        ("abc", "abc"),
        ("/foo/bar", "-/foo-/bar"),
        ("/-/bar", "-/---/bar"),
        ("-/db-/table---.csv-.csv", "---/db---/table-------.csv---.csv"),
    ),
)
def test_dash_encoding(original, expected):
    actual = utils.dash_encode(original)
    assert actual == expected
    # And test round-trip
    assert original == utils.dash_decode(actual)

Here's the full commit.

This meets my requirements.

Capturing these with a regular expression

There was one remaining challenge. Datasette uses regular expressions - inspired by Django - to route requests to the correct page.

I wanted to use a regular expression to extract out dash encoded values, that could also distinguish them from / and - and . characters that were not encoded in that way.

Here's the regular expression I came up with for matching dash-encoded strings:

([^\/\-\.]*|(\-/)|(\-\.)|(\-\-))*

Broken down:

[^\/\-\.]* means 0 or more characters that are NOT one of . or / or - - since we don't care about those characters at all
(\-/) means the explicit sequence -/
(\-\.) means the explicit sequence -.
(\-\-) means the explicit sequence --
Those four are wrapped in a group combined with the | or operator
The group is then wrapped in a (..)* - specifying that it can repeat as many times as you like

A better way to break down this regular expression is visually, using Debuggex:

Combining this into the full regular expression that matches a /database/table.format path is even messier, due to the need to add non-capturing group syntax (?:..) and named groups (?P<name>...) - it ends up looking like this:

^/(?P<database>[^/]+)/(?P<table>(?:[^\/\-\.]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*?)\.(?P<format>\w+)?$

Visualized with Debuggex:

Update: Thanks to suggestions from Matthew Somerville I simplified this further to:

^/(?P<database>[^/]+)/(?P<table>[^\/\-\.]*|\-/|\-\.|\-\-)*(?P<format>\.\w+)?$
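Here is a rough sketch of how a routing pattern along these lines could be used from Python. Note that this is a variant, not the exact expression above: it wraps the repeated alternatives in a non-capturing group inside the named table group so that the group holds the whole encoded name, and it consumes one plain character per repetition. The parse_table_path helper is a hypothetical name, and dash_decode is repeated so the example runs on its own.

```python
import re


def dash_decode(s: str) -> str:
    "Decodes a dash-encoded string, so ``-/foo-/bar`` -> ``/foo/bar``"
    return s.replace("-/", "/").replace("-.", ".").replace("--", "-")


# Variant of the routing pattern (an assumption, see the note above)
TABLE_ROUTE = re.compile(
    r"^/(?P<database>[^/]+)"
    r"/(?P<table>(?:[^/\-.]|-/|-\.|--)*)"
    r"(?P<format>\.\w+)?$"
)


def parse_table_path(path: str):
    """Return (database, table, format) or None if the path is not a table URL."""
    match = TABLE_ROUTE.match(path)
    if match is None:
        return None
    fmt = match.group("format")
    return (
        match.group("database"),
        dash_decode(match.group("table")),
        fmt[1:] if fmt else None,  # drop the leading period
    )


print(parse_table_path("/legislators/legislator_terms.json"))
print(parse_table_path("/legislators/legislator_terms-/1"))
print(parse_table_path("/legislators/legislator_terms/1"))
```

The last call returns None because an unescaped / can never be part of a dash-encoded table name, which is what lets the router tell a table apart from a row page.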

Next steps: implementation

I'm currently working on integrating it into Datasette in this PR. The full history of my thinking around this problem can be found in issue 1439, with comments stretching back to August last year!

Tags: regular-expressions, urls, datasette

Datasette 0.51 (plus weeknotes)

2020-11-01T04:22:55+00:00

I shipped Datasette 0.51 today, with a new visual design, plugin hooks for adding navigation options, better handling of binary data, URL building utility methods and better support for running Datasette behind a proxy. It's a lot of stuff! Here are the annotated release notes.

New visual design

Datasette is no longer white and grey with blue and purple links! Natalie Downe has been working on a visual refresh, the first iteration of which is included in this release. (#1056)

It's about time Datasette grew beyond its clearly-designed-by-a-mostly-backend-engineer roots. Natalie has been helping me start adding some visual polish: we've started with an update to the colour scheme and will be continuing to iterate on the visual design as the project evolves towards the 1.0 release.

The new design makes the navigation bar much more obvious, which is important for this release since the new navigation menu (tucked away behind a three-bar icon) is a key new feature.

Plugins can now add links within Datasette

A number of existing Datasette plugins add new pages to the Datasette interface, providing tools for things like uploading CSVs, editing table schemas or configuring full-text search.

Plugins like this can now link to themselves from other parts of the Datasette interface. The menu_links(datasette, actor) hook (#1064) lets plugins add links to Datasette's new top-right application menu, and the table_actions(datasette, actor, database, table) hook (#1066) adds links to a new "table actions" menu on the table page.

This feature has been a long time coming. I've been writing an increasing number of plugins that add new pages to Datasette, and so far the main way of using them has been to memorise and type in their URLs!

The new navigation menu (which only displays if it has something in it) provides a global location to add new links. I've already released several plugin updates that take advantage of this.

The new "table actions" menu imitates Datasette's existing column header menu icon - it's a cog. Clicking it opens a menu of actions relating to the current table.

Want to see a demo?

The demo at latest.datasette.io now includes some example plugins. To see the new table actions menu first sign into that demo as root and then visit the facetable table to see the new cog icon menu at the top of the page.

Here's an animated GIF demo showing the new menus in action.

Binary data

SQLite tables can contain binary data in BLOB columns. Datasette now provides links for users to download this data directly from Datasette, and uses those links to make binary data available from CSV exports. See Binary data for more details. (#1036 and #1034).

I spent a ton of time on this over the past few weeks. The initial impetus was a realization that Datasette CSV exports included ugly Python b'\x15\x1c\x02\xc7\xad\x05\xfe' strings, which felt like the worst possible way to display binary in a CSV file, out of universally bad options.

Datasette's main interface punted on binary entirely - it would show a <Binary data: 7 bytes> label which didn't help much either.

The only way to get at binary data stored in a Datasette instance was to request the JSON version and then manually decode the Base-64 value within it!

This is now fixed: binary columns can be downloaded directly to your computer, using a new .blob output renderer. The approach is described on this new page in the documentation.

Security was a major consideration when building this feature. Allowing the download of arbitrary byte payloads from a web server is dangerous business: it can easily result in XSS holes where HTML with dangerous <script> content can end up hosted on the primary domain.

After some research, I decided to serve up binary content for download using the following headers:

content-type: application/binary
x-content-type-options: nosniff
content-disposition: attachment; filename="data-f30889.blob"

application/binary is a safer Content-Type option than the more common application/octet-stream, according to Michal Zalewski's renowned web application security book The Tangled Web (quoted here)

x-content-type-options: nosniff disables the XSS-tastic content sniffing feature in older versions of Internet Explorer, where IE would helpfully guess that you intended to serve HTML based on the first few bytes of the response.

The content-disposition: attachment header causes the browser to show a "download this file" dialog, using the suggested filename.
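As a sketch, those three headers could be assembled by a small helper like the one below. The function name blob_download_headers is hypothetical and this is not Datasette's code; it also assumes the filename has already been sanitised so it contains no quote characters.

```python
def blob_download_headers(filename: str) -> dict:
    """Build response headers for a binary download (hypothetical helper)."""
    return {
        # Generic binary type, rather than application/octet-stream
        "content-type": "application/binary",
        # Ask the browser not to guess a different content type
        "x-content-type-options": "nosniff",
        # Trigger a download dialog with a suggested filename
        "content-disposition": f'attachment; filename="{filename}"',
    }


headers = blob_download_headers("data-f30889.blob")
for name, value in headers.items():
    print(f"{name}: {value}")
```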

If you know of a reason that this isn't secure enough, please let me know!

URL building

The new datasette.urls family of methods can be used to generate URLs to key pages within the Datasette interface, both within custom templates and Datasette plugins. See Building URLs within plugins for more details. (#904)

Datasette's base_url configuration setting was the forcing factor around this piece of work.

It allows you to configure Datasette to serve content starting at a path other than / - for example:

datasette --config base_url:/path-to-datasette/

This will serve all Datasette pages at locations starting with /path-to-datasette/.

Why would you want to do this? It's useful if you are proxying traffic to Datasette from within the URL hierarchy of an existing website.

The feature didn't work properly, and enough people care about it that I had a steady stream of bug reports. For 0.51 I gathered them all into a single giant tracking issue and worked through them all one by one.

It quickly became apparent that the key challenge was building URLs within Datasette - not just within HTML template pages, but also for things like HTTP redirects.

Datasette itself needed to generate URLs that took the base_url setting into account, but so do Datasette plugins. So I built a new datasette.urls collection of helper methods and made them part of the documented internals API for plugins. The Building URLs within plugins documentation shows how these should be used.

I also added documentation on Running Datasette behind a proxy with example configs (tested on my laptop) for both nginx and Apache.

The datasette.client mechanism from Datasette 0.50 allows plugins to make calls to Datasette's internal JSON API without the overhead of an HTTP request. This is another place where plugins need to be able to construct valid URLs to internal Datasette pages.

I added this example to the documentation showing how the two features can work together:

table_json = (
    await datasette.client.get(
        datasette.urls.table("fixtures", "facetable", format="json")
    )
).json()

One final weird detail on this: Datasette now has various methods that automatically add the base_url prefix to a URL. I got worried about what would happen if these were applied more than once (as above, where datasette.urls.table() applies the prefix and so does datasette.client.get()).

I fixed this using the same trick that Django and Jinja use to avoid applying auto-escaping twice to content that will be displayed in HTML: the datasette.urls methods actually return a PrefixedUrlString object which is a subclass of str that knows that the prefix has been applied! Code for that lives here.
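The idea can be sketched in a few lines. This is a simplified illustration rather than Datasette's actual implementation: apply_prefix is a hypothetical helper, and the base_url value is taken from the example earlier in this post.

```python
class PrefixedUrlString(str):
    """A str subclass used purely as a marker: the prefix is already applied."""


def apply_prefix(path: str, base_url: str = "/path-to-datasette/") -> PrefixedUrlString:
    """Add base_url to a path exactly once (hypothetical helper)."""
    if isinstance(path, PrefixedUrlString):
        # Already prefixed, so return it untouched
        return path
    return PrefixedUrlString(base_url + path.lstrip("/"))


once = apply_prefix("/fixtures/facetable.json")
twice = apply_prefix(once)
print(once)
print(twice == once)
```

One limitation of this sketch: ordinary string operations such as concatenation return a plain str, so the marker is lost if the value is modified afterwards.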

Smaller changes

A few highlights from the "smaller changes" in Datasette 0.51:

Wide tables shown within Datasette now scroll horizontally (#998). This is achieved using a new <div class="table-wrapper"> element which may impact the implementation of some plugins (for example this change to datasette-cluster-map).

I think this is a big improvement: if your database table is too wide, it now scrolls horizontally on the page (rather than blowing the entire page out to a wider width). You can see that in action on the global-power-plants demo.

New debug-menu permission. (#1068)

If you are signed in as root the new navigation menu links to a whole plethora of previously-undiscoverable Datasette debugging tools. This new permission controls the display of those items.

Link: HTTP header pagination. (#1014)

Inspired by GitHub and WordPress, which both use the HTTP Link header in this way. It's an optional extra though: Datasette will always offer in-JSON pagination information.
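For API clients, following this kind of pagination means pulling the rel="next" URL out of the Link header. Here is a minimal standard-library sketch; next_page_url is a hypothetical helper, the example.com URLs are made up, and the parsing is deliberately simplistic (it assumes link parameters contain no commas).

```python
import re
from typing import Optional

# Matches "<url>; params" entries in a Link header value
LINK_ENTRY = re.compile(r"<([^>]*)>\s*;\s*([^,]*)")


def next_page_url(link_header: str) -> Optional[str]:
    """Return the URL tagged rel="next", or None if there isn't one."""
    for url, params in LINK_ENTRY.findall(link_header):
        if re.search(r'\brel="?next"?', params):
            return url
    return None


header = '<https://example.com/db/table.json?_next=100>; rel="next"'
print(next_page_url(header))
```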

Edit SQL button on canned queries. (#1019)

Suggested by Jacob Fenton in this issue. The implementation had quite a few edge cases since there are certain categories of canned query that can't be executed as custom SQL by the user. See the issue comments for details and a demo.

--load-extension=spatialite shortcut. (#1028)

Inspired by a similar feature in sqlite-utils.

datasette -o option now opens the most relevant page. (#976)

This is a fun little feature. If your Datasette only loads a single database, and that database only has a single table (common if you've just run a single CSV import) then running this will open your browser directly to that table page:

datasette data.db -o

datasette --cors option now enables access to /database.db downloads. (#1057)

This was inspired by Mike Bostock's Observable Notebook that uses the Emscripten-compiled JavaScript version of SQLite to run queries against SQLite database files.

It turned out you couldn't use that notebook against SQLite files hosted in Datasette because they weren't covered by Datasette's CORS option. Now they are!

New documentation on Designing URLs for your plugin. (#1053)

Recommendations for plugin authors, inspired by a question from David Kane on Twitter. David has been building datasette-reconcile, a Datasette plugin that offers a reconciliation API endpoint that can be used with OpenRefine. What a brilliant idea!

datasette-edit-templates (almost)

Inspired by a conversation with Jesse Vincent, I also spent some time experimenting with the idea of a plugin that can load and edit templates from the database - which would turn a personal Datasette into a really fun interface hacking environment. I nearly got this working, and even shipped a preview of a load_template() plugin hook in the Datasette 0.51a2 alpha... before crashing into a road block when I realized that it also needed to work with Jinja's {% extends %} and {% include %} template tags and loaders for those don't currently support async functions.

In exploring this I also realized that my load_template() plugin hook wasn't actually necessary - if I'm going to solve this problem with Jinja loaders I can do so using the existing prepare_jinja2_environment(env) hook.

My not-yet-functional prototype for this is called datasette-edit-templates. I'm pretty confident I can get it working against the old plugin hook with a little more work.

Other weeknotes

Most of my time this week was spent on Datasette 0.51 - but I did find a little bit of time for other projects.

I finished recording my talk for PyCon Argentina. It will air on November 20th.

sqlite-utils 2.23 is out, with a .m2m() bug fix from Adam Wolf and the new ability to display progress bars when importing TSV and CSV files.

Releases this week

Several of these are updates to take advantage of the new navigation plugin hooks introduced in Datasette 0.51.

datasette-configure-fts 1.1 - 2020-11-01
datasette-graphql 1.1 - 2020-11-01
datasette-edit-schema 0.4 - 2020-10-31
datasette-upload-csvs 0.6 - 2020-10-31
datasette 0.51 - 2020-10-31
datasette 0.51a2 - 2020-10-30
datasette-edit-schema 0.4a0 - 2020-10-30
datasette 0.51a1 - 2020-10-30
datasette-render-markdown 1.2 - 2020-10-28
sqlite-utils 2.23 - 2020-10-28

Tags: projects, security, urls, xss, datasette, weeknotes, annotated-release-notes

Microbrowsers are Everywhere

2019-12-18T08:32:19+00:00

Colin Bendell introduces a new-to-me term, “microbrowsers”, to describe the user-agents which hit websites to generate unfurled link previews in messenger apps. Twitter and Facebook first popularized them, but today you’re likely getting far more preview-generating traffic from chat clients such as iMessage, WhatsApp and Slack (which won’t execute script and ignore cookies, and hence won’t show up in Google Analytics). Lots of great tips here—one example: if you provide three og:image meta tags iMessage will render them as a collage.

Via Hacker News

Tags: urls, 24-ways, metadata

Removing MediaWiki from SPA: Cool URIs don't change

2017-10-08T19:54:24+00:00

Detailed write-up from Anna Shipman describing how she archived an old MediaWiki as static content using recursive wget and some cunning application of mod_rewrite.

Via Steve Marshall

Tags: urls, annashipman

Recovering missing content from the Internet Archive

2017-10-08T19:08:57+00:00

When I restored my blog last weekend I used the most recent SQL backup of my blog’s database from back in 2010. I thought it had all of my content from before I started my 7 year hiatus, but in watching the 404 logs I started seeing the occasional hit to something that really should have been there but wasn’t. Turns out the SQL backup I was working from was missing some content.

Thank goodness then for the Wayback Machine at the Internet Archive! I tried some of the missing URLs there and found they had been captured and preserved. But how to get them back?

A quick search turned up wayback-machine-downloader, an open-source Ruby script that claims to be able to "Download an entire website from the Internet Archive Wayback Machine". I gem installed it and tried it out (after some cargo cult incantations to work around some weird certificate errors I was seeing):

rvm osx-ssl-certs update all
gem update --system
gem install wayback_machine_downloader

wayback_machine_downloader http://simonwillison.net/

And it worked! I left it running overnight and came back to a folder containing 18,952 HTML files, neatly arranged in a directory structure that matched my site:

$ find . | more
.
./simonwillison.net
./simonwillison.net/2002
./simonwillison.net/2002/Aug
./simonwillison.net/2002/Aug/1
./simonwillison.net/2002/Aug/1/cetis
./simonwillison.net/2002/Aug/1/cetis/index.html
./simonwillison.net/2002/Aug/1/cssSelectorsTutorial
./simonwillison.net/2002/Aug/1/cssSelectorsTutorial/index.html
...

I tarred them up into an archive and backed them up to Dropbox.

Next challenge: how to restore the missing content?

I’m a recent and enthusiastic adopter of Jupyter notebooks. As a huge fan of development in a REPL I’m shocked I was so late to this particular party. So I fired up Jupyter and used it to start playing with the data.

Here’s the final version of my notebook. I ended up with a script that did the following:

Load in the full list of paths from the tar archive, and filter for just the ones matching the /YYYY/Mon/DD/slug/ format used for my blog content
Talk to my local Django development environment and load in the full list of actual content URLs represented in that database.
Calculate the difference between the two - those are the 213 items that need to be recovered.
For each of those 213 items, load the full HTML that had been saved by the Internet Archive and feed it into the BeautifulSoup HTML parsing library.
Detect if each one is an entry, a blogmark or a quotation. Scrape the key content out of each one based on the type.
Scrape the tags for each item, using this delightful one-liner: [a.text for a in soup.findAll('a', {'rel': 'tag'})]
Scrape the comments for each item separately. These were mostly spam, so I haven’t yet recovered these for publication (I need to do some aggressive spam filtering first). I have however stashed them in the database for later processing.
Write all of the scraped data out to a giant JSON file and upload it to a gist (a nice cheap way of giving it a URL).
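The first three steps can be sketched with nothing but the standard library. This is an illustration rather than the code from the notebook: the function names are hypothetical, and it assumes the archive paths look like the find output shown earlier.

```python
import re

# Matches /YYYY/Mon/DD/slug/ paths such as /2002/Aug/1/cetis/
ENTRY_PATH = re.compile(r"^/\d{4}/[A-Z][a-z]{2}/\d{1,2}/[^/]+/$")
SITE_PREFIX = "./simonwillison.net"


def archived_content_paths(file_paths):
    """Turn archive file paths into the set of blog content URL paths."""
    found = set()
    for file_path in file_paths:
        if not file_path.startswith(SITE_PREFIX):
            continue
        url_path = file_path[len(SITE_PREFIX):]
        if url_path.endswith("/index.html"):
            # Keep the trailing slash, drop the file name
            url_path = url_path[: -len("index.html")]
        if ENTRY_PATH.match(url_path):
            found.add(url_path)
    return found


def missing_paths(file_paths, database_paths):
    """Paths captured by the archive that are absent from the database."""
    return archived_content_paths(file_paths) - set(database_paths)


files = [
    ".",
    "./simonwillison.net",
    "./simonwillison.net/2002/Aug/1/cetis",
    "./simonwillison.net/2002/Aug/1/cetis/index.html",
    "./simonwillison.net/2002/Aug/1/cssSelectorsTutorial/index.html",
]
print(missing_paths(files, ["/2002/Aug/1/cetis/"]))
```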

Having executed the above script, I now have a JSON file containing the parsed content for all of the missing items found in the Wayback Machine. All I needed then was a script which could take that JSON and turn it into records in the database. I implemented that as a custom Django management command and deployed it to Heroku.

Last step: shell into a Heroku dyno (using heroku run bash) and run the following:

./manage.py import_blog_json \
    --url_to_json=https://gist.github.com/simonw/5a5bc1f58297d2c7d68dd7448a4d6614/raw/28d5d564ae3fe7165802967b0f9c4eff6091caf0/recovered-blog-content.json \
    --tag_with=recovered

The result: 213 recovered items (which I tagged with recovered so I could easily browse them). Including the most important entry on my whole site, my write-up of my wedding!

So thank you very much to the Internet Archive team, and thank you Hartator for your extremely useful wayback-machine-downloader tool.

Tags: beautifulsoup, internet-archive, urls, jupyter

What do Twitter and Gawker think of hash-bangs URLs?

2013-12-08T17:45:00+00:00

My answer to What do Twitter and Gawker think of hash-bangs URLs? on Quora

As of December 2013 (and potentially much earlier, I don't have the exact dates) both Twitter and Gawker have moved away from hash bang URLs, so my guess is they turned out not to be a good idea.

Most browsers now support the HTML5 history API which allows the usage of proper URLs while still fetching new content using Ajax without triggering a full page request.

Tags: twitter, urls, quora

How to find the URL of a page in an iframe?

2013-08-26T15:22:00+00:00

My answer to How to find the URL of a page in an iframe? on Quora

You can't, as this would be a security and privacy violation. Imagine an evil website which loads up Google in a full page iframe and then tracks what the unsuspecting user searches for and clicks on.

The only way you could achieve this is if the pages within the iframe were deliberately collaborating with you - for example, you open up a page in an iframe of www.example.org?tracker=specialcode and example.org has code which sends information on each subsequent page visited back to your server. Even then, the tracking would stop working once the user clicked a link to a page not hosted on example.org

Tags: urls, quora

What is the most efficient way to lookup an object (e.g. a user) by only a string?

2012-05-31T17:27:00+00:00

My answer to What is the most efficient way to lookup an object (e.g. a user) by only a string? on Quora

Yes - an index on a varchar column is exactly how you would implement this.
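A small sketch of that approach. It uses SQLite from Python's standard library as a stand-in for MySQL so the example is runnable anywhere; the users table, its columns and the unique constraint are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username VARCHAR(50) NOT NULL)"
)
# The index on the varchar column is what makes lookups by string efficient
conn.execute("CREATE UNIQUE INDEX idx_users_username ON users (username)")
conn.executemany(
    "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
)


def find_user(username: str):
    """Look up a user row by its string identifier, or return None."""
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchone()


print(find_user("bob"))
```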

Tags: mysql, twitter, urls, quora

Is there an API that returns metadata for a given URL?

2012-05-31T16:01:00+00:00

My answer to Is there an API that returns metadata for a given URL? on Quora

I suggest taking a look at http://embed.ly/ - it can take a huge range of URLs and turn them into JSON metadata. Here's what it can do with a Wikipedia page: http://embed.ly/docs/explore/obj... - and here's a Google Maps URL (not as useful, but still some interesting metadata extracted) http://embed.ly/docs/explore/obj...

Tags: apis, urls, web-services, quora

How did art.sy get a ".sy" url?

2012-05-31T11:00:00+00:00

My answer to How did art.sy get a ".sy" url? on Quora

Here's a generally useful tip: if you're interested in learning more about ANY top level domain, visit the Wikipedia page for it - which will be http://en.wikipedia.org/wiki/.sy in this case (just add the domain, complete with its dot prefix, directly after en.wikipedia.org/wiki/ ).

The Wikipedia page will tell you which country the domain refers to, who the registrar is, what restrictions there are on those domains and plenty more besides. In this case, .sy is Syria - not exactly the best country to be associating your brand with these days!

Tags: urls, quora

Which sites have the best URL design?

2012-05-31T09:08:00+00:00

My answer to Which sites have the best URL design? on Quora

GitHub's URL design is fantastic - it's a virtually flawless mapping of Git semantics to URL space. Their basic URL structure is excellent, but they also have a bunch of neat URL hacks going on. Here are a few of my favourites:

https://github.com/django/django/compare/4553f511557052d6f18811807ae6136f81fa86a3...master - compare view between commits, branches or tags
https://github.com/django/django... - Place #L36-38 at the end of the URL to highlight those lines of code

You can read more about GitHub's approach to URL design on Kyle Neath's blog here: http://warpspire.com/posts/url-d...

Tags: design, urls, quora

When referring to our web site in publications (or Twitter or Facebook), when is it important to provide the full URL - http://www.mywebsite.com and when should you provide just the mywebsite.com?

2012-05-31T09:07:00+00:00

My answer to When referring to our web site in publications (or Twitter or Facebook), when is it important to provide the full URL - http://www.mywebsite.com and when should you provide just the mywebsite.com? on Quora

You have no control over how other publications refer to your site - if you're lucky, they might spell it correctly and check the link works before publishing (but I wouldn't bet on it). What you DO have control over is making sure you compensate for any mistakes they make.

So, it's critical that accessing either the www. or non-www. version of your site does the right thing. You should pick one of them as the canonical version, while the other should send a 301 permanent redirect to the one that you picked. You should redirect ALL URLs, so if you've chosen to publish your site without the www. anyone who visits http://www.example.com/path/blah... is automatically redirected to http://example.com/path/blah/?fo... (with a 301 permanent redirect, not a 302).
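The redirect rule itself is simple enough to sketch in Python. This is only an illustration of the logic: canonical_redirect is a hypothetical helper, the URL in the example is made up, and in practice this would normally live in your web server or framework configuration. Any port in the original URL is dropped by this sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url: str, canonical_host: str = "example.com"):
    """Return (301, location) if the URL uses the www. host, otherwise None."""
    parts = urlsplit(url)
    if parts.hostname != f"www.{canonical_host}":
        return None
    # Swap only the host, preserving the path and query string
    location = urlunsplit(parts._replace(netloc=canonical_host))
    return 301, location


print(canonical_redirect("http://www.example.com/path/blah/?page=2"))
print(canonical_redirect("http://example.com/path/blah/?page=2"))
```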

Tags: urls, quora

How did slashes become the standard path separators for URLs?

2012-02-10T14:51:00+00:00

My answer to How did slashes become the standard path separators for URLs? on Quora

I'm going to take an educated guess and say it's because of unix file system conventions. Early web servers mapped the URL to a path on disk inside the document root - this is still how most static sites work today.

Tags: internet, urls, w3c, quora

How do you find the new URL of a Tumblr that has moved?

2012-01-30T17:32:00+00:00

My answer to How do you find the new URL of a Tumblr that has moved? on Quora

One trick that might work is to look up the old tumblr in the Google cache or on archive.org, then copy and paste a unique search phrase from that page and run a Google search for:

"unique phrase found on page" site:tumblr.com

With luck, that will turn up the new location of the tumblr.

Tags: urls, quora, tumblr

Quoting Reed Underwood

2011-02-10T16:56:00+00:00

URLs are supposed to represent resources. A web app can be a resource, and there are techniques for managing state within those. Hashbangs might be one of these. But when large web properties are converting all their links to articles and other bits of text (tweets/twits/whatever) into these monstrosities, it’s not innovation. It’s a huge mistake that ought to be regretted now and will certainly be regretted in the future.

— Reed Underwood

Tags: hashbanghell, urls, recovered

Quoting Tim Bray

2011-02-10T06:00:00+00:00

Before events took this bad turn, the contract represented by a link was simple: “Here’s a string, send it off to a server and the server will figure out what it identifies and send you back a representation.” Now it’s along the lines of: “Here’s a string, save the hashbang, send the rest to the server, and rely on being able to run the code the server sends you to use the hashbang to generate the representation.” Do I need to explain why this is less robust and flexible? This is what we call “tight coupling” and I thought that anyone with a Computer Science degree ought to have been taught to avoid it.

— Tim Bray

Tags: hashbanghell, javascript, urls, recovered

Going Postel

2011-02-09T02:18:00+00:00

Going Postel

Jeremy points out that one of the many disadvantages of publishing JavaScript-dependent content on the Web is that a single typo can render your entire site unusable.

Tags: ajax, gawker, hashbanghell, jeremy-keith, urls, recovered

Breaking the Web with hash-bangs

2011-02-09T02:17:00+00:00

Breaking the Web with hash-bangs

Mike Davies explains why Gawker’s new Ajax fragment-tastic redesign is a web architecture error of colossal proportions.

Tags: ajax, gawker, hashbanghell, mike-davies, urls, recovered

Is there a way of tracking shortened URLs with Twitter streaming API?

2011-01-21T09:47:00+00:00

My answer to Is there a way of tracking shortened URLs with Twitter streaming API? on Quora

Think about it like this: the whole point of the Twitter streaming API is to get you the tweets as soon after they are posted as possible. If the API were to provide access to the lengthened URLs, it would have to delay emitting a Tweet on to the stream until a resolver had gone through each shortened URL in the tweet and checked to find what it redirects to. This would mean that the speed with which the streaming API could deal out tweets would be dependent on the speed of the third party servers that serve up the redirects. I doubt Twitter would ever want to implement this.

I believe the Twitter search API may provide an index of the lengthened version of short URLs (the search on Twitter.com quietly started doing that a month or so ago).

Tags: twitter, urls, quora

Getting Started - Google URL Shortener API

2011-01-13T03:49:00+00:00

Getting Started - Google URL Shortener API

The API for the goo.gl URL shortener is really nice—no API key required, easy to create a short URL and you can retrieve detailed stats breakdowns (similar to bit.ly) as JSON for any URL.

Tags: google, urls, recovered

Could browsers be made to scroll down (e.g. by 67%) if you add #67% to a URL?

2011-01-01T15:07:00+00:00

My answer to Could browsers be made to scroll down (e.g. by 67%) if you add #67% to a URL? on Quora

I'd say no.

Presumably you want this so you can link to a particular point on a page - but the liquid nature of the Web means that you have no way of guaranteeing that the sentence 67% down the page for you is the same as the sentence 67% down the page for someone else (who may have a different browser width or font size).

Tags: browsers, html, urls, quora

URL Design

2010-12-31T10:03:00+00:00

URL Design

Thoughtful tips on modern URL design, from GitHub designer Kyle Neath. GitHub has the best-designed URLs of any application I can think of.

Tags: github, urls, recovered

Is it a good idea to allocate URLs such as quora.com/username to users?

2010-12-22T15:17:00+00:00

My answer to Is it a good idea to allocate URLs such as quora.com/username to users? on Quora

There's an interesting discussion about this issue on this question: How do sites prevent vanity URLs from colliding with future features?

Tags: seo, urls, quora

Spacelog: space exploration stories from the original transcripts

2010-12-10T10:07:00+00:00

Spacelog: space exploration stories from the original transcripts

The product of the most recent /dev/fort outing—a beautiful, web-native interface for browsing the NASA transcripts from the Apollo 13 and Mercury 6 missions (more to come). Every key moment has a URL.

Tags: devfort, space, urls, recovered

Porting Flickr to YUI 3: Lessons in Performance (at YUIConf 2010)

2010-11-10T18:33:00+00:00

Porting Flickr to YUI 3: Lessons in Performance (at YUIConf 2010)

Some very interesting tips here. The new Flickr photo pages suffered from what I’ve been calling “Flash of Un-Behavioured Content”, where slow loading JavaScript results in poor behaviour from some UI controls. They started using “Action Queueing”, where a small JS stub ensures a loading indicator is shown for clicks on features that have not yet fully loaded. Also, it turns out some corporate firewalls (Sonicwall in particular) dislike URLs over 1600 characters, and filter out any URL with xxx in it.

Tags: flickr, javascript, urls, yui, recovered

Is there any consensus yet on link rel=shorturl vs rev=canonical?

2010-10-15T12:03:00+00:00

My answer to Is there any consensus yet on link rel=shorturl vs rev=canonical? on Quora

It's pretty clear from the answers that rev=canonical vs. rel=canonical is way too confusing - so it's down to rel=shortlink vs. rel=shorturl.

Tags: html, html5, urls, quora

Why doesn't Facebook use nicer URLs?

2010-10-13T12:50:00+00:00

My answer to Why doesn't Facebook use nicer URLs? on Quora

Just noticed this link: http://www.facebook.com/notes/fa... - so it looks like things are beginning to improve.

Tags: facebook, urls, quora, ux

Why don't more websites use alternative domains?

2010-10-13T11:49:00+00:00

My answer to Why don't more websites use alternative domains? on Quora

Because regular human beings don't understand them, and expect everything to be a .com. Here's an interesting post from 2007 on why Topix.net spent $1,000,000 buying the .com domain: http://www.skrenta.com/2007/03/k...

Tags: urls, quora