Recent entries
Facebook Usernames and OpenID 26 days ago
Today’s launch of Facebook Usernames provides an obvious and exciting opportunity for Facebook to become an OpenID provider. Facebook have clearly demonstrated their interest in becoming the key online identity for their users, and the new usernames feature is their acknowledgement that URL-based identities are an important component of that, no doubt driven in part by Twitter making usernames trendy again.
It’s interesting to consider Facebook’s history with regards to OpenID and single sign on in general. When I started publicly advocating for OpenID back in 2007, my primary worry was that someone would solve the SSO problem in a proprietary way, irreparably damaging the decentralised nature of the Web—just as Microsoft had attempted a few years earlier with Passport.
When Facebook Connect was announced a year ago it seemed like my worst fears had become realised. Facebook Connect’s user experience was a huge improvement over OpenID—with only one provider, the sign in UI could be reduced to a single button. Their use of a popup window for the sign in flow was inspired—various usability studies have since shown that users are much more likely to complete a SSO flow if they can see the site they are signing in to in a background window.
Thankfully, Facebook seem to understand that the industry isn’t willing to accept a single SSO provider, no matter how smooth their implementation. Mark Zuckerberg made reassuring noises about OpenID support at both FOWA 2008 and SxSW 2009, but things really stepped up earlier this year when Facebook joined the OpenID Foundation Board (accompanied by a substantial financial donation). Facebook’s board representative, Luke Shepard, is an excellent addition and brings a refreshingly user-centric approach to OpenID. Luke was previously responsible for much of the work on Facebook Connect and has been advocating OpenID inside Facebook for a long time.
Facebook may not have committed to becoming a provider yet (at least not in public), but their decision to become a consumer first is another interesting data point. They may be trying to avoid the common criticism thrown at companies who provide but don’t consume—if they’re not willing to eat their own dog food, why should anyone else?
At any rate, their consumer implementation is fascinating. It’s live right now, even though there’s no OpenID login box anywhere to be seen on the site. Instead, Facebook take advantage of the little known checkid_immediate mode. Once you’ve associated your OpenID with your Facebook account (using the “Linked Accounts” section of the settings pane) Facebook sets a cookie remembering your OpenID provider, which persists even after you log out of Facebook. When you later visit the Facebook homepage, a checkid_immediate request is silently sent to your provider, logging you in automatically if you are already authenticated there.
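To make the mechanism concrete, here is a minimal sketch of the URL a consumer could redirect the browser to for an immediate check. The parameter names come from OpenID Authentication 2.0; the function name and the example.* URLs are hypothetical, and this says nothing about how Facebook's own code is written. It also assumes no delegation (so the identity equals the claimed ID) and omits the association handle.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

OPENID_NS = "http://specs.openid.net/auth/2.0"


def immediate_request_url(op_endpoint, claimed_id, return_to, realm):
    """Build the redirect URL for an OpenID 2.0 checkid_immediate request."""
    params = {
        "openid.ns": OPENID_NS,
        # checkid_immediate asks the provider to answer without showing any UI.
        "openid.mode": "checkid_immediate",
        "openid.claimed_id": claimed_id,
        # Simplification: assume the identifier is not delegated.
        "openid.identity": claimed_id,
        "openid.return_to": return_to,
        "openid.realm": realm,
    }
    separator = "&" if "?" in op_endpoint else "?"
    return op_endpoint + separator + urlencode(params)


# Hypothetical endpoint and identifiers, for illustration only.
url = immediate_request_url(
    "https://provider.example/openid",
    "https://alice.example/",
    "https://consumer.example/openid/return",
    "https://consumer.example/",
)
query = parse_qs(urlsplit(url).query)
```

If the user is not already signed in at the provider, the provider answers with a setup_needed response instead of prompting, which is what lets the consumer fail silently.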
While it’s great to see innovation with OpenID at such a large scale, I’m not at all convinced that they’ve got this right. The feature is virtually invisible to users (it took me a bunch of research to figure out how to use it) and not at all intuitive—if I’ve logged out of Facebook, how come visiting the home page logs me straight back in again? I guess this is why Luke is keen on exploring single sign out with OpenID. It sounds like the current OpenID consumer support is principally intended as a developer preview, and I’m looking forward to seeing how they change it based on ongoing user research.
An OpenID provider implementation is an obvious next step that can’t be that far off; I wouldn’t be surprised to hear an announcement within a month or two.
HTTP redirect codes
As an aside, I decided to check that Facebook were using the correct 3xx HTTP status code to redirect from my old profile page to my new one. I was horrified to discover that they are using a 200 code, followed by a chunk of JavaScript to implement the redirect! The situation for logged out users is better but still fundamentally flawed: if you enable your public search listing (using an option tucked away on www.facebook.com/privacy/?view=search) and curl -i your old profile URL you get a 302 Found, when the correct status code is clearly a 301 Moved Permanently.
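The same check that curl -i performs can be sketched in Python. This is a minimal sketch, not a tool from the post: fetch_status, is_permanent and the throwaway local server are hypothetical names, and the local server stands in for a real site so the example runs without touching the network.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def fetch_status(url):
    """Return (status, Location header) for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_permanent(status):
    # 301 and 308 signal a permanent move; 302, 303 and 307 are temporary.
    return status in (301, 308)


class _TemporaryRedirect(BaseHTTPRequestHandler):
    """Throwaway handler that answers every GET with a 302."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", "http://example.com/new-profile")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    """Start a local server on a free port and inspect its redirect."""
    server = HTTPServer(("127.0.0.1", 0), _TemporaryRedirect)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_status("http://127.0.0.1:%d/old-profile" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()
```

Because http.client never follows redirects on its own, the status returned is the one the server actually sent, which is the value that matters when deciding between a 301 and a 302.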
One final note: it almost goes without saying, but one of the best things about OpenID is that you can register a real domain name that you can own, instead of just having another URL on Facebook.
djng—a Django powered microframework one month ago
djng is nearly two weeks old now, so it’s about time I wrote a bit about the project.
I presented a keynote at EuroDjangoCon in Prague earlier this month entitled Django Heresies. The talk followed the noble DjangoCon tradition (established last year with the help of Mark Ramm and Cal Henderson) of pointing a spotlight at Django’s flaws. In my case, it was a chance to apply the benefit of hindsight to some of the design decisions I helped make back at the Lawrence Journal-World in 2004.
I took a few cheap shots at things like the {% endifequal %} tag and error silencing in the template system, but the three substantial topics in my talk were class-based generic views (I’m a fan), my hatred of settings.py and my interest in turtles all the way down.
Why I hate settings.py
In the talk, I justified my dislike for settings.py by revisiting the problems behind PHP’s magic quotes feature (finally going away for good in PHP 6). Magic quotes were one of the main reasons I switched to Python from PHP.
My main problem with magic quotes was that they made it extremely difficult to write reusable PHP code. The feature was configured globally, which led to a quandary. What if you have two libraries, one expecting magic quotes on and the other expecting it off? Your library could check get_magic_quotes_gpc() and stripslashes() from input if the setting was turned on, but this would break in the presence of the common idiom where stripslashes() is applied to all incoming $_GET and $_POST data.
Unfortunately, global settings configured using settings.py have a similar smell to them. Middleware and context processors are the best example here—a specific setting might be needed by just one installed application, but the effects are felt by everything in the system. While I haven’t yet seen two “reusable” Django apps that require conflicting settings, per-application settings are an obvious use case that settings.py fails to cover.
Global impact aside, my bigger problem with settings.py is that I almost always end up wanting to reconfigure them at run-time.
This is possible in Django today, but comes at a price:
- Only some settings can actually be changed at run-time—others (such as USE_I18N) are lazily evaluated once and irreversibly reconfigure parts of Django’s plumbing. Figuring out which ones can be changed requires exploration of Django’s source code.
- If you change a setting, you need to reliably change it back at the end of a request or your application will behave strangely. Uncaught exceptions could cause problems here, unless you remember to wrap dynamic setting changes in a try/finally block.
- Changing a setting isn’t thread-safe (without doing some extra work).
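The try/finally discipline from the second point can be wrapped up once in a context manager. This is a minimal sketch under stated assumptions: override is a hypothetical helper, and a SimpleNamespace stands in for django.conf.settings so the example runs without Django. It does nothing for the other two points: settings that Django reads only once stay as they were, and the change is still visible to every thread.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for django.conf.settings (an assumption, so the sketch is runnable).
settings = SimpleNamespace(TEMPLATE_DIRS=("/templates/default",), DEBUG=False)

_MISSING = object()


@contextmanager
def override(settings_obj, **overrides):
    """Temporarily change settings, restoring them even if the body raises."""
    previous = {name: getattr(settings_obj, name, _MISSING) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(settings_obj, name, value)
        yield settings_obj
    finally:
        for name, value in previous.items():
            if value is _MISSING:
                # The setting did not exist before, so remove it again.
                if hasattr(settings_obj, name):
                    delattr(settings_obj, name)
            else:
                setattr(settings_obj, name, value)


def render_for_mobile():
    with override(settings, TEMPLATE_DIRS=("/templates/mobile",)):
        return settings.TEMPLATE_DIRS
```

The restore step runs in the finally clause, so an uncaught exception inside the with block cannot leave the mobile template directory in place for the next request.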
Almost every setting in Django has legitimate use-cases for modification at run-time. Here are just a few examples:
- Requests from mobile phones may need a different TEMPLATE_DIRS setting, to load the mobile-specific templates in preference to the site defaults.
- Some sites offer premium accounts which in turn gain access to more reliable servers. Premium users might get to send e-mail via a separate pool of SMTP servers, for example.
- Some sections of code may want to use a different cache backend, or talk to a different set of memcache servers, to reduce the chance of one rapidly changing component causing other components’ cache entries to expire too early.
- Errors in one area of a site might need to be sent to a different team of developers.
- Admin users might want DEBUG=True, while regular site visitors get DEBUG=False.
Finally, settings.py is behind the dreaded “Settings cannot be imported, because environment variable DJANGO_SETTINGS_MODULE is undefined” exception. Yuck.
Turtles all the way down
The final section of the talk was about turtles. More precisely, it was about their role as an “infinite regression belief about cosmology and the nature of the universe”. I want to apply that idea to Django.
My favourite thing about Django is something I’ve started to call the “Django Contract”: the idea that a Django view is a callable which takes a request object and returns a response object. I want to expand that concept to other parts of Django as well:
- URLconf: takes a request, dispatches based on request.path, returns a response.
- Application: takes a request, returns a response
- Middleware: takes a request, returns a response (conditionally transforming either)
- Django-powered site: hooked in to mod_wsgi/FastCGI/a Python web server, takes a request, returns a response
So instead of a Django site consisting of a settings.py, urls.py and various applications and middlewares, a site would just be a callable that obeys the Django Contract and composes together dozens of other callables.
At this point, Django starts to look a lot like WSGI. What if WSGI and the Django Contract were interchangeable? WSGI is a wrapper around HTTP, so what if that could be swapped in and out (through proxies) as well? Django, WSGI and HTTP, three breeds of turtle arranged on top of each other in various configurations. Turtles all the way down.
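The composition idea can be sketched without Django at all. Everything below is hypothetical: Request and Response are stand-ins for Django's request and response objects, and urlconf and add_header are illustrative names rather than anything from Django or djng.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    path: str


@dataclass
class Response:
    body: str
    status: int = 200
    headers: dict = field(default_factory=dict)


def hello(request):
    """View: takes a request, returns a response."""
    return Response("Hello, world")


def not_found(request):
    return Response("Not found", status=404)


def urlconf(routes, fallback=not_found):
    """URLconf: dispatches on request.path to another contract callable."""
    def dispatch(request):
        return routes.get(request.path, fallback)(request)
    return dispatch


def add_header(app, name, value):
    """Middleware: wraps any contract callable and transforms its response."""
    def wrapped(request):
        response = app(request)
        response.headers[name] = value
        return response
    return wrapped


# The whole site is just one more callable built from the others.
site = add_header(urlconf({"/hello": hello}), "X-Turtles", "all the way down")
```

Each layer accepts and returns the same kind of callable, so the layers can be nested in any order, which is the property the turtles metaphor is pointing at.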
djng
djng is my experiment to see what Django would look like without settings.py and with a whole lot more turtles. It’s Yet Another Python Microframework.
What’s a microframework? The best examples are probably web.py (itself a result of Aaron Swartz’s frustrations with Django) and Sinatra, my all time favourite example of Ruby DSL design. More recent examples in Python include juno, newf, mnml and itty.
Microframeworks let you build an entire web application in a single file, usually with only one import statement. They are becoming increasingly popular for building small, self-contained applications that perform only one task—Service Oriented Architecture reborn as a combination of the Unix development philosophy and RESTful API design. I first saw this idea expressed in code by Anders Pearson and Ian Bicking back in 2005.
Unlike most microframeworks, djng has a pretty big dependency: Django itself. The plan is to reuse everything I like about Django (the templates, the ORM, view functions, the form library etc) while replacing just the top level plumbing and removing the requirement for separate settings.py and urls.py files.
This is what “Hello, world” looks like in djng:
import djng

def index(request):
    return djng.Response('Hello, world')

if __name__ == '__main__':
    djng.serve(index, '0.0.0.0', 8888)
djng.Response is an alias for Django’s HttpResponse. djng.serve is a utility function which converts anything fulfilling the Django Contract in to a WSGI application, then exposes it over HTTP.
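A rough idea of what such a conversion involves, as a sketch only: this is not djng's implementation, and Request, Response and to_wsgi are hypothetical stand-ins so the example runs without Django or djng installed.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    method: str
    path: str


@dataclass
class Response:
    body: str
    status: str = "200 OK"
    headers: list = field(
        default_factory=lambda: [("Content-Type", "text/plain; charset=utf-8")]
    )


def to_wsgi(contract_app):
    """Adapt a request -> response callable to the WSGI calling convention."""
    def wsgi_app(environ, start_response):
        request = Request(
            environ.get("REQUEST_METHOD", "GET"),
            environ.get("PATH_INFO", "/"),
        )
        response = contract_app(request)
        payload = response.body.encode("utf-8")
        headers = response.headers + [("Content-Length", str(len(payload)))]
        start_response(response.status, headers)
        return [payload]
    return wsgi_app


def index(request):
    return Response("Hello, world")


application = to_wsgi(index)
```

The resulting application can be handed to any WSGI server, for example wsgiref.simple_server.make_server('0.0.0.0', 8888, application).serve_forever() from the standard library.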
Let’s add URL routing to the example:
app = djng.Router(
    (r'^hello$', lambda request: djng.Response('Hello, world')),
    (r'^goodbye$', lambda request: djng.Response('Goodbye, world')),
)

if __name__ == '__main__':
    djng.serve(app, '0.0.0.0', 8888)
The implementation of djng.Router is just a few lines of glue code adding a nicer API to Django’s internal RegexURLResolver class.
Services, not settings
The trickiest problem I still need to solve is how to replace settings.py. A group of developers (including Adrian, Armin, Alex and myself) had an excellent brainstorming session at EuroDjangoCon about this. We realised that most of the stuff in settings.py can be recast as configuring services which Django makes available to the applications it is hosting. Services like the following:
- Caching
- Templating
- Sending e-mail
- Sessions
- Database connection (django.db.connection)
- Higher level ORM
- File storage
Each of the above needs to be configured, and each also might need to be reconfigured at runtime. Django already points in this direction by providing hooks for adding custom backends for caching, template loading, file storage and session support. What’s missing is an official way of swapping in different backends at runtime.
I’m currently leaning towards the idea of a “stack” of service implementations, one for each of the service categories listed above. A new implementation could be pushed on to the stack at any time during the Django request/response cycle, and will be automatically popped back off again before the next request is processed (all in a thread-safe manner). Applications would also be able to instantiate and use a particular service implementation directly should they need to do so.
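The stack idea might look something like the following. This is a minimal sketch, not the API from services_api_ideas.txt: ServiceStack, handle and the SMTP pool names are all hypothetical, and thread safety is approximated by giving each thread its own stack through threading.local.

```python
import threading


class ServiceStack:
    """Per-thread stack of implementations for one service category."""

    def __init__(self, default):
        self._default = default
        self._local = threading.local()

    def _stack(self):
        # Each thread lazily gets its own list, so pushes never leak across threads.
        if not hasattr(self._local, "items"):
            self._local.items = []
        return self._local.items

    def push(self, implementation):
        self._stack().append(implementation)

    def pop(self):
        return self._stack().pop()

    @property
    def current(self):
        stack = self._stack()
        return stack[-1] if stack else self._default

    def reset(self):
        del self._stack()[:]


def handle(stacks, view, request):
    """Run a view, then discard anything it pushed during the request."""
    try:
        return view(request)
    finally:
        for stack in stacks:
            stack.reset()


email = ServiceStack(default="smtp-standard")


def premium_view(request):
    # A premium account switches to a separate (hypothetical) SMTP pool.
    email.push("smtp-premium")
    return email.current
```

Resetting in a finally clause gives the automatic pop before the next request, and an application that needs one specific implementation can still hold a direct reference to it instead of going through the stack.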
A few days ago I heard about Contextual, which appears to be trying to solve a similar problem. Just a few minutes ago I stumbled across paste.registry’s StackedObjectProxy which seems to be exactly what I’ve been busily reinventing.
My current rough thoughts on an API for this can be found in services_api_ideas.txt. I’m eager to hear suggestions on how to tackle this problem.
djng is very much an experiment at the moment—I wouldn’t suggest building anything against it unless you’re willing to maintain your own fork. That said, the code is all on GitHub partly because I want people to fork it and experiment with their own API concepts as much as possible.
If you’re interested in exploring these concepts with me, please join me on the brand new djng mailing list.
rev=canonical bookmarklet and designing shorter URLs two months ago
I’ve watched the proliferation of URL shortening services over the past year with a certain amount of dismay. I care about the health of the web and try to ensure that URLs I am responsible for will last for as long as possible, and I think it’s very unlikely that all of these new services will still be around in twenty years’ time. Last month I suggested that the Internet Archive start mirroring redirect databases, and last week I was pleased to hear that Archiveteam, a different organisation, had already started crawling.
The most recent discussion was kicked off by Joshua Schachter and Dave Winer, and a solution has emerged driven by some lightning fast hacking by Kellan Elliott-McCrea. The idea is simple: sites get to choose their preferred source of shortened URLs (including self-hosted solutions) and specify it from individual pages using <link rev="canonical" href="... shorter URL here ...">.
By hosting their own shorteners, the reliability should match that of the host site—and the amount of damage caused by a major shortener going missing can be dramatically reduced.
I’ve been experimenting with this new pattern today. Here are a few small contributions to the wider discussion.
A URL shortening bookmarklet
Kellan’s rev=canonical service exposes rev=canonical links using a server-side script running on App Engine. An obvious next step is to distil that logic in to a bookmarklet. I decided to combine the rev=canonical logic with my json-tinyurl web service (also on App Engine), which allows browsers to look up or create TinyURLs using a cross-domain JSONP request. The resulting bookmarklet will display the site’s rev=canonical link if it exists, or create and display a TinyURL link otherwise:
Bookmarklet: Shorten (drag to your browser toolbar)
You can also grab the uncompressed source code.
Designing short URLs
I’ve also implemented rev=canonical on this site. I ended up buying a new domain for this, since simonwillison.net is both difficult to spell and 17 characters long. I ended up going with swtiny.eu—9 characters, and keeping tiny in the domain helps people guess the nature of the site from just the URLs it generates. Be warned: the DNS doesn’t appear to have finished resolving yet.
For the path component, I turned to a variant of base 62 encoding. Decimal integers are represented using 10 digits (0-9), but base 62 uses those digits plus the letters of the alphabet in both lower and upper case. A 13 character integer such as 7250397214971 compresses down to just 8 characters (CDeIPpOD) using base62. My baseconv.py module implements base62, among others. I considered using base 57 by excluding o, O, 0, 1 and l as being too easily confused but decided against it.
This site has three key types of content: entries, blogmarks and quotations. Each one is a separate Django model, and hence each has its own underlying database table and individual ID sequence. Since the IDs overlap, I need a way of separating out the shortened URLs for each content type.
I decided to spend a byte on namespacing my shortened URLs. A prefix of E means an entry, Q means a quotation and B means a blogmark. For example:
- http://swtiny.eu/EZ8: Entry with ID 1584
- http://swtiny.eu/BBEQ: Blogmark with ID 4108
- http://swtiny.eu/QE5: Quotation with ID 279
By using upper case letters for the prefixes, I can later define custom paths starting with a lower case letter. I also have another 23 upper case prefix letters reserved in case I need them.
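The encoding and prefix scheme can be sketched in a few lines. This is not the code from baseconv.py: the function names are hypothetical, and the alphabet order (upper case, then digits, then lower case) is inferred from the examples in this post, which it reproduces.

```python
# Alphabet order inferred from the examples above, not copied from baseconv.py.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62

# One upper case prefix letter per content type.
PREFIXES = {"E": "entry", "B": "blogmark", "Q": "quotation"}


def encode(number):
    """Convert a non-negative integer to its base 62 representation."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode(text):
    """Convert a base 62 string back to an integer."""
    number = 0
    for char in text:
        number = number * BASE + ALPHABET.index(char)
    return number


def shorten(kind, object_id):
    """Build the short path for a content type and database ID."""
    prefix = next(p for p, name in PREFIXES.items() if name == kind)
    return prefix + encode(object_id)


def resolve(path):
    """Split a short path in to (content type, database ID)."""
    return PREFIXES[path[0]], decode(path[1:])
```

With these definitions, shorten("entry", 1584) gives EZ8 and resolve("BBEQ") gives ("blogmark", 4108), matching the URLs listed above.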
I asked on Twitter and consensus opinion was that a 301 permanent redirect was the right thing to do (as opposed to a 302), both for SEO reasons and because the content will never exist at the shorter URL.
Implementation using Django and nginx
I run all of my Django sites using Apache and mod_wsgi, proxied behind nginx. Each site gets an Apache running on a high port, and nginx deals with virtual host configuration (proxying each domain to a different Apache backend) and static file serving. I didn’t want to set up a full Django site just to run swtiny.eu, especially since my existing blog engine was required in order to resolve the shortened URLs.
Instead, I implemented the shortened URL redirection as just another view within my existing site: http://simonwillison.net/shorter/EZ8. I then configured nginx to invisibly proxy requests to swtiny.eu through to that URL. The correct incantation took a while to figure out, so here’s the relevant section of my nginx.conf:
server {
    listen 80;
    server_name www.swtiny.eu swtiny.eu;
    location / {
        rewrite (.*) /shorter$1 break;
        proxy_pass http://simonwillison.net;
        proxy_redirect off;
    }
}
proxy_redirect off is needed to prevent nginx from replacing simonwillison.net in the resulting Location header with swtiny.eu. My Django view code is relatively shonky, but if you’re interested you can find it here.
The nice thing about this approach is that it makes it trivial to add custom URL shortening domains to other projects—a quick view function and a few lines of nginx configuration are all that is needed.
Update: The bookmarklet now supports the rev attribute on A elements as well—thanks for the suggestion, Jeremy.
