Simon Willison’s Weblog

Django ponies: Proposals for Django 1.2

28th September 2009

I’ve decided to step up my involvement in Django development in the run-up to Django 1.2, so I’m currently going through several years’ worth of accumulated pony requests figuring out which ones are worth advocating for. I’m also ensuring I have the code to back them up: my innocent AutoEscaping proposal a few years ago resulted in an enormous amount of work by Malcolm and I don’t think he’d appreciate a repeat performance.

I’m not a big fan of branches when it comes to exploratory development—they’re fine for doing the final implementation once an approach has been agreed, but I don’t think they are a very effective way of discussing proposals. I’d much rather see working code in a separate application—that way I can try it out with an existing project without needing to switch to a new Django branch. Keeping code out of a branch also means people can start using it for real development work, making the API much easier to evaluate. Most of my proposals here have accompanying applications on GitHub.

I’ve recently got in to the habit of including an “examples” directory with each of my experimental applications. This is a full Django project (with settings.py, urls.py and manage.py files) which serves two purposes. Firstly, it allows developers to run the application’s unit tests without needing to install it in to their own pre-configured project, simply by changing in to the examples directory and running ./manage.py test. Secondly, it gives me somewhere to put demonstration code that can be viewed in a browser using the runserver command—a further way of making the code easier to evaluate. django-safeform is a good example of this pattern.

Here’s my current list of ponies, in rough order of priority.

Signing and signed cookies

Signing strings to ensure they have not been tampered with is a crucial technique in web application security. As with all cryptography, it’s also surprisingly difficult to do correctly. A vulnerability in the signing implementation used to protect the Flickr API was revealed just today.

One of the many uses of signed strings is to implement signed cookies. Signed cookies are fantastically powerful—they allow you to send cookies safe in the knowledge that your user will not be able to alter them without you knowing. This dramatically reduces the need for sessions—most web apps use sessions for security rather than for storing large amounts of data, so moving that “logged in user ID” value to a signed cookie eliminates the need for session storage entirely, saving a round-trip to persistent storage on every request.

This has particularly useful implications for scaling—you can push your shared secret out to all of your front end web servers and scale horizontally, with no need for shared session storage just to handle simple authentication and “You are logged in as X” messages.

The latest version of my django-openid library uses signed cookies to store the OpenID you log in with, removing the need to configure Django’s session storage. I’ve extracted that code in to django-signed, which I hope to evolve in to something suitable for inclusion in django.utils.

Please note that django-signed has not yet been vetted by cryptography specialists, something I plan to fix before proposing it for final inclusion in core.
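To make the idea concrete, here is a minimal sketch of signing and verifying a string with an HMAC from Python's standard library. This is not the django-signed API: the names sign, unsign, BadSignature and SECRET are assumptions made up for this illustration, and like django-signed it has not been vetted by a cryptographer.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real deployment would load this from
# configuration and push the same value to every front end server.
SECRET = b"not-a-real-secret"


class BadSignature(Exception):
    """Raised when a signed string is malformed or has been altered."""


def _mac(value: str, secret: bytes) -> str:
    # HMAC-SHA256 over the UTF-8 bytes of the value, as a hex string.
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(value: str, secret: bytes = SECRET) -> str:
    # Append the signature after a colon: "value:hexdigest".
    return f"{value}:{_mac(value, secret)}"


def unsign(signed: str, secret: bytes = SECRET) -> str:
    # Split on the last colon so the value itself may contain colons.
    value, sep, mac = signed.rpartition(":")
    if not sep:
        raise BadSignature("no signature found")
    expected = _mac(value, secret)
    # Constant-time comparison, to avoid leaking information via timing.
    if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        raise BadSignature("signature does not match")
    return value
```

A signed cookie is then just a cookie whose value is sign("42") on the way out and unsign(cookie_value) on the way back in, with BadSignature treated as "not logged in". Note that signing only prevents tampering; the value is still readable by the user.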

Improved CSRF support

This is mainly Luke Plant’s pony, but I’m very keen to see it happen. Django has shipped with CSRF protection for more than three years now, but the approach (using middleware to rewrite form HTML) is relatively crude and, crucially, the protection isn’t turned on by default. Hint: if you aren’t 100% positive you are protected against CSRF, you should probably go and turn it on.

Luke’s approach is an iterative improvement—a template tag (with a dependency on RequestContext) is used to output the hidden CSRF field, with middleware used to set the cookie and perform the extra validation. I experimented at length with an alternative solution based around extending Django’s form framework to treat CSRF as just another aspect of validation—you can see the result in my django-safeform project. My approach avoids middleware and template tags in favour of a view decorator to set the cookie and a class decorator to add a CSRF check to the form itself.

While my approach works, the effort involved in upgrading existing code to it is substantial, compared to a much easier upgrade path for Luke’s middleware + template tag approach. The biggest advantage of safeform is that it allows CSRF failure messages to be shown inline on the form, without losing the user’s submission—the middleware check means showing errors as a full page without redisplaying the form. It looks like it should be possible to bring that aspect of safeform back to the middleware approach, and I plan to put together a patch for that over the next few days.
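The check that both approaches share can be sketched without any framework: a random token is set in a cookie, echoed back in a hidden form field, and the two are compared on submission. The code below is neither Luke's patch nor django-safeform; new_csrf_token and csrf_errors are hypothetical names, and returning a list of messages is just one way to model the "show the failure inline" behaviour described above.

```python
import hmac
import secrets


def new_csrf_token() -> str:
    # 16 random bytes, hex encoded; stored in a cookie and in a hidden field.
    return secrets.token_hex(16)


def csrf_errors(cookie_token, form_token):
    """Return a list of error messages; an empty list means the check passed.

    Returning errors, rather than raising or sending a full-page response,
    lets the caller redisplay the form with the user's submission intact.
    """
    if not cookie_token or not form_token:
        return ["Your form session is missing. Please submit the form again."]
    # Constant-time comparison of the cookie value and the hidden field value.
    if not hmac.compare_digest(cookie_token.encode("utf-8"),
                               form_token.encode("utf-8")):
        return ["Form token mismatch. Please submit the form again."]
    return []
```

A view would call csrf_errors with the cookie value and the posted field value, and treat a non-empty result like any other validation failure.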

Better support for outputting HTML

This is a major pet peeve of mine. Django’s form framework is excellent—one of the best features of the framework. There’s just one thing that bugs me about it—it outputs full form widgets (for input, select and the like) so that it can include the previous value when redisplaying a form during validation, but it does so using XHTML syntax.

I have a strong preference for an HTML 4.01 strict doctype, and all those <self-closing-tags /> have been niggling away at me for literally years. Django bills itself as a framework for “perfectionists with deadlines”, so I feel justified in getting wound up out of proportion over this one.

A year ago I started experimenting with a solution, and came up with django-html. It introduces two new Django template tags—{% doctype %} and {% field %}. The doctype tag serves two purposes—it outputs a particular doctype (saving you from having to remember the syntax) and it records that doctype in Django’s template context object. The field tag is then used to output form fields, but crucially it gets to take the current doctype in to account.

The field tag can also be used to add extra HTML attributes to form widgets from within the template itself, solving another small frustration about the existing form library. The README describes the new tags in detail.
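The core of the idea fits in a few lines of plain Python: the widget renderer is told the doctype and picks the tag ending accordingly. This is a sketch under my own assumptions, not the django-html implementation; render_input and the "html4" / "xhtml1" doctype names are hypothetical.

```python
from html import escape


def render_input(name, value="", doctype="html4", **attrs):
    """Render a text input, closing the tag to suit the given doctype."""
    # Extra keyword arguments become extra HTML attributes, mirroring the
    # "add attributes from the template" feature described above.
    all_attrs = {"type": "text", "name": name, "value": value, **attrs}
    rendered = " ".join(
        f'{key}="{escape(str(val), quote=True)}"'
        for key, val in all_attrs.items()
    )
    # XHTML needs a self-closing tag; HTML 4.01 does not want one.
    closing = " />" if doctype.startswith("xhtml") else ">"
    return f"<input {rendered}{closing}"
```

For example, render_input("q", "a & b") gives `<input type="text" name="q" value="a &amp; b">`, while passing doctype="xhtml1" produces the same tag ending in ` />`.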

The way the tags work is currently a bit of a hack—if merged in to Django core they could be more cleanly implemented by refactoring the form library slightly. This refactoring is currently being discussed on the mailing list.

Logging

This is the only proposal for which I don’t yet have any code. I want to add official support for Python’s standard logging framework to Django. It’s possible to use this at the moment (I’ve done so on several projects) but it’s not at all clear what the best way of doing so is, and Django doesn’t use it internally at all. I posted a full argument in favour of logging to the mailing list, but my favourite argument is this one:

Built-in support for logging reflects a growing reality of modern Web development: more and more sites have interfaces with external web service APIs, meaning there are plenty of things that could go wrong that are outside the control of the developer. Failing gracefully and logging what happened is the best way to deal with 3rd party problems—much better than throwing a 500 and leaving no record of what went wrong.
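As a rough illustration of that argument, here is how a call to an external service might fail gracefully using nothing but the standard logging module. The logger name and the call_service helper are assumptions for this sketch, not anything Django provides.

```python
import logging

# Hypothetical logger name; by convention it follows the module path.
logger = logging.getLogger("example.services")


def call_service(fetch, url, default=None):
    """Call fetch(url); on any failure, log the traceback and return default."""
    try:
        return fetch(url)
    except Exception:
        # logger.exception logs at ERROR level and attaches the traceback,
        # so there is a record of what went wrong instead of a bare 500.
        logger.exception("Request to %s failed", url)
        return default
```

Here fetch is whatever callable actually talks to the third party, so the page can fall back to default (an empty list of results, say) while the failure still ends up in the logs.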

I’m not actively pursuing this one yet, but I’m very interested in hearing people’s opinions on the best way to configure and use the Python logging module in production.

A replacement for get_absolute_url()

Django has a loose convention of encouraging people to add a get_absolute_url method to their models that returns that object’s URL. It’s a controversial feature—for one thing, it’s a bit of a layering violation since URL logic is meant to live in the urls.py file. It’s incredibly convenient though, and since it’s good web citizenship for everything to have one and only one URL I think there’s a pretty good argument for keeping it.

The problem is, the name sucks. I first took a look at this in the last few weeks before the release of Django 1.0—what started as a quick proposal to come up with a better name before we were stuck with it quickly descended in to a quagmire as I realised quite how broken get_absolute_url() is. The short version: in some cases it means “get a relative URL starting with /”, in other cases it means “get a full URL starting with http://” and the name doesn’t accurately describe either.

A full write-up of my investigation is available on the Wiki. My proposed solution was to replace it with two complementary methods, get_url() and get_url_path(): the user implements one and the other is derived from it automatically. My django-urls project illustrates the concept via a model mixin class. A year on I still think it’s quite a neat idea, though as far as I can tell no one has ever actually used it.
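The "implement one, derive the other" idea can be sketched as a plain Python mixin. This is not the django-urls code; UrlMixin and SITE_ROOT are hypothetical names, and a real version would presumably take the domain from project settings rather than a module constant.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical site root used to turn a path into a full URL.
SITE_ROOT = "http://example.com"


class UrlMixin:
    """Subclasses override exactly one of get_url() or get_url_path()."""

    def get_url(self):
        # Full URL starting with http://, derived from the path.
        if type(self).get_url_path is UrlMixin.get_url_path:
            raise NotImplementedError("Define get_url() or get_url_path()")
        return urljoin(SITE_ROOT, self.get_url_path())

    def get_url_path(self):
        # Path starting with /, derived from the full URL.
        if type(self).get_url is UrlMixin.get_url:
            raise NotImplementedError("Define get_url() or get_url_path()")
        parts = urlsplit(self.get_url())
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        return path
```

The identity checks stop the two defaults from calling each other forever when a subclass forgets to override either method.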

Comments on this post are open, but if you have anything to say about any of the individual proposals it would be much more useful if you posted it to the relevant mailing list thread.