
Simon Willison’s Weblog

Django ponies: Proposals for Django 1.2

I’ve decided to step up my involvement in Django development in the run-up to Django 1.2, so I’m currently going through several years’ worth of accumulated pony requests, figuring out which ones are worth advocating for. I’m also ensuring I have the code to back them up: my innocent AutoEscaping proposal a few years ago resulted in an enormous amount of work by Malcolm, and I don’t think he’d appreciate a repeat performance.

I’m not a big fan of branches when it comes to exploratory development—they’re fine for doing the final implementation once an approach has been agreed, but I don’t think they are a very effective way of discussing proposals. I’d much rather see working code in a separate application—that way I can try it out with an existing project without needing to switch to a new Django branch. Keeping code out of a branch also means people can start using it for real development work, making the API much easier to evaluate. Most of my proposals here have accompanying applications on GitHub.

I’ve recently got in to the habit of including an “examples” directory with each of my experimental applications. This is a full Django project (with settings.py, urls.py and manage.py files) which serves two purposes. Firstly, it allows developers to run the application’s unit tests without needing to install it in to their own pre-configured project, simply by changing in to the examples directory and running ./manage.py test. Secondly, it gives me somewhere to put demonstration code that can be viewed in a browser using the runserver command—a further way of making the code easier to evaluate. django-safeform is a good example of this pattern.

Here’s my current list of ponies, in rough order of priority.

Signing and signed cookies

Signing strings to ensure they have not been tampered with is a crucial technique in web application security. As with all cryptography, it’s also surprisingly difficult to do correctly. A vulnerability in the signing implementation used to protect the Flickr API was revealed just today.

One of the many uses of signed strings is to implement signed cookies. Signed cookies are fantastically powerful—they allow you to send cookies safe in the knowledge that your user will not be able to alter them without you knowing. This dramatically reduces the need for sessions—most web apps use sessions for security rather than for storing large amounts of data, so moving that “logged in user ID” value to a signed cookie eliminates the need for session storage entirely, saving a round-trip to persistent storage on every request.

This has particularly useful implications for scaling—you can push your shared secret out to all of your front end web servers and scale horizontally, with no need for shared session storage just to handle simple authentication and “You are logged in as X” messages.
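As a rough illustration of the idea (this is not django-signed’s actual API; the function names, the ":" separator and the SECRET_KEY value are all assumptions made for this sketch), signing a cookie value with HMAC using only the Python standard library might look something like this:

```python
import base64
import hashlib
import hmac

# Assumption: one long random secret, pushed out to every front end server.
SECRET_KEY = b"replace-with-a-long-random-secret"


class BadSignature(ValueError):
    """Raised when a signed value is missing a signature or has been altered."""


def _signature(value, key=SECRET_KEY):
    # HMAC-SHA1 digest, encoded as URL-safe base64 with the padding stripped.
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha1).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def sign(value, key=SECRET_KEY):
    """Return 'value:signature', suitable for storing in a cookie."""
    return value + ":" + _signature(value, key)


def unsign(signed_value, key=SECRET_KEY):
    """Return the original value, or raise BadSignature if it was tampered with."""
    value, separator, signature = signed_value.rpartition(":")
    if not separator:
        raise BadSignature("no signature found")
    expected = _signature(value, key)
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise BadSignature("signature does not match")
    return value
```

A view could then set a cookie to sign("42") after login and call unsign() on later requests, treating BadSignature as “not logged in”. A real implementation would also need to think about expiry and key rotation, which this sketch ignores.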

The latest version of my django-openid library uses signed cookies to store the OpenID you log in with, removing the need to configure Django’s session storage. I’ve extracted that code in to django-signed, which I hope to evolve in to something suitable for inclusion in django.utils.

Please note that django-signed has not yet been vetted by cryptography specialists, something I plan to fix before proposing it for final inclusion in core.

Improved CSRF support

This is mainly Luke Plant’s pony, but I’m very keen to see it happen. Django has shipped with CSRF protection for more than three years now, but the approach (using middleware to rewrite form HTML) is relatively crude and, crucially, the protection isn’t turned on by default. Hint: if you aren’t 100% positive you are protected against CSRF, you should probably go and turn it on.

Luke’s approach is an iterative improvement—a template tag (with a dependency on RequestContext) is used to output the hidden CSRF field, with middleware used to set the cookie and perform the extra validation. I experimented at length with an alternative solution based around extending Django’s form framework to treat CSRF as just another aspect of validation—you can see the result in my django-safeform project. My approach avoids middleware and template tags in favour of a view decorator to set the cookie and a class decorator to add a CSRF check to the form itself.
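Both approaches involve a token set in a cookie and a check made against the submitted form. One common way such a check can work is sketched below; this is not taken from Luke’s patch or from django-safeform, the function names are hypothetical, and a real implementation may well sign or otherwise derive the token rather than compare it directly.

```python
import hmac
import secrets


def new_csrf_token():
    """Generate an unguessable token to store in the visitor's cookie."""
    return secrets.token_urlsafe(32)


def csrf_check(cookie_token, form_token):
    """Return True only if the hidden form field matches the cookie token."""
    if not cookie_token or not form_token:
        return False
    # Constant-time comparison of the two values as bytes.
    return hmac.compare_digest(
        cookie_token.encode("utf-8"), form_token.encode("utf-8")
    )
```

The point of the sketch is that a page on another site can make the browser send the cookie, but it cannot read the cookie’s value in order to copy it into the form.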

While my approach works, the effort involved in upgrading existing code to it is substantial, compared to a much easier upgrade path for Luke’s middleware + template tag approach. The biggest advantage of safeform is that it allows CSRF failure messages to be shown inline on the form, without losing the user’s submission—the middleware check means showing errors as a full page without redisplaying the form. It looks like it should be possible to bring that aspect of safeform back to the middleware approach, and I plan to put together a patch for that over the next few days.

Better support for outputting HTML

This is a major pet peeve of mine. Django’s form framework is excellent—one of the best features of the framework. There’s just one thing that bugs me about it—it outputs full form widgets (for input, select and the like) so that it can include the previous value when redisplaying a form during validation, but it does so using XHTML syntax.

I have a strong preference for an HTML 4.01 strict doctype, and all those <self-closing-tags /> have been niggling away at me for literally years. Django bills itself as a framework for “perfectionists with deadlines”, so I feel justified in getting wound up out of proportion over this one.

A year ago I started experimenting with a solution, and came up with django-html. It introduces two new Django template tags—{% doctype %} and {% field %}. The doctype tag serves two purposes—it outputs a particular doctype (saving you from having to remember the syntax) and it records that doctype in Django’s template context object. The field tag is then used to output form fields, but crucially it gets to take the current doctype in to account.
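To make the doctype-aware part concrete, here is a minimal sketch in plain Python rather than as real template tags. The doctype names and the render_input function are hypothetical and are not django-html’s API; the only behaviour borrowed from it is that a " /" is emitted for XHTML doctypes and nothing for HTML ones.

```python
from html import escape

# Assumption: short names identifying the XHTML family of doctypes.
XHTML_DOCTYPES = {"xhtml1", "xhtml1trans"}


def render_input(name, value, doctype):
    """Render a text input, self-closing only when the doctype is XHTML."""
    closing = " /" if doctype in XHTML_DOCTYPES else ""
    return '<input type="text" name="%s" value="%s"%s>' % (
        escape(name, quote=True),
        escape(value, quote=True),
        closing,
    )
```

With this, render_input("q", "ponies", "html4") produces an HTML-style tag, while passing "xhtml1" produces the self-closing form.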

The field tag can also be used to add extra HTML attributes to form widgets from within the template itself, solving another small frustration about the existing form library. The README describes the new tags in detail.

The way the tags work is currently a bit of a hack—if merged in to Django core they could be more cleanly implemented by refactoring the form library slightly. This refactoring is currently being discussed on the mailing list.

Logging

This is the only proposal for which I don’t yet have any code. I want to add official support for Python’s standard logging framework to Django. It’s possible to use this at the moment (I’ve done so on several projects) but it’s not at all clear what the best way of doing so is, and Django doesn’t use it internally at all. I posted a full argument in favour of logging to the mailing list, but my favourite argument is this one:

Built-in support for logging reflects a growing reality of modern Web development: more and more sites have interfaces with external web service APIs, meaning there are plenty of things that could go wrong that are outside the control of the developer. Failing gracefully and logging what happened is the best way to deal with 3rd party problems—much better than throwing a 500 and leaving no record of what went wrong.
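A small sketch of what “failing gracefully and logging what happened” can look like with the standard logging module. The logger name and the call_gracefully helper are made up for this example, and production code would usually catch narrower exception types than Exception.

```python
import logging

# Assumption: a project-specific logger name; configure handlers elsewhere.
logger = logging.getLogger("myproject.external")


def call_gracefully(func, *args, default=None, **kwargs):
    """Call func; if it raises, log the traceback and return default instead."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception(
            "External call %s failed", getattr(func, "__name__", repr(func))
        )
        return default
```

A view that shows photos from a third-party API could wrap the request in call_gracefully(..., default=[]) and render an empty list when the service is down, leaving a record in the log rather than a bare 500.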

I’m not actively pursuing this one yet, but I’m very interested in hearing people’s opinions on the best way to configure and use the Python logging module in production.

A replacement for get_absolute_url()

Django has a loose convention of encouraging people to add a get_absolute_url method to their models that returns that object’s URL. It’s a controversial feature—for one thing, it’s a bit of a layering violation since URL logic is meant to live in the urls.py file. It’s incredibly convenient though, and since it’s good web citizenship for everything to have one and only one URL I think there’s a pretty good argument for keeping it.

The problem is, the name sucks. I first took a look at this in the last few weeks before the release of Django 1.0—what started as a quick proposal to come up with a better name before we were stuck with it quickly descended in to a quagmire as I realised quite how broken get_absolute_url() is. The short version: in some cases it means “get a relative URL starting with /”, in other cases it means “get a full URL starting with http://” and the name doesn’t accurately describe either.

A full write-up of my investigation is available on the Wiki. My proposed solution was to replace it with two complementary methods, get_url() and get_url_path(), with the user implementing one and the other being derived from it automatically. My django-urls project illustrates the concept via a model mixin class. A year on I still think it’s quite a neat idea, though as far as I can tell no one has ever actually used it.
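The “implement one, derive the other” idea can be sketched as a plain Python mixin. This is an illustration of the concept rather than the code in django-urls: the site_root attribute and its example.com value are assumptions, and a real version would take the domain from the project’s configuration.

```python
from urllib.parse import urlsplit


class UrlMixin:
    """Subclasses implement get_url() or get_url_path(); the other is derived."""

    # Assumption: a hard-coded site root, for illustration only.
    site_root = "http://example.com"

    def get_url(self):
        """Full URL, starting with http://."""
        if type(self).get_url_path is UrlMixin.get_url_path:
            raise NotImplementedError("implement get_url() or get_url_path()")
        return self.site_root + self.get_url_path()

    def get_url_path(self):
        """Path portion only, starting with /."""
        if type(self).get_url is UrlMixin.get_url:
            raise NotImplementedError("implement get_url() or get_url_path()")
        parts = urlsplit(self.get_url())
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        return path


class Entry(UrlMixin):
    def get_url_path(self):
        return "/2009/ponies/"


class ExternalFeed(UrlMixin):
    def get_url(self):
        return "http://feeds.example.com/atom/?page=2"
```

Here Entry only knows its path and gets a full URL for free, while ExternalFeed only knows its full URL and gets a path derived from it.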

Comments on this post are open, but if you have anything to say about any of the individual proposals it would be much more useful if you posted it to the relevant mailing list thread.

This is Django ponies: Proposals for Django 1.2 by Simon Willison, posted on 28th September 2009.


18 comments

  1. +1 for logging integration. I've been particularly interested in logging integration with mongodb, although I haven't done much about it yet.

    Signed cookies are also cool, although you run the risk of making cookies large and adding to request latency.

    Parand - 29th September 2009 00:40 - #

  2. Cookie size is definitely a concern. I've pulled together a bunch of tricks to attempt to address it though. Firstly, the signature (hmac/sha1) is encoded as a base64 string, knocking it down from 40 characters of hex to 27 characters. Secondly, if the thing being signed is a pickle it can optionally be compressed first, and the library is smart enough to check that it will really save space if it does so. Finally, I hope to ship baseconv.py with the signing code which makes it easy to represent integers (such as signature timestamps) as base62 strings.

    At the end of the day it will be up to the user to decide how much information they want to cram in a cookie, but if they're just doing username, OpenID or numeric user ID it shouldn't add too much overhead to the requests.

    Simon Willison - 29th September 2009 00:48 - #

  3. Stupid doctype tangent: I'm not sure I get reluctance for people to grok or assimilate XHTML. The promise of XHTML (1.0 or XHTML 5) was/is that one need not have to mentally context switch between SGML and XML. So unless you get all Pol-Pot-purging on use of XML altogether, there are still fewer ways of instinctively writing markup to keep in your head and fingers.

    That said, I recognize the power of personal preferences and the need for frameworks to support their users expressions of such.

    Sean Upton - 29th September 2009 00:53 - #

  4. I'd really like to see the test framework go away and be replaced with support for Nose as so many projects have done. We recently did this with SQLAlchemy and it has been a big win.

    I like your other suggestions and particularly the django-urls stuff.

    Empty - 29th September 2009 00:58 - #

  5. Actually, I have been using a modified UrlMixin for about a year. get_url() is usually necessary for links in feeds and emails whereas get_url_path() is better for links in a website.

    Aidas Bendoraitis - 29th September 2009 01:52 - #

  6. I considered the signed cookies definitely the best of all.

    online dating - 29th September 2009 07:20 - #

  7. This is a great writeup! I admittedly don't have enough Django experience to totally understand all of this (I learned a few things while reading) but your descriptions pretty clearly state what the current problems are and good potential solutions. I'd love to see more discussion and for some of these proposals to make it into 1.2.

    John Debs - 29th September 2009 07:20 - #

  8. This is somewhat stupid one.There is something error in logging with this kind of code.

    matrimony - 29th September 2009 07:22 - #

  9. The key to good signed cookies is to have key migration built in. You need to be able to deploy a new key to a set of servers. It should migrate any old keys to the new one (falling back to old keys for verification until the old keys hit their timeout).

    I've done this in production on non-Django systems. Probably won't have time in the next month or so, but if no one has done this, I'll take a look at what it would take to become a Django contributor. Key migration is nec for production-quality systems, IMHO.

    Thanks for all the amazing work. Django is definitely changing how I develop.

    Dan G - 29th September 2009 08:22 - #

  10. Dan G - I'm really glad you said that. Key migration (I didn't know that was the official term for it) has been on my mind - there's a section at the bottom of the wiki page with some thoughts - but I don't know anything about best practices in this area. Now I know there's definite demand for it. If you want to contribute, just branch my django-signing code on GitHub and start hacking - though we might ask you to sign a CLA later on.

    Simon Willison - 29th September 2009 09:19 - #

  11. Hi Simon,

    I abhor get_absolute_url() but I think the solution it tries to provide is tremendously useful.

    How difficult would it be to have models use a get_view_name() instead and have URL resolution look up the view in question in urls.py? That would provide the same usefulness in a hugely DRY-er way, wouldn't it?

    S. - 29th September 2009 16:18 - #

  12. Hi Simon,

    Looks like there's quite a lot of Django code vulnerable to length-extension attacks. There are also some misuses of other crypto primitives. I'd like to help fix them. What should I do now? I can provide some patches if you want.

    Please advise.

    Thai.

    Thai - 29th September 2009 17:33 - #

  13. Thai: that's exactly why I want to get cryptographers involved for the signing stuff - I'll be in touch directly. Thanks.

    Simon Willison - 29th September 2009 21:43 - #

  14. Hi Simon,

    I thought you knew but there is actually an attempt to use built-in python logging module within Django (at least on UI, not in the core).
    You can have a look here : http://code.google.com/p/django-logging/
    But this is maybe not as complete as it could be if integrated in the code of Django.

    Vincent.

    Vincent - 30th September 2009 07:45 - #

  15. I second having some kind of logging integration by default. For people new to Django and Python, that is the easiest way to learn how the framework and language work.

    Vitaly Babiy - 30th September 2009 15:04 - #

  16. I think get_absolute_url and other similar models.py-based solutions should be considered harmful.
    I'm using named URLs and the "url" template tag like this:

    {% url user-profile user.id %}
    or
    {% url user-profile user.username %}

    and if I change the user profile URL, a simple search-and-replace is enough.

    Hilmi - 8th October 2009 00:43 - #

  17. Tiny correction to the documentation of {% slash %} on django-html, you've got the slash and the space the wrong way around. It should read:

    The `{% slash %}` tag simply outputs the string " /" for an XHTML doctype and renders blank for an HTML doctype.

    (The current documentation makes it look like it would render <br/ > instead of <br />, which isn't accurate.)

    Dominic Rodger - 9th October 2009 16:07 - #

  18. cool! google django!

    dong - 16th October 2009 03:50 - #

Comments are closed.