Simon Willison’s Weblog


Porting my blog to Python 3

21st October 2017

This blog is now running on Python 3! Admittedly this is nearly nine years after the first release of Python 3.0, but it’s the first Python 3 project I’ve deployed myself so I’m pretty excited about it.

Library authors like to use six to allow them to write code that supports both Python 2 and Python 3 at the same time… but my blog isn’t a library, so I used the 2to3 conversion tool that ships with Python instead.

And… it worked pretty well! I ran the following command from my project’s root directory:

2to3 -w -n blog/ config/ redirects/ feedstats/

The -w option causes the files to be over-written in place. Since everything is already in git, there was no reason to have 2to3 show my a diff without applying it. Likewise, the -n option tells 2to3 not to bother saving backups of the files it modifies.

Here’s the initial commit containing mostly the 2to3 changes.

Next step: run the tests! My test suite may be very thin, but it does at least check that the app can run its migrations, start up and serve a few basic pages without errors. One of my migrations was failing due to rogue bytestrings but that was an easy fix.

At this point I started to lean heavily on my continuous integration setup built on Travis CI. All of my Python 3 work took place in a branch, and all it took was a one line change to my .travis.yml for Travis to start running the tests for that branch using Python 3.

With the basic tests working, I made my first deploy to my Heroku staging instance—after first modifying my Heroku runtime.txt to tell it to use Python 3.6.2. My staging environment allowed me to sanity check that everything would work OK when deployed to Heroku.

At this point I got a bit lazy. The responsible thing to do would have been extensive manual testing plus systematic unit test coverage of core functionality. My blog is hardly a critical piece of infrastructure though, so I went with the faster option: put it all live and use Sentry to see if anything breaks.

This is where Heroku’s ability to deploy a specific branch came in handy: one click to deploy my python3 branch, keep an eye on Sentry (via push notifications from my private slack channel) and then one click to deploy my master branch again for an instant rollback in case of errors. Which I had to do instantly, because it turned out I had stored some data in Django’s cache using Python 2 pickle and was trying to read it back out again using Python 3.

I fixed that by bumping my cache VERSION setting and deployed again. This deploy lasted a few minute longer before Sentry started to fill up with encoding errors and I rolled it back again.

The single biggest difference between Python 2 and Python 3 is how strings are handled. Python 3 strings are unicode sequences. Learning to live in a world where strings are all unicode and byte strings are the rare, deliberate exceptions takes some getting used to.

The key challenge for my blog actually came from my custom markup handling template tags. 15 years ago I made the decision to store all of my blog entries as valid XHTML fragments. This meant I could use XML processors—back then in PHP, today Python’s ElementTree—to perform various transformations on my content.

ElementTree in Python 2 can only consume bytestrings. In Python 3 it expects unicode strings. Cleaning this up took a while, eventually inspiring me to refactor my custom template tags completely. In the process I realized that my blog templates were mostly written back before Django’s template language implemented autoescape (in Django 1.0), so my code was littered with unnecessary |escape and |safe filters. Those are all gone now.

Sentry lets you mark an exception as “resolved” when you think you’ve fixed it—if it occurs again after that it will be re-reported to your Slack channel and added back to the Sentry list of unresolved issues. Once Sentry was clear (especially given Googlebot had crawled my older pages) I could be pretty confident there were no more critical 500-causing errors.

That left logic errors, of which only one has cropped up so far: the “zero years ago” bug. Entries on my homepage include a relative date representation, e.g. “three days ago”. Python 3 changed how the divison operator works on integers3 / 2 == 1.5 where in Python 2 it gets truncated to 1. As a result, every entry on my homepage showed “zero years ago”. Thankfully this was a one-line fix.

All in all this process was much less painful than I expected. It took me longer to write this blog entry than it did to actually make the conversion (thanks to 2to3 doing most of the tedious work), and the combination of Travis CI, Sentry and Heroku allowed me to ship aggressively with the knowledge that I could promptly identify and resolve any issues that came up.

Next upgrade: Django 2.0!