Announcing dmigrations
The team at Global Radio (formerly GCap Media) is the largest group of Django developers I’ve personally worked with, consisting of 14 developers split into two scrum teams, all contributing to the same overall codebase.
Working with that many developers makes smart tools and processes essential, and in some cases we’ve had to develop our own. Today, we’re releasing one of them as an open source project.
dmigrations is a Django migrations tool. It addresses a common problem in Django development: if you change a model after creating the database tables for it with syncdb, how do you reflect those changes in your database tables without blowing away your existing data and starting again from scratch?
django-evolution attempts to address this problem the clever way, by detecting changes to models that are not yet reflected in the database schema and figuring out what needs to be done to bring the two back in sync. In contrast, dmigrations takes the stupid approach: it requires you to explicitly state the changes in a sequence of migrations, which will be applied in turn to bring a database up to the most recent state that reflects the underlying models.
This means extra work for developers who create migrations, but it also makes the whole process completely transparent—for our projects, we decided to go with the simplest system that could possibly work.
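To make the explicit approach concrete, here is a minimal sketch of the idea, not dmigrations' actual implementation. It assumes a hypothetical ordered list of named SQL steps and a hypothetical `migration_log` table, and it uses SQLite from the Python standard library so it can run anywhere (dmigrations itself targets MySQL and uses migration classes).

```python
import sqlite3

# Hypothetical ordered list of (name, SQL) pairs. The names, tables and the
# migration_log table are illustrative assumptions, not dmigrations' format.
MIGRATIONS = [
    ("001_create_station", "CREATE TABLE station (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002_add_frequency", "ALTER TABLE station ADD COLUMN frequency TEXT"),
]


def apply_pending(conn, migrations):
    """Apply, in order, every migration not yet recorded in the log table."""
    conn.execute("CREATE TABLE IF NOT EXISTS migration_log (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM migration_log")}
    newly_applied = []
    for name, sql in migrations:
        if name in applied:
            continue  # already applied on an earlier run
        conn.execute(sql)
        conn.execute("INSERT INTO migration_log (name) VALUES (?)", (name,))
        newly_applied.append(name)
    conn.commit()
    return newly_applied


conn = sqlite3.connect(":memory:")
first_run = apply_pending(conn, MIGRATIONS)   # applies both steps
second_run = apply_pending(conn, MIGRATIONS)  # nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(station)")]
```

Because every change is written down and recorded once applied, running the same list again is a no-op, which is what lets each developer bring a local database up to date without guessing what has already been done.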
The interface to dmigrations is a pair of custom Django commands. The first, ./manage.py dmigrate, provides a set of commands for listing, applying and unapplying (reverting) migrations. This entirely replaces Django’s syncdb command.
The second, ./manage.py dmigration, provides commands for code-generating new migrations. It turns out that most migrations fit a set of common patterns: add a new table, add the tables for a new Django application, add a column to an existing table, add an index. These common cases are handled by dmigration; if you want to do something more complex (rename a column while transforming its data, for example) you’ll need to write a custom migration class.
The dmigrations tutorial provides a full introduction to both of these commands, as well as hints on writing your own custom migrations. Since migrations are just classes, one of our hopes is that external developers will write extra migration classes for operations like “rename column”—things that currently require a one-off custom migration.
dmigrations is actually the third iteration of our in-house migrations system. The first, smigrations, was designed to do the least amount of work possible to give us a controlled way of applying changes to our database schemas. The ’s’ stood for ’simple’. The second version (dmigrations), written by Tomasz Wegrzanowski, was a major upgrade to smigrations that addressed many of the frustrations we found when using it with branched development, in particular the problem of migrations in two branches conflicting with each other. The ’d’ stood for ’distributed’.
Version three, released today, is my refactoring of dmigrations to de-couple it from the rest of our codebase. I’ve also stubbed out hooks for adding support for alternative database engines; dmigrations only supports MySQL out of the box, but I’m keen on getting it working with other databases now that it’s out in the wild. Patches welcome!
How does this fit in with South and django-evolution?
That’s an excellent question. We’ll be discussing all three systems on the schema evolution panel at DjangoCon this weekend. I would love to see co-operation between the projects; at the very least I’d like to see the emergence of a standard Django-style abstraction library for create/alter table statements (something we punted on entirely with dmigrations). You’ll certainly be hearing a lot more about migrations in Django after the conference.
WOW. Slick! Exactly what we need now when 1.0 is around the corner. Great work!
Andreas - 3rd September 2008 19:32 - #
Awesome! Can't wait to try it out.
Very cool stuff. I like the API a lot from what I've read so far.
Nice to finally see it make an appearance! You're right, it is somewhat like South, but more... combinatorical.
Anyway, onwards to DjangoCon! I'm excited to see what comes of all this; it can only be for the better.
Andrew Godwin - 3rd September 2008 19:47 - #
This looks like exactly what I want. Thanks!
Looking forward to trying this out for Pinax as our approach to date seems to fit better with this than django-evolution
Great This is KUNG-FU!!!!
Marco - 3rd September 2008 21:41 - #
tnx, interesting tool
as i see gcapmedia.com, thisisglobal.com php site B)
Great work. And don't call this approach stupid. I like to have control over what my code is doing ;)
slav0nic - yup, for the moment at least. So far the biggest Django projects that we've actually launched are http://www.capitalradio.co.uk/ and http://events.xfm.co.uk/gigs
Does this have a way to handle lossy data migrations safely? e.g. if we're combining partial data from existing columns into other columns and discarding the rest, you wouldn't want to discard anything until and unless the combining was completed successfully, so the up() would be a two-stage process.
Durdinator - 3rd September 2008 23:45 - #
Durdinator: sort of. If you're doing something complex like that you write a custom migration (a class with an up() method and a down() method). Within those methods you can manage your own transactions and do as much clever stuff as you like. I've written single migrations that took three hours to put together for relatively complex data changes.
Again, part of the reason we've open sourced this is that we're hoping clever developers will create Migration classes that do that kind of thing for us!
It should go without saying, but you'd be crazy to run a complex migration without doing a full backup of your data first. Our policy is to run a full backup before running ANY migrations in production, which I think is a very smart way of going about it.
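As an illustration of the two-stage idea discussed above, here is a minimal sketch of a class with up() and down() methods that manages its own transaction. Everything in it is an assumption made for the example: the `MergeNames` class, the `person` table and its columns are hypothetical, and the class does not use any dmigrations base class or API. It runs against SQLite, where schema changes can be rolled back; MySQL commits implicitly on statements such as CREATE TABLE and DROP TABLE, so there the destructive stage would need to be a separate, later step.

```python
import sqlite3


class MergeNames:
    """Hypothetical custom migration: build full_name, then discard the rest."""

    def up(self, conn):
        conn.execute("BEGIN")
        try:
            # Stage 1: build the combined data without touching the original.
            conn.execute("CREATE TABLE person_new (id INTEGER PRIMARY KEY, full_name TEXT)")
            conn.execute(
                "INSERT INTO person_new (id, full_name) "
                "SELECT id, first_name || ' ' || last_name FROM person"
            )
            # Verify before discarding anything: a NULL input yields a NULL name.
            missing = conn.execute(
                "SELECT COUNT(*) FROM person_new WHERE full_name IS NULL"
            ).fetchall()[0][0]
            if missing:
                raise RuntimeError("%d rows could not be combined" % missing)
            # Stage 2: only now drop the old table (and its nickname column).
            conn.execute("DROP TABLE person")
            conn.execute("ALTER TABLE person_new RENAME TO person")
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # original table is left untouched
            raise

    def down(self, conn):
        # The discarded columns cannot be rebuilt; restore from a backup instead.
        raise NotImplementedError("MergeNames is lossy and cannot be reverted")


def make_db(rows):
    # isolation_level=None: the sketch issues BEGIN/COMMIT/ROLLBACK itself.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, "
        "first_name TEXT, last_name TEXT, nickname TEXT)"
    )
    conn.executemany(
        "INSERT INTO person (first_name, last_name, nickname) VALUES (?, ?, ?)", rows
    )
    return conn


good = make_db([("Ada", "Lovelace", "ada")])
MergeNames().up(good)
good_rows = good.execute("SELECT id, full_name FROM person").fetchall()

bad = make_db([("Ada", "Lovelace", "ada"), ("Grace", None, "amazing")])
try:
    MergeNames().up(bad)
    failed = False
except RuntimeError:
    failed = True
bad_columns = [row[1] for row in bad.execute("PRAGMA table_info(person)")]
leftovers = bad.execute(
    "SELECT name FROM sqlite_master WHERE name = 'person_new'"
).fetchall()
```

In the failing case the verification step raises before anything is dropped, so the original columns survive and the half-built table is rolled back. A full backup is still the real safety net.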
It must just be me, but when I realized Django didn't have this migration I was surprised... then I just started writing the SQL statements myself when necessary. IMO, this feature isn't a big enough deal to have so many different projects working on it... maybe just one unified method.
P.S. Having many machines isn't a good excuse since it is trivial to run the same command on multiple machines simultaneously ;)
How do you handle branches?
Say release 3.0 is branched and is on database version 118.
Development on trunk continues and is on db version 122.
You find a bug in 3.0 and you need to push out a 3.01 release quickly that requires a database change. Trunk is unstable.
What version does the 3.01 branch use?
Darren - 4th September 2008 19:18 - #
We've actually been using svn:externals to share a single migrations directory between multiple branches for our projects, which has worked pretty well so far. We're careful not to have two branches that make conflicting database changes though.
If you need to push out a release that requires a specific database change, you can do that using the "./manage.py dmigrate apply 17" command - which lets you apply a specific migration without having to apply all of the preceding migrations. Of course, if there are dependencies on other migrations that have not yet been applied you'll need to figure those out yourself.
Tom: the principal reason to use migrations over just manually running SQL statements is when you're working on a large development team and everyone needs to keep their local copy of the database schema under control. Sure, you could just send an e-mail to the whole team saying "run this SQL command" but it's a whole lot easier to have an automated process that manages that for you.
Dude, looks really cool!
Alexander - 12th October 2008 15:54 - #
It indeed looks like a masterstroke. Awesome work!
@Florian, you are right. I also appreciate having control over my work.
Billigflüge - 6th November 2008 12:50 - #