Simon Willison’s Weblog


Which is the best open source tool to populate my database with test data for my load test?

11th February 2012

My answer to Which is the best open source tool to populate my database with test data for my load test? on Quora

I’ve seen tools that do this, but to be honest it’s very simple to write your own script (especially if you’re using an ORM). Another benefit of writing your own script is that you’ll have a much better chance of accurately representing your expected data, its sizes, etc.
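To illustrate how little code this can take, here is a minimal sketch in Python. It uses the standard-library sqlite3 module rather than an ORM. The `users` table, its columns, the `populate` and `random_string` helpers, and the 0 to 500 character bio range are all hypothetical. Swap in your own schema and the sizes you actually expect in production.

```python
import random
import sqlite3
import string


def random_string(length: int = 8) -> str:
    """Return a random lowercase ASCII string (hypothetical helper)."""
    return "".join(random.choices(string.ascii_lowercase, k=length))


def populate(conn: sqlite3.Connection, n_users: int, batch_size: int = 1000) -> None:
    """Insert n_users fake rows into a hypothetical 'users' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, username TEXT NOT NULL, bio TEXT NOT NULL)"
    )
    sql = "INSERT INTO users (username, bio) VALUES (?, ?)"
    batch = []
    with conn:  # commits once on success, rolls back if an exception escapes
        for i in range(n_users):
            # Size the fake data like the real thing: here bios are assumed
            # to be anywhere from 0 to 500 characters long.
            bio = random_string(random.randint(0, 500))
            batch.append((f"user{i}", bio))
            if len(batch) >= batch_size:
                conn.executemany(sql, batch)
                batch.clear()
        if batch:  # flush the final partial batch
            conn.executemany(sql, batch)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    populate(conn, 10_000)
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

Batching the inserts with `executemany` inside a single transaction keeps a script like this fast even for large row counts. Most ORMs offer an equivalent bulk-insert call, such as Django's `bulk_create`.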

A few techniques that are pretty useful: Build up lists of common first names and last names, then generate user names by picking a random first name and a random last name. Build a utility function that generates random 6-letter strings, then generate email addresses as random-6-letter-string@random-domain. For relationships, one technique is to populate one table, then pull all of the primary keys out into a list and pick from that list at random when creating other records. You might want to bias that selection towards some records to get something closer to a realistic bell curve rather than a purely uniform random selection.
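The techniques above might be sketched in Python along these lines, using only the standard library. The name and domain lists, the helper names, and the `spread` parameter are all hypothetical. `pick_biased` shows one possible way to get a bell-curve bias: it draws a list index from a normal distribution centred on the middle of the primary-key list.

```python
import random
import string
from collections import Counter
from typing import Sequence

# Hypothetical seed lists; in practice you would load a few hundred
# common first and last names from a file.
FIRST_NAMES = ["alice", "bob", "carol", "dave", "erin", "frank"]
LAST_NAMES = ["smith", "jones", "taylor", "brown", "wilson", "evans"]
DOMAINS = ["example.com", "example.org", "example.net"]  # assumed domains


def random_letters(length: int = 6) -> str:
    """Random lowercase string, used for the local part of an email."""
    return "".join(random.choices(string.ascii_lowercase, k=length))


def random_username() -> str:
    """Combine a random first name with a random last name."""
    return random.choice(FIRST_NAMES) + random.choice(LAST_NAMES)


def random_email() -> str:
    """Build random-6-letter-string@random-domain."""
    return f"{random_letters(6)}@{random.choice(DOMAINS)}"


def pick_biased(ids: Sequence[int], spread: float = 0.15) -> int:
    """Pick a primary key, favouring those near the middle of the list.

    The index is drawn from a normal distribution centred on the middle
    of the list (standard deviation = spread * len(ids)) and clamped to
    the valid range, so selection frequency follows a rough bell curve.
    """
    if not ids:
        raise ValueError("ids must not be empty")
    n = len(ids)
    index = round(random.gauss(n / 2, n * spread))
    return ids[min(max(index, 0), n - 1)]


if __name__ == "__main__":
    # Populate the "parent" records first...
    users = [
        {"id": i, "username": random_username(), "email": random_email()}
        for i in range(1, 1001)
    ]
    # ...then pull their primary keys out into a list.
    user_ids = [u["id"] for u in users]
    # Child records reference a parent chosen with a bell-curve bias.
    posts = [{"id": j, "author_id": pick_biased(user_ids)} for j in range(1, 5001)]
    print(users[0])
    print(Counter(p["author_id"] for p in posts).most_common(3))
```

With a normal distribution, records near the middle of the key list are picked most often and those at either end rarely. A skewed weighting, where a handful of users own most of the posts, would be an equally reasonable choice depending on what your real data looks like.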

There are libraries that can help with this (e.g. with built-in routines for generating fake email addresses). If you’re using Ruby, http://faker.rubyforge.org/ is worth a look (a port of Data::Faker from Perl). There’s a Python port here: https://github.com/threadsafelab...

This is Which is the best open source tool to populate my database with test data for my load test? by Simon Willison, posted on 11th February 2012.
