Which is the best open source tool to populate my database with test data for my load test?
11th February 2012
My answer to Which is the best open source tool to populate my database with test data for my load test? on Quora
I’ve seen tools that do this, but to be honest it’s very simple to write your own script (especially if you’re using an ORM). The other benefit of writing your own script is that you’ll have a much better chance of accurately representing your expected data, sizes etc.
A few techniques that are pretty useful:
- Build up lists of common first names and last names, then generate user names by picking a random first name and a random last name.
- Build a utility function that generates 6 letter random strings, then generate email addresses as random-6-letter-string@random-domain.
- For relationships, one technique is to populate one table, then pull all of the primary keys out into a list and pick them at random from that list when creating other records. You might want to bias that selection towards some records to get a more realistic bell-curve distribution rather than a purely uniform random selection.
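Here is a rough sketch of how those techniques might fit together in Python, using only the standard library and an in-memory SQLite database. The table names (`users`, `posts`), the sample name and domain lists and the helper function names are all made up for illustration. The 1/rank weighting is just one simple way to bias the selection: it produces a skewed, long-tail distribution rather than a true bell curve, so swap in whatever shape matches your expected data.

```python
import random
import sqlite3
import string

# Hypothetical sample data; a real script would load much longer lists.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin"]
LAST_NAMES = ["Smith", "Jones", "Nguyen", "Garcia", "Patel"]
DOMAINS = ["example.com", "example.org", "example.net"]


def random_string(length=6):
    """Return a random string of lowercase ASCII letters."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


def random_name():
    """Pick a random first name and a random last name."""
    return f"{random.choice(FIRST_NAMES)} {random.choice(LAST_NAMES)}"


def random_email():
    """Build an address of the form random-6-letter-string@random-domain."""
    return f"{random_string()}@{random.choice(DOMAINS)}"


def populate(conn, num_users=100, num_posts=1000):
    """Fill the (hypothetical) users and posts tables with fake rows."""
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, "
        "user_id INTEGER REFERENCES users(id), title TEXT)"
    )

    # Populate the first table.
    conn.executemany(
        "INSERT INTO users (name, email) VALUES (?, ?)",
        [(random_name(), random_email()) for _ in range(num_users)],
    )

    # Pull all of the primary keys out into a list.
    user_ids = [row[0] for row in conn.execute("SELECT id FROM users")]

    # Bias the selection: earlier users get a higher weight (1/rank),
    # so a handful of users end up owning most of the posts.
    weights = [1 / (rank + 1) for rank in range(len(user_ids))]
    chosen = random.choices(user_ids, weights=weights, k=num_posts)

    conn.executemany(
        "INSERT INTO posts (user_id, title) VALUES (?, ?)",
        [(user_id, random_string()) for user_id in chosen],
    )
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    populate(connection)
    print(connection.execute("SELECT COUNT(*) FROM posts").fetchone()[0])
```

For a real load test you would point this at your actual schema (or your ORM’s models) and scale the row counts up to match the volumes you expect in production.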
There are libraries that can help with this (e.g. with built-in routines for generating fake email addresses and so on). If you’re using Ruby, http://faker.rubyforge.org/ is worth a look (it’s a port of Data::Faker from Perl). There’s a Python port here: https://github.com/threadsafelab...