Why I like Redis
I’ve been getting a lot of useful work done with Redis recently.
Redis is typically categorised as yet another of those new-fangled NoSQL key/value stores, but if you look closer it actually has some pretty distinctive characteristics. It makes more sense to describe it as a “data structure server”: it provides a network service that exposes persistent storage and operations over dictionaries, lists, sets and string values. Think memcached but with list and set operations and persistence-to-disk.
It’s also incredibly easy to set up, ridiculously fast (30,000 reads or writes a second on my laptop with the default configuration) and has an interesting approach to persistence. Redis runs in memory, but syncs to disk every Y seconds or after every X operations. That sounds risky, but it supports replication out of the box, so if you’re worried about losing data should a server fail you can always keep a replicated copy to hand. I wouldn’t trust my only copy of critical data to it, but there are plenty of other cases for which it is really well suited.
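The snapshot schedule can be pictured with a small model. In redis.conf it is written as `save <seconds> <changes>` lines: a snapshot is due once at least `<changes>` writes have happened and `<seconds>` have passed since the last one, and any single line is enough to trigger it. The sketch below is a plain-Python model of that rule, not Redis code; `SaveRule` and `should_snapshot` are hypothetical names and the rule values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaveRule:
    """One `save <seconds> <changes>` line from redis.conf (modelled)."""
    seconds: int  # minimum time since the last snapshot
    changes: int  # minimum number of writes since the last snapshot


def should_snapshot(rules, seconds_since_save, changes_since_save):
    """True if any rule is satisfied: enough time AND enough writes."""
    return any(
        seconds_since_save >= rule.seconds and changes_since_save >= rule.changes
        for rule in rules
    )


# Illustrative rules only; check the save lines in your own redis.conf.
RULES = [SaveRule(900, 1), SaveRule(300, 10), SaveRule(60, 10000)]

print(should_snapshot(RULES, 70, 20000))  # True: the 60s/10000 rule matches
print(should_snapshot(RULES, 200, 5))     # False: no rule has both conditions met
print(should_snapshot(RULES, 950, 1))     # True: the 900s/1 rule matches
```

Whatever was written after the most recent snapshot is what a crash could lose, which is why the replication point above matters.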
I’m currently not using it for data storage at all—instead, I use it as a tool for processing data using the interactive Python interpreter.
I’m a huge fan of REPLs. When programming Python, I spend most of my time in an IPython prompt. With JavaScript, I use the Firebug console. I experiment with APIs, get something working and paste it into a text editor. For some one-off data transformation problems I never save any code at all: I run a couple of list comprehensions, dump the results out as JSON or CSV and leave it at that.
Redis is an excellent complement to this kind of programming. I can run a long-running batch job in one Python interpreter (say, loading a few million lines of CSV into a Redis key/value lookup table) and run another interpreter to play with the data that’s already been collected, even while the first process is still streaming data in. I can quit and restart my interpreters without losing any data. And because Redis semantics map closely to Python native data types, I don’t have to think for more than a few seconds about how I’m going to represent my data.
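Loading a lookup table that way might look something like the sketch below. It assumes a two-column CSV (key, value) and writes keys in batches with `mset`, so a second interpreter can start reading as soon as the first batch lands. It uses method names from the current redis-py client; `InMemoryStandIn`, `load_lookup` and the `postcode:` key prefix are hypothetical, and the stand-in exists only so the sketch runs without a server.

```python
import csv
import io
from itertools import islice


class InMemoryStandIn:
    """Hypothetical stand-in for redis.Redis(decode_responses=True).

    Implements only the two methods this sketch calls, mset and get,
    so it runs without a Redis server.
    """

    def __init__(self):
        self._data = {}

    def mset(self, mapping):
        # Redis stores string values; mimic that with str().
        self._data.update({key: str(value) for key, value in mapping.items()})
        return True

    def get(self, key):
        return self._data.get(key)  # None for a missing key, like Redis


def load_lookup(client, rows, prefix="postcode:", batch_size=1000):
    """Store two-column (key, value) rows as prefixed string keys.

    Sends batch_size rows per MSET so readers in another process see
    data arriving while the load is still running. Returns how many
    keys were written (duplicates within one batch count once).
    """
    rows = iter(rows)
    written = 0
    while True:
        batch = dict(islice(rows, batch_size))  # assumes exactly 2 columns
        if not batch:
            return written
        client.mset({prefix + key: value for key, value in batch.items()})
        written += len(batch)


# Made-up sample data; a real run would stream a file opened with open().
sample = io.StringIO("AB1 2CD,Constituency A\nEF3 4GH,Constituency B\n")
r = InMemoryStandIn()
print(load_lookup(r, csv.reader(sample), batch_size=1))  # 2
print(r.get("postcode:EF3 4GH"))  # Constituency B
```

With a real server, replacing the stand-in with `redis.Redis(decode_responses=True)` should be the only change; the other interpreter can then call `get` on the same keys while the load is still running.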
Here’s a 30 second guide to getting started with Redis:
$ wget http://redis.googlecode.com/files/redis-1.01.tar.gz
$ tar -xzf redis-1.01.tar.gz
$ cd redis-1.01
$ make
$ ./redis-server
And that’s it—you now have a Redis server running on port 6379. No need even for a ./configure or make install. You can run ./redis-benchmark in that directory to exercise it a bit.
Let’s try it out from Python. In a separate terminal:
$ cd redis-1.01/client-libraries/python/
$ python
>>> import redis
>>> r = redis.Redis()
>>> r.info()
{u'total_connections_received': 1, ... }
>>> r.keys('*') # Show all keys in the database
[]
>>> r.set('key-1', 'Value 1')
'OK'
>>> r.keys('*')
[u'key-1']
>>> r.get('key-1')
u'Value 1'
Now let’s try something a bit more interesting:
>>> r.push('log', 'Log message 1')
>>> r.push('log', 'Log message 2')
>>> r.push('log', 'Log message 3')
>>> r.lrange('log', 0, 100)
[u'Log message 3', u'Log message 2', u'Log message 1']
>>> r.push('log', 'Log message 4')
>>> r.push('log', 'Log message 5')
>>> r.push('log', 'Log message 6')
>>> r.ltrim('log', 0, 2)
>>> r.lrange('log', 0, 100)
[u'Log message 6', u'Log message 5', u'Log message 4']
That’s a simple capped log implementation (similar to a MongoDB capped collection): push items on to the head of a 'log' key and use ltrim to retain only the X most recent items. You could use this to keep track of what a system is doing right now without having to store ever-increasing amounts of logging information.
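The same pattern written with the method names of the current redis-py client (`lpush`, `ltrim`, `lrange`), as a minimal sketch. `ListStandIn` is a hypothetical in-memory substitute so it runs without a server, and `log_event` and its `keep` parameter are illustrative names.

```python
class ListStandIn:
    """Hypothetical in-memory stand-in for the redis-py list methods used
    here (lpush, ltrim, lrange); swap in redis.Redis(decode_responses=True)."""

    def __init__(self):
        self._lists = {}

    def lpush(self, name, *values):
        items = self._lists.setdefault(name, [])
        for value in values:
            items.insert(0, value)  # LPUSH adds each value at the head
        return len(items)

    def ltrim(self, name, start, end):
        # Redis ranges include `end`; this stand-in handles only
        # non-negative indices.
        self._lists[name] = self._lists.get(name, [])[start:end + 1]
        return True

    def lrange(self, name, start, end):
        return self._lists.get(name, [])[start:end + 1]


def log_event(client, message, key="log", keep=3):
    """Capped log: newest entry at the head, trimmed to `keep` entries."""
    client.lpush(key, message)
    client.ltrim(key, 0, keep - 1)


r = ListStandIn()
for i in range(1, 7):
    log_event(r, f"Log message {i}")
print(r.lrange("log", 0, 100))
# ['Log message 6', 'Log message 5', 'Log message 4']
```

Against a real server you could send the two commands in one round trip with `client.pipeline()`; either way the trim keeps the list bounded.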
See the documentation for a full list of Redis commands. I’m particularly excited about the RANDOMKEY and new SRANDMEMBER commands (git trunk only at the moment), which help address the common challenge of picking a random item without ORDER BY RAND() clobbering your relational database. In a beautiful example of open source support in action, I requested SRANDMEMBER on Twitter yesterday and antirez committed it just 12 hours later.
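Picking a random item this way means keeping a set of IDs alongside your data and asking Redis for a member. A minimal sketch using redis-py's `sadd` and `srandmember` method names; `SetStandIn` is a hypothetical in-memory substitute so it runs without a server, and the `article_ids` key is made up.

```python
import random


class SetStandIn:
    """Hypothetical in-memory stand-in for the redis-py set methods used
    here (sadd, srandmember); swap in redis.Redis(decode_responses=True)."""

    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        members = self._sets.setdefault(name, set())
        before = len(members)
        members.update(values)
        return len(members) - before  # like SADD: count of newly added members

    def srandmember(self, name):
        members = self._sets.get(name)
        if not members:
            return None  # Redis replies nil for a missing key
        return random.choice(sorted(members))  # choice() needs a sequence


r = SetStandIn()
r.sadd("article_ids", "101", "102", "103")
print(r.srandmember("article_ids"))  # one of '101', '102', '103'
```

On a real server the random pick happens inside Redis, so nothing has to sort the whole collection, which is the cost ORDER BY RAND() imposes on a relational database.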
I used Redis this week to help create heat maps of the BNP’s membership list for the Guardian. I had the leaked spreadsheet of the BNP member details and a (licensed) CSV file mapping 1.6 million postcodes to their corresponding parliamentary constituencies. I loaded the CSV file into Redis, then looped through the 12,000 postcodes from the membership list and looked each one up in turn, accumulating counts for each constituency. It took a couple of minutes to load the constituency data and a few seconds to run and accumulate the postcode counts. In the end, it probably involved fewer than 20 lines of actual Python code.
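A minimal sketch of that lookup-and-count step (not the original script): each membership postcode is looked up with `get` and the results are tallied with `collections.Counter`. It assumes the table was stored under hypothetical `postcode:` keys in upper case; `LookupStandIn` is an in-memory substitute for `redis.Redis(decode_responses=True)`, and the postcodes and constituency names are made up.

```python
from collections import Counter


class LookupStandIn:
    """Hypothetical stand-in for redis.Redis(decode_responses=True);
    only get() is implemented, returning None for missing keys as Redis does."""

    def __init__(self, data):
        self._data = dict(data)

    def get(self, key):
        return self._data.get(key)


def count_by_constituency(client, postcodes, prefix="postcode:"):
    """Tally members per constituency; also return postcodes with no match."""
    counts = Counter()
    unmatched = []
    for postcode in postcodes:
        # Assumes keys were stored upper-cased with surrounding spaces stripped.
        constituency = client.get(prefix + postcode.strip().upper())
        if constituency is None:
            unmatched.append(postcode)
        else:
            counts[constituency] += 1
    return counts, unmatched


# Made-up data for illustration only.
r = LookupStandIn({
    "postcode:AB1 2CD": "Constituency A",
    "postcode:EF3 4GH": "Constituency B",
})
counts, unmatched = count_by_constituency(r, ["AB1 2CD", " ab1 2cd", "EF3 4GH", "ZZ9 9ZZ"])
print(counts.most_common())  # [('Constituency A', 2), ('Constituency B', 1)]
print(unmatched)             # ['ZZ9 9ZZ']
```

For larger batches, redis-py's `mget` can fetch many keys in one round trip instead of one `get` per postcode.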
A much more interesting example of an application built on Redis is Hurl, a tool for debugging HTTP requests built in 48 hours by Leah Culver and Chris Wanstrath. The code is now open source, and Chris talks a bit more about the implementation (in particular their use of sort in Redis) on his blog. Redis also gets a mention in Tom Preston-Werner’s epic writeup of the new scalable architecture behind GitHub.
Hi Simon - thanks for taking the time to write this all up. I'm always curious to hear about how other people work, especially with data.
BTW, for any Twisted fans, a Twisted interface to Redis was posted to the Twisted mailing list just yesterday.
This post pushes my buttons for a few reasons.
First, the availability of persistent data across multiple Python sessions is really nice (this may be implemented elsewhere, but my first thought was to automatically insert everything from an interpreter session into Redis, so I could restore a session quickly and implement Screen-like functionality for my Python prompts).
Secondly, I built the original bnpnearme.co.uk site last year (sadly now defunct, but covered here: http://bit.ly/3k8aIu), and I remember processing the data using a really naive PHP script took a LONG time, so kudos.
Jon Atkinson - 22nd October 2009 13:10 - #
Hey Simon,
Thanks for this writeup. I'm definitely going to give redis a try.
The heatmap was an excellent example. What did you use to produce it? I'm guessing matplotlib.
Keep up the good work!
Hrishi Mittal - 22nd October 2009 14:24 - #
Hi Simon - I don't know if you've played with Tokyo Cabinet or CouchDB but if so, what do you see as redis' features that make it stand out from the crowd?
Tim Parkin - 22nd October 2009 14:41 - #
Why I like this post... because I was looking for something like Redis: easy to set up, fast, client libraries for Scala and a "New BSD License". Looks good... I'll give it a try.
Andreas Schipplock - 22nd October 2009 15:00 - #
Hey, cool little project. Reminds me a lot of object prevalence; something I have had various levels of interest in at different points in time.
There are some great prevalence packages for other languages (KiokuDB for Perl, Rucksack for Common Lisp) but I haven't really found a solid stable one for Python (and never got around to finishing one myself).
Good work. Can't wait to hear more news. :)
j_king - 22nd October 2009 15:01 - #
Hrishi: I actually used an SVG image with IDs on the paths representing each constituency, and looped through that using ElementTree to rewrite the fill attribute with a programmatically selected colour.
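For anyone wanting to try the same approach, here is a minimal sketch with the standard library's `xml.etree.ElementTree`. It assumes each region is a `<path>` whose `id` matches a key in a counts dictionary and that colour comes from a plain `fill` attribute (a `fill` set inside a `style` attribute would take precedence); `recolour`, `colour_for`, the ids and the palette are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise without an ns0: prefix


def colour_for(count, max_count, palette):
    """Scale a count linearly onto palette indices (0 -> palette[0])."""
    if max_count <= 0:
        return palette[0]
    return palette[count * (len(palette) - 1) // max_count]


def recolour(svg_text, counts, palette):
    """Set the fill of every <path> that has an id; ids missing from
    counts are treated as 0 and get palette[0]."""
    root = ET.fromstring(svg_text)
    max_count = max(counts.values(), default=0)
    for path in root.iter(f"{{{SVG_NS}}}path"):
        region_id = path.get("id")
        if region_id is None:
            continue  # decorative paths without an id are left alone
        count = counts.get(region_id, 0)
        path.set("fill", colour_for(count, max_count, palette))
    return ET.tostring(root, encoding="unicode")


svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path id="constituency-a" d="M0 0h10v10h-10z" fill="#fff"/>'
    '<path id="constituency-b" d="M10 0h10v10h-10z" fill="#fff"/>'
    '</svg>'
)
palette = ["#fee5d9", "#fc9272", "#de2d26"]  # light to dark
print(recolour(svg, {"constituency-a": 10, "constituency-b": 4}, palette))
```

A linear scale is the simplest choice here; with skewed counts, bucketing by rank or quantile would spread the colours more evenly.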
Tim: I've tried both. Redis feels like a very different kind of product. For my purposes, it's easier to use than CouchDB (mainly because it's so much simpler) and a bit more flexible than Tokyo due to the set/list operations. I'd certainly consider Tokyo if I needed fast disk-backed key/value storage that wasn't going to fit in RAM, and I'd go for Couch if I wanted offline abilities and synchronisation. Redis seems to fit its own niche.
how interesting... thanks for writing this up. I like to spend a lot of my time in interpreters while programming also, and this use of Redis sounds really productive.
Just FYI, Lightcloud ( http://opensource.plurk.com/LightCloud/ ) supports both Tokyo Tyrant and Redis.
M McClellan - 22nd October 2009 16:16 - #
Hey Simon,
I agree. Redis is cool and we're starting to deploy it a bit at Craigslist too.
FWIW, I wrote about Redis a little while back for Linux Magazine:
http://www.linux-mag.com/cache/7496/1.html
Jeremy
Jeremy Zawodny - 22nd October 2009 23:32 - #
Hi,
On the visualisation part, we've used svgweb to display the map and process the colors in the client directly.
http://www.mepwatch.eu/country
Is the svg map available ?
I'd like to play with it ;)
Xavier - 23rd October 2009 07:32 - #
Ever tried ZODB? With some simple rules kept it *IS* the transparent persistency for python. Well, python only ;-)
http://www.zope.org/Members/adytumsolutions/HowToLoveZODB_PartI
adamg - 23rd October 2009 18:34 - #
Redis is interesting; however, I just started reading into Riak (http://riak.basho.com/) and it seems to be a somewhat better implementation. I have been using CouchDB for a while and really like the map/reduce implementation. It seems both Redis and Riak are better designed to scale.
"""Hrishi: I actually used an SVG image with IDs on the paths representing each constituency, and looped through that using ElementTree to rewrite the fill attribute with a programatically selected colour."""
Curious about this - would you know a link to this kind of thing in more detail, or be willing to expand on it a bit, please?
Simon B - 5th November 2009 16:26 - #
I just read your re-introduction to JavaScript and wanted to say thanks! It's excellent.
(Probably posting in the wrong place; sorry about that.)
Jeff Brown - 17th November 2009 18:49 - #
@Simon B: not sure about Simon’s implementation, but there’s a nice discussion of programmatically manipulating SVG files (via the example of a map of American counties) here:
http://flowingdata.com/2009/11/12/how-to-make-a-us-county-thematic-map-using-free-tools/
"I can run a long running batch job in one Python interpreter (say loading a few million lines of CSV in to a Redis key/value lookup table)"
What do you use to read JSON into Redis from Python, in a *streaming* fashion?