Feed Sign in with OpenID OpenID

Simon Willison’s Weblog

Rate limiting with memcached

On Monday, several high profile “celebrity” Twitter accounts started spouting nonsense, the victims of stolen passwords. Wired has the full story—someone ran a dictionary attack against a Twitter staff member, discovered their password and used Twitter’s admin tools to reset the passwords on the accounts they wanted to steal.

The Twitter incident got me thinking about rate limiting again. I’ve been wanting a good general solution to this problem for quite a while, for API projects as well as security. Django Snippets has an answer, but it works by storing access information in the database and requires you to run a periodic purge command to clean up the old records.

I’m strongly averse to writing to the database for every hit. For most web applications reads scale easily, but writes don’t. I also want to avoid filling my database with administrative gunk (I dislike database backed sessions for the same reason). But rate limiting relies on storing state, so there has to be some kind of persistence.

Using memcached counters

I think I’ve found a solution, thanks to memcached and in particular the incr command. incr lets you atomically increment an already existing counter, simply by specifying its key. add can be used to create that counter—it will fail silently if the provided key already exists.

Let’s say we want to limit a user to 10 hits every minute. A naive implementation would be to create a memcached counter for hits from that user’s IP address in a specific minute. The counter key might look like this:

ratelimit_72.26.203.98_2009-01-07-21:45

Increment that counter for every hit, and if it exceeds 10 block the request.

What if the user makes ten requests all in the last second of the minute, then another ten a second later? The rate limiter will let them off. For many cases this is probably acceptable, but we can improve things with a slightly more complex strategy. Let’s say we want to allow up to 30 requests every five minutes. Instead of maintaining one counter, we can maintain five—one for each of the past five minutes (older counters than that are allowed to expire). After a few minutes we might end up with counters that look like this:

ratelimit_72.26.203.98_2009-01-07-21:45 = 13
ratelimit_72.26.203.98_2009-01-07-21:46 = 7
ratelimit_72.26.203.98_2009-01-07-21:47 = 11

Now, on every request we work out the keys for the past five minutes and use get_multi to retrieve them. If the sum of those counters exceeds the maximum allowed for that time period, we block the request.

Are there any obvious flaws to this approach? I’m pretty happy with it—it cleans up after itself (old counters quietly expire from the cache), it shouldn’t use much resources (just five active cache keys per unique IP address at any one time) and if the cache is lost the only snag is that a few clients might go slightly over their rate limit. I don’t think it’s possible for an attacker to force the counters to expire early.

An implementation for Django

I’ve put together an example implementation of this algorithm using Django, hosted on GitHub. The readme.txt file shows how it works—basic usage is via a simple decorator:

from ratelimitcache import ratelimit

@ratelimit(minutes = 3, requests = 20)
def myview(request):
    # ...
    return HttpResponse('...')

Python decorators are typically functions, but ratelimit is actually a class. This means it can be customised by subclassing it, and the class provides a number of methods designed to be over-ridden. I’ve provided an example of this in the module itself—ratelimit_post, a decorator which only limits on POST requests and can optionally couple the rate limiting to an individual POST field. Here’s the complete implementation:

class ratelimit_post(ratelimit):
    "Rate limit POSTs - can be used to protect a login form"
    key_field = None # If provided, this POST var will affect the rate limit
    
    def should_ratelimit(self, request):
        return request.method == 'POST'
    
    def key_extra(self, request):
        # IP address and key_field (if it is set)
        extra = super(ratelimit_post, self).key_extra(request)
        if self.key_field:
            value = sha.new(request.POST.get(self.key_field, '')).hexdigest()
            extra += '-' + value
        return extra

And here’s how you would use it to limit the number of times a specific IP address can attempt to log in as a particular user:

@ratelimit_post(minutes = 3, requests = 10, key_field = 'username')
def login(request):
    # ...
    return HttpResponse('...')

The should_ratelimit() method is called before any other rate limiting logic. The default implementation returns True, but here we only want to apply rate limits to POST requests. The key_extra() method is used to compose the keys used for the counter—by default this just includes the request’s IP address, but in ratelimit_post we can optionally include the value of a POST field (for example the username). We could include things like the request path here to apply different rate limit counters to different URLs.

Finally, the readme.txt includes ratelimit_with_logging, an example that over-rides the disallowed() view returned when a rate limiting condition fails and writes an audit note to a database (less overhead than writing for every request).

I’ve been a fan of customisation via subclassing ever since I got to know the new Django admin system, and I’ve been using it in a bunch of projects. It’s a great way to create reusable pieces of code.

This is Rate limiting with memcached by Simon Willison, posted on 7th January 2009.

Tagged , , , , , , , , ,

View blog reactions

Next: Pragmatism, purity and JSON content types

Previous: DjangoCon and PyCon UK

32 comments

  1. By a fabulous coincidence, this may be exactly the snippet of code my gang needs for an authentication service we'll be rewriting. Thank you!

    Yoz - 7th January 2009 22:45 - #

  2. This is very nice -- I'm excited to try it in my own projects.

    One thing you might point out in the readme is that it's more flexible (and, I would argue, better practice) to use the decorator in the URLconf than hard-coding it to the view itself. An argument could be made that the rate-limiting is a per-installation configuration (and would thus fit more logically in the URLconf) -- and leaving views unadorned makes it cleaner to do unit tests.

    But the nice thing about the way you've designed this, of course, is that it works either way. Great stuff!

    Adrian Holovaty - 7th January 2009 22:51 - #

  3. This is very nice. I like the clean API.

    Eric Florenzano - 7th January 2009 22:53 - #

  4. Adrian: good idea, I've added an example to the readme (showing how you would use it with django.contrib.auth.views.login).

    Simon Willison - 7th January 2009 23:07 - #

  5. Cool, I really like subclassing as a way to change the default behaviour slightly. That's nifty code, thanks for sharing. Looking forward to use it in Pinax :)

    Jannis Leidel - 7th January 2009 23:15 - #

  6. I've had success with using swatch for this exact case. I also use it to watch my SSH logins. http://swatch.sourceforge.net/.

    Nolan Caudill - 7th January 2009 23:24 - #

  7. Hi,

    this is a very old problem, with a solutions attempted many times in the past. So you can learn from them too :)

    A few notes...

    - proxies. 1 ip can equal 1 million users. So you may need to look at other things than just ip address. http headers can be useful(like http agent, cookies etc).

    - ip ranges. An attacker might have access to a whole class C of ip addresses. Which means they can use 255 different ip addresses for the attack. Also attackers use botnets... with possibly 10000 different ip addresses working together to brute force attacks.

    - you need a way to administrate blocks, and have white lists/black lists. To remove false alerts and other stuff. 'cause at some point the wrong person gets blocked, and you'll need to be able to remove them.

    - blocking at the firewall can be good. So have a way to export your list of blocked ip addresses to either /etc/hosts.deny, iptables, pf, cisco rules etc.

    - making every login slow, without using up many resources can help a lot. Since it's usually a one time thing - that slowness does not matter so much. By using very little resources to do the slow login, you can try and help sure you don't get a DOS. However this is not a generic solution to slowing down all website access - just things like login that happen once in a while.

    cu!

    Rene Dudfield - 7th January 2009 23:32 - #

  8. Funnily enough, that's how we do rate limiting at Twitter. We have a generic memcached-backed Limiter class that handles all sorts of rate limiting, from the API to regulating the number of messages users can send, and now, login attempts.

    Alex Payne - 7th January 2009 23:33 - #

  9. Pretty slick idea. I like it quite a bit.

    Jeff - 7th January 2009 23:39 - #

  10. Re twitter (because I need more space, 140 chars is not enough for my bad English ;-)): let's say I take top 10 worst passwords from this list: http://www.whatsmypass.com/?p=415

    Now I can iterate over all known usernames available (for instance in twitter sidebar) and try those passwords even with @ratelimit_post(minutes = 3, requests = 10, key_field = 'username') decorator set.

    Or I miss something?

    David, biologeek - 8th January 2009 00:03 - #

  11. It's even easier. Rather than creating a key with the time in it, you can just create a key which expires after the limit time. It still has the "2 time periods" problem, but to a much lesser extent than just using the current time.
    #!/usr/bin/perl
    use Cache::Memcached;
    
    my $exp = 10;
    my  $memd = new Cache::Memcached {
        'servers' => [ "127.0.0.1:11211" ],
        'debug' => 0,
      };
    
    while(true) {
      my $count = incr();
      print "$count\n" if ($count < 5);
      sleep(1);
    }
    
    sub incr {
      $memd->add("myKey", 0, $exp);
      return $memd->incr("myKey");
    }
    
    

    David Sheldon - 8th January 2009 00:13 - #

  12. David biologeek: you're right, the ratelimit_post decorator doesn't protect against an attack against many users. For that you would need to roll another ratelimit subclass. You can nest decorators so you could limit heavily against the same username and more lightly against all Post requests from a single IP if you wanted.

    David Sheldon: I hadn't thought of that - that's a neat trick for taking advantage of the fact that you set the expire time on the call to add, not the call to incr. I think the multiple counter approach gives a better approximation of a rolling average, but the simpler approach would be just as good for most cases.

    Simon Willison - 8th January 2009 00:24 - #

  13. There's a show stopper issue with this approach

    Scaling: memcached doesn't automatically distribute its data across multiple machines. The rate limiting information can't exceed the available memory on the machine. Also each webserver box will have to make an off box call to determine whether a given ip should be throttled.

    Typical solutions are to hash (statically or dynamically) to a fleet of memcached boxes. The difficulty there is dealing with hard failure and network blips and partitions.

    Anon - 8th January 2009 00:59 - #

  14. @Anon, wha?

    memcache clients do generally hash keys to servers. The naive approach is:
    server = servers[hash % num_servers]

    I agree that multiple memcached nodes can fail or partition, but, uh, do you have a solution that has all three of CAP? As long as the attacker can't force a memcached node failure, a missed key will just go to another of the memcached nodes, and then limit as normal.

    Jeremy Dunck - 8th January 2009 07:10 - #

  15. I'm doing a firewall plugin for Rails and this has given me some ideas - thanks! One thing that concerns me is that while this would work great for throttling some API service based on keys, it presumably doesn't help much with stopping dictionary attacks from bots. Won't people running such attacks mask their IP, send no cookies, or distribute their requests across machines/processes? Would this make any kind of per-user throttling ineffective as a defense, or am I missing something?

    James Coglan - 8th January 2009 08:29 - #

  16. James: I don't think there's a nice solution to that. You could throttle based just on submitted username (which my class supports, simply implement a key_extra method that just uses the POST['username'] variable) - but doing so opens you up to denial of service attempts, where an attacker can deliberately lock an account by hitting it multiple times.

    The only solution I can think of for attacks from bots is smarter monitoring - trying to detect suspicious traffic and alert an administrator, who can then make a human judgement as to whether the attack is attempting to guess a password or just lock out someone's account. Anyone got any better ideas?

    Simon Willison - 8th January 2009 09:31 - #

  17. This happens because people don't care about security.

    In addition, almost all frameworks that have implemented an authentication system as Django use hash algorithms outdated, as MD5 and SHA1, which have known hash collision weaknesses. So, the developers think that they are using a system very secure and then these things happen.

    A better alternative for password hashing is the bcrypt algorithm, used in OpenBSD.

    http://pypi.python.org/pypi/bcryptWrap/

    Jonas Melian - 8th January 2009 09:50 - #

  18. @Simon:
    Brilliant idea, no Django site should be without it :)

    @Jonas Melian:
    All the hashing in the world won't protect you against weak user passwords and brute force attacks. You also need something to prevent brute force attacks.

    Your hashing algorithm is only important if the hacker should gain access you where you store the hashes. Which is not the case here.

    Mikkel Høgh - 8th January 2009 10:27 - #

  19. As Rene has noted, you do need to be careful with IP rate limiting, particularly with institutions using a common proxy server.

    When implementing my own solution to this I've tended to also add a white & blacklisting option to deal with known cases. A useful option is to be able to run the system in a "warning" mode so that instead of blocking/throttling requests, you warn/log breaches. This gives a good way to measure the effectiveness of the code, investigate cases where white-listing may be required, etc.

    Checking the X-Forwarded-For header, if set, and using the IP address there rather than the request IP is one option for handling proxied requests.

    I've found that adequately dealt with a scenario where an attack was being routed via several compromised proxy servers. Without this the rate of attack wasn't enough to trigger blocking.

    Nice implementation though!

    Leigh Dodds - 8th January 2009 13:18 - #

  20. @Leigh Dodds:

    Checking the X-Forwarded-For header, if set, and using the IP address there rather than the request IP is one option for handling proxied requests.

    Be careful there; the X-Forwarded-For header could contain anything. If the attacker figures out you're relying on that instead of the IP the connection is coming from, all they have to do is send an X-Forwarded-For header with a random IP address every time, and they can make as many requests as they want, without even needing proxies.

    David Precious - 8th January 2009 15:16 - #

  21. "I don't think it's possible for an attacker to force the counters to expire early."

    With a botnet they can. Presumably it would be possible to flood memcached and force early expiration of your counters with a ton of unique IPs.

    One protection against flooding would be to have another cache value keyed on the username that stores the last x (2 probably. Maybe 3) IP addresses used to try to log into that account, with a short expiration time. If a new IP comes in, that list is full, and the IP isn't on the list, then that username is being attacked by a flood, and you refuse the attempt without even going to the throttling layer.

    This does, as you say, allow DoS attacks by letting a botnet prevent a user from logging in, however if you store a 'privileged' IP address, which is the last IP address that successfully logged into the account, and always allow that one through (probably even skipping the throttling code), I think you'd get 95% of scenarios. . enough that it probably wouldn't be worth it for a botnet master to try to DoS you out of your account with login attempts, since the probability of success is very low.

    There's another weakness in that it depends on the flood hitting the same username. You could potentially have a botnet flood with random usernames, which would be useless in itself for cracking, but would overflow your memcached and cause early expiration, which would open the door to a concurrent focused flood on a single username. But that's pretty arcane.

    (addendum: your django commenting system is telling me 'Text is not allowed inside blockquote'. Unless someone went and changed XHTML on me, I'm pretty sure blockquote is useless if you can't put text in it. Also, requiring manual paragraph formatting in XHTML mode is a pain in the ass, though I understand it makes sense to an engineer [you're either doing all the XHTML or you're not]. Most commenting systems have an XHTML + auto-paragraph mode)

    sidereal - 8th January 2009 19:02 - #

  22. sidereal: sadly, XHTML requires the contents of a blockquote are wrapped in another block level element such as a paragraph - I don't make the rules, I just enforce them!

    Love the idea of saving privileged IP addresses. Might be worth storing several, to increase the chance of getting both home and work addresses.

    Simon Willison - 8th January 2009 20:00 - #

  23. Incidentally, this is pretty much what we do at Skyrock.com (IIRC, we didn't even have to add memcached servers to our farm), only we don't limit ourselves to login attempts: we also use rate limiting to combat actual spammers' usage patterns on the platform.

    (Totally unrelated matter: in Safari 3.2.1, when the focus is on the comment box and I scroll the page, the left part of the focused box's blue halo bleeds onto the comments. Perhaps if you allow a bigger margin between the comments and the new comment div, or more padding in this div, the issue would go away?)

    Michel Valdrighi - 8th January 2009 21:03 - #

  24. For anyone who might be interested, I've implemented throttling in my Rails firewall plugin:

    http://github.com/jcoglan/consent

    Throttling docs are toward the end of the REAMDE. You can use any request data you like to categorise traffic so it's pretty flexible. Thanks for the idea, Simon!

    James Coglan - 8th January 2009 23:18 - #

  25. I'm curious as to why one wouldn't simply rate-limit at the edge, where there's less intelligence required? app-level intelligence is both slow and expensive; most modern load balancers (including free software e.g. OpenBSD's relayd(8)) can rate limit all the way up to layer 7 (e.g. "pass traffic normally, except for requests to login.cgi?$token where we will allow x requests in y seconds for the same unique token per IP address").

    Why wouldn't you just rate limit at the edge, where you can take advantage of logic running on hardware ASICs instead of further inside the network where your app servers have to get involved?

    Scott Francis - 9th January 2009 00:58 - #

  26. For API requests and similiar stuff it is a good solution.
    But it is not elegant solution for preventing login attacks.
    You are doing checks at the wrong end.

    Do not design rules for login requests. Design rules for failed login requests.

    Deny access when you have 10 failed login requests in a minute from the same IP.
    Deny access when you have 10 failed login requests in a minute for some username.
    Add sleep time to login requests when you have more then 50 failed login requests in a minute.
    Notify admin about login attack and block all login requests for 5 minutes when you have more than 150 failed logins in a minute.
    And so on...

    Algimantas - 10th January 2009 11:18 - #

  27. @Scott
    Is there any way for the app to feed qualifications to the load balancers (like relayd)?

    As an example, rate limit to x per second, unless from IPs a,b,c (which are recently used for a specific user)?

    I assume online alteration of the balancer limiter is fully-supported?

    Jeremy Dunck - 13th January 2009 02:32 - #

  28. I have to agree with Algimantas - as much as I love the idea of using memcached for this (or for anything, really), I would design throttling rules for failed login requests only, and store both a 'number_of_failed_logins' and a 'last_failed_login' timestamp in the user table.

    Because I store just the failed attempts, I'm avoiding the 'one write instruction per hit' overhead.

    However, I wouldn't deny access after 10 failed login attempts - that only invites DoS attacks. Instead, I would add a 20-second delay after the first 3 failed login attempts, to make dictionary attacks infeasible.

    Jens Roland - 20th January 2009 08:39 - #

  29. Another idea might to do what google and other does. After a few missed attempts they throw a CAPTCHA at you. Would give some limit to how fast a brute force attack can go. Although given that "The going rate, for high-volume buyers, seems to be about $0.002 per CAPTCHA solved." Perhaps it's not such a good idea anyway...

    http://www.privacydigest.com/2008/09/02/cheap%2Bca ptcha%2Bsolving%2Bchanges%2Bsecurity%2Bgame

    Another idea - 2nd February 2009 15:31 - #

  30. Hey Simon, perfect work, thanks!

    I've followed the details and tried to use it on login view, but it seems there's a bug on view_wrapper().

    on if sum(counts) >= self.requests:

    counts is a list of str's, so sum gives a TypeError. A quick solution for me was instead of

    counts = self.get_counters(request).values()

    counts = map(int, self.get_counters(request).values())

    which seems to work (after a few tries, at least!)

    Markos Gogoulos - 9th February 2009 11:40 - #

  31. Hey Simon, this is exactly what I was looking for... thanks for the implementation!

    I did have a couple of additions, however. First off, since I'm using this to limit on my login page, I found it useful to clear the counters after a successful login. This was accomplished with the following additional method on the ratelimit class:

        def clear_counters(self, request):
            for key in self.keys_to_check(request):
                cache.delete(key)

    And a call of: ratelimit_post(minutes=3).clear_counters(request) Perhaps not optimal, but it does work anyhow.

    My only other addition is in fact a small bugfix. While the locmem:// memory backend worked perfectly well, when using memcached:// through the python-memcached library, I was getting a TypeError on line 35 because the backend was returning strings instead of integers. I fixed this by replacing the line with:

        if sum([int(x) for x in counts]) >= self.requests:

    Other than that, great work! Thanks again!

    Joey Wilhelm - 19th March 2009 22:12 - #

  32. Great work Simon! I have extended this and implemented a rate limiter that is not tied up to Django. For the code and info, please visit: http://amix.dk/blog/viewEntry/19425

    amix - 5th April 2009 23:50 - #

Sign in with OpenID

Auto-HTML: Line breaks are preserved; URLs will be converted in to links.

Manual XHTML: Enter your own, valid XHTML. Allowed tags are a, p, blockquote, ul, ol, li, dl, dt, dd, em, strong, dfn, code, q, samp, kbd, var, cite, abbr, acronym, sub, sup, br, pre

A django site