dogproxy. Another of my experiments with Node.js—this is a very simple HTTP proxy which addresses the dog pile effect (also known as the thundering herd) by watching out for multiple requests for a URL that is currently “in flight” and bundling them together.
Has it occurred to you that this is useful for more than just (normal) caching?
2 different clients could request the same in-flight response for normal HTTP service, too. As long as it's for GET and you respect the Vary header, tying the responses together should be fine, no?
I don't know anything about Node.js, but I think L27 here has a race between the existence check and the point where the add completes?
http://github.com/simonw/dogproxy/blob/master/dogproxy.js#L27
Jeremy Dunck - 3rd February 2010 14:32 - #
Yup, caching is just the most obvious example of what you could use this for.
I've been wondering about race conditions myself. I don't think there's one, because Node is essentially single-threaded, so the check for existence and the insertion should happen in the same atomically executed block of code. Hopefully a Node expert will set me straight if that's not the case.
I think you can accomplish this without explicitly collecting requests on in_flight--you can use an events.EventEmitter object instead. When a req comes in, first do in_flight.addListener(req.url, function(status, content_type, body) { res.sendBody(...); }). Then, dispatch the request. Upon successful response, instead of looping through in_flight, just do in_flight.emit(req.url, status, content_type, body), and all the appropriate listeners will pick it up.
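For illustration, the EventEmitter idea could be sketched roughly like this. It is only a sketch under assumptions: it uses the current `events` API (`once`, `listenerCount`) rather than the 2010-era calls, and `makeCoalescer` and `fetchUpstream` are hypothetical names standing in for the proxy's real request code.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: fetchUpstream(url, done) stands in for the real
// upstream request and must call done(status, contentType, body) once.
function makeCoalescer(fetchUpstream) {
  const inFlight = new EventEmitter();
  inFlight.setMaxListeners(0); // many waiters per URL are expected

  return function get(url, callback) {
    // Check BEFORE adding our own listener: no listeners means nobody
    // has dispatched a request for this URL yet.
    const isFirst = inFlight.listenerCount(url) === 0;

    // The URL is used directly as the event name, as in the comment above.
    inFlight.once(url, callback);

    if (isFirst) {
      fetchUpstream(url, (status, contentType, body) => {
        // Every waiter registered for this URL receives the same response.
        inFlight.emit(url, status, contentType, body);
      });
    }
  };
}
```

Because each waiter is registered with `once`, the listeners are dropped as they fire, so a request that arrives after the response has been delivered starts a fresh upstream fetch.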
Thinking about that a bit more, there is a problem with the event-listener based approach that I don't see how to fix. I thought it would be possible for the request handler to figure out whether to dispatch a real request or not (instead of just adding a listener) by checking whether in_flight.listeners(req.url) was empty or not, but this doesn't do the right thing if you're in the middle of sending responses for a URL when a new request comes in... dogproxy has a similar problem: if you're in the middle of sending responses out when a new request for the same URL comes in, the delete in_flight[url] (line 47) will trample the callback for the new request.
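Whether or not that interleaving can actually occur in a single-threaded event loop, one way to make the concern moot is to detach the waiter list before sending any responses. This is a rough sketch, not dogproxy's actual code; `makeProxy` and `fetchUpstream` are hypothetical names.

```javascript
// Hypothetical helper: fetchUpstream(url, done) stands in for the real
// upstream request and must call done(status, contentType, body) once.
function makeProxy(fetchUpstream) {
  const inFlight = new Map(); // url -> array of waiting callbacks

  return function get(url, callback) {
    const waiters = inFlight.get(url);
    if (waiters) {
      // A request for this URL is already in flight: just join it.
      waiters.push(callback);
      return;
    }

    inFlight.set(url, [callback]);
    fetchUpstream(url, (status, contentType, body) => {
      // Take the list and remove the entry BEFORE notifying anyone, so a
      // request that arrives during the loop starts a fresh fetch instead
      // of being added to a list that is about to be deleted.
      const toNotify = inFlight.get(url);
      inFlight.delete(url);
      for (const cb of toNotify) {
        cb(status, contentType, body);
      }
    });
  };
}
```

With this ordering, the delete can never remove a callback that was registered after the response arrived, because the entry is gone before the first response is sent.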
Michael, isn't that the same race condition Jeremy was wondering about, above?
Assuming node.js is single-threaded (which I think it is) then such race conditions can't happen. That's pretty much the point of using an event-based server and part of what makes it an elegant solution.
Tom Carden - 4th February 2010 17:42 - #
@Tom No, it's slightly different. I don't actually think there is a race condition between lines 27 and 28; nothing that could involve an event happens there, so those lines will always be executed sequentially. However, I think there is a problem between lines 43 and 46.