Simon Willison’s Weblog

Node.js is genuinely exciting

I gave a talk on Friday at Full Frontal, a new one day JavaScript conference in my home town of Brighton. I ended up throwing away my intended topic (JSONP, APIs and cross-domain security) three days before the event in favour of a technology which first crossed my radar less than two weeks ago.

That technology is Ryan Dahl’s Node. It’s the most exciting new project I’ve come across in quite a while.

At first glance, Node looks like yet another take on the idea of server-side JavaScript, but it’s a lot more interesting than that. It builds on JavaScript’s excellent support for event-based programming and uses it to create something that truly plays to the strengths of the language.

Node describes itself as “evented I/O for V8 javascript”. It’s a toolkit for writing extremely high-performance, non-blocking, event-driven network servers in JavaScript. Think of it as similar to Twisted or EventMachine, but for JavaScript instead of Python or Ruby.

Evented I/O?

As I discussed in my talk, event-driven servers are a powerful alternative to the threading / blocking mechanism used by most popular server-side programming frameworks. Typical frameworks can only handle a small number of requests simultaneously, dictated by the number of server threads or processes available. Long-running operations can tie up one of those threads: enough long-running operations at once and the server runs out of available threads and becomes unresponsive. For large amounts of traffic, each request must be handled as quickly as possible to free the thread up to deal with the next in line.

This makes certain functionality extremely difficult to support. Examples include handling large file uploads, combining resources from multiple backend web APIs (which themselves can take an unpredictable amount of time to respond) or providing comet functionality by holding open the connection until a new event becomes available.

Event driven programming takes advantage of the fact that network servers spend most of their time waiting for I/O operations to complete. Operations against in-memory data are incredibly fast, but anything that involves talking to the filesystem or over a network inevitably involves waiting around for a response.

With Twisted, EventMachine and Node, the solution lies in specifying I/O operations in conjunction with callbacks. A single event loop rapidly switches between a list of tasks, firing off I/O operations and then moving on to service the next request. When the I/O returns, execution of that particular request is picked up again.
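
A toy version of that loop can be sketched in a few lines. This is not how Node, Twisted or EventMachine are actually implemented: createLoop, startIO and run are names made up for illustration, and the "I/O" is simulated with a plain queue rather than real sockets or files.

```javascript
// Toy model of a single-threaded event loop (illustrative only).
function createLoop() {
  var pending = [];  // callbacks whose simulated I/O has completed
  var log = [];      // records the order in which things happen

  return {
    log: log,
    // Pretend to start a slow I/O operation. Nothing blocks: the callback
    // is simply queued to run when the loop gets around to it.
    startIO: function (label, callback) {
      log.push('start ' + label);
      pending.push(function () { callback(label + ' done'); });
    },
    // Run queued callbacks one at a time until none are left.
    run: function () {
      while (pending.length > 0) {
        pending.shift()();
      }
    }
  };
}

var loop = createLoop();

// Two "requests" arrive back to back. Each fires off its I/O and returns
// immediately, so the second is accepted before the first has finished.
['request A', 'request B'].forEach(function (name) {
  loop.startIO(name, function (result) {
    loop.log.push('respond: ' + result);
  });
});

loop.run();
console.log(loop.log.join('\n'));
```

Both requests are started before either one is answered, which is the whole point: the loop never sits idle waiting on a single request.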

(In the talk, I attempted to illustrate this with a questionable metaphor involving hamsters, bunnies and a hyperactive squid).

What makes Node exciting?

If systems like this already exist, what’s so exciting about Node? Quite a few things:

  • JavaScript is extremely well suited to programming with callbacks. Its anonymous function syntax and closure support are perfect for defining inline callbacks, and client-side development in general uses event-based programming as a matter of course: run this function when the user clicks here / when the Ajax response returns / when the page loads. JavaScript programmers already understand how to build software in this way.
  • Node represents a clean slate. Twisted and EventMachine are hampered by the existence of a large number of blocking libraries for their respective languages. Part of the difficulty in learning those technologies is understanding which Python or Ruby libraries you can use and which ones you have to avoid. Node creator Ryan Dahl has a stated aim for Node to never provide a blocking API—even filesystem access and DNS lookups are catered for with non-blocking callback based APIs. This makes it much, much harder to screw things up.
  • Node is small. I read through the API documentation in around half an hour and felt like I had a pretty comprehensive idea of what Node does and how I would achieve things with it.
  • Node is fast. V8 is fast and keeps getting faster. Node’s event loop uses Marc Lehmann’s highly regarded libev and libeio libraries. Ryan Dahl is himself something of a speed demon: he just replaced Node’s HTTP parser implementation (already pretty speedy due to its Ragel / Mongrel heritage) with a hand-tuned C implementation with some impressive characteristics.
  • Easy to get started. Node ships with all of its dependencies, and compiles cleanly on Snow Leopard out of the box.
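
To make the first point concrete, here is a minimal sketch of an inline callback that closes over local state. It assumes the events module found in current Node releases (the code later in this post uses the older process.EventEmitter), and the button object is just a stand-in for something clickable.

```javascript
var EventEmitter = require('events');

var button = new EventEmitter();  // stands in for a clickable element
var clicks = 0;

// The anonymous function closes over `clicks`, so no extra bookkeeping
// object is needed to carry state into the callback.
button.on('click', function () {
  clicks += 1;
});

button.emit('click');
button.emit('click');
console.log(clicks);
```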

With both my JavaScript and server-side hats on, Node just feels right. The APIs make sense, it fits a clear niche and despite its youth (the project started in February) everything feels solid and well constructed. The rapidly growing community is further indication that Ryan is on to something great here.

What does Node look like?

Here’s how to get Hello World running in Node in 7 easy steps:

  1. git clone git://github.com/ry/node.git (or download and extract a tarball)
  2. ./configure
  3. make (takes a while, it needs to compile V8 as well)
  4. sudo make install
  5. Save the below code as helloworld.js
  6. node helloworld.js
  7. Visit http://localhost:8080/ in your browser

Here’s helloworld.js:

var sys = require('sys'), 
  http = require('http');

http.createServer(function(req, res) {
  res.sendHeader(200, {'Content-Type': 'text/html'});
  res.sendBody('<h1>Hello World</h1>');
  res.finish();
}).listen(8080);

sys.puts('Server running at http://127.0.0.1:8080/');
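
The listing above is written against the Node API as it stands today, in 2009. For anyone reading this with a much later Node release, a roughly equivalent sketch follows. It assumes the later response method names (writeHead and end in place of sendHeader, sendBody and finish) and console.log in place of sys.puts. The handler and start names are my own.

```javascript
var http = require('http');

// Same behaviour as helloworld.js, using writeHead/end in place of
// sendHeader/sendBody/finish.
function handler(req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  res.end('<h1>Hello World</h1>');
}

var server = http.createServer(handler);

// Call start() to begin serving. It is kept in a function so that the
// handler can also be exercised without opening a socket.
function start() {
  server.listen(8080, function () {
    console.log('Server running at http://127.0.0.1:8080/');
  });
}
```

Add a call to start() as the last line before running the file with node.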

If you have Apache Bench installed, try running ab -n 1000 -c 100 'http://127.0.0.1:8080/' to test it with 1000 requests using 100 concurrent connections. On my MacBook Pro I get 3374 requests a second.

So Node is fast—but where it really shines is concurrency with long running requests. Alter the helloworld.js server definition to look like this:

http.createServer(function(req, res) {
  setTimeout(function() {
    res.sendHeader(200, {'Content-Type': 'text/html'});
    res.sendBody('<h1>Hello World</h1>');
    res.finish();
  }, 2000);
}).listen(8080);

We’re using setTimeout to introduce an artificial two second delay to each request. Run the benchmark again: I get 49.68 requests a second, with every single request taking between 2012 and 2022 ms. With a two second delay, 1000 requests served 100 at a time need 1000 / 100 = 10 rounds of 2 seconds each, or 20 seconds in total, so the best possible performance is 1000 requests / 20 seconds = 50 requests a second. Node hits it pretty much bang on the nose.
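
The same arithmetic as a small function, in case you want to plug in other numbers. maxRequestsPerSecond is a name I made up, and it assumes every request takes exactly the same fixed time.

```javascript
// Upper bound on throughput when every request takes a fixed time and
// only `concurrency` requests are in flight at once.
function maxRequestsPerSecond(totalRequests, concurrency, delaySeconds) {
  var rounds = Math.ceil(totalRequests / concurrency);  // 1000 / 100 = 10
  var totalSeconds = rounds * delaySeconds;             // 10 * 2 = 20
  return totalRequests / totalSeconds;
}

console.log(maxRequestsPerSecond(1000, 100, 2)); // 50
```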

The most important line in the above examples is res.finish(). This is the mechanism Node provides for explicitly signalling that a request has been fully processed and should be returned to the browser. By making it explicit, Node makes it easy to implement comet patterns like long polling and streaming responses—stuff that is decidedly non trivial in most server-side frameworks.

djangode

Node’s core APIs are pretty low level: it has HTTP client and server libraries, DNS handling, asynchronous file I/O etc, but it doesn’t give you much in the way of high level web framework APIs. Unsurprisingly, this has led to a Cambrian explosion of lightweight web frameworks based on top of Node; the projects using node page lists a bunch of them. Rolling a framework is a great way of learning a low-level API, so I’ve thrown together my own, djangode, which brings Django’s regex-based URL handling to Node along with a few handy utility functions. Here’s a simple djangode application:

var dj = require('./djangode');

var app = dj.makeApp([
  ['^/$', function(req, res) {
    dj.respond(res, 'Homepage');
  }],
  ['^/other$', function(req, res) {
    dj.respond(res, 'Other page');
  }],
  ['^/page/(\\d+)$', function(req, res, page) {
    dj.respond(res, 'Page ' + page);
  }]
]);
dj.serve(app, 8008);
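
The idea behind regex-based URL handling is simple enough to sketch without Node at all. What follows is a hypothetical stand-in for dj.makeApp, not djangode’s actual source: req.path and res.body are plain object properties invented for the example so that it runs without an HTTP server.

```javascript
// Hypothetical stand-in for dj.makeApp: not djangode's actual source.
function makeApp(routes) {
  // Compile each pattern once, up front.
  var compiled = routes.map(function (route) {
    return { regex: new RegExp(route[0]), view: route[1] };
  });

  return function (req, res) {
    for (var i = 0; i < compiled.length; i++) {
      var match = compiled[i].regex.exec(req.path);
      if (match) {
        // Captured groups become extra arguments after req and res.
        var args = [req, res].concat(match.slice(1));
        return compiled[i].view.apply(null, args);
      }
    }
    res.body = 'Not found';  // no pattern matched
  };
}

var app = makeApp([
  ['^/$', function (req, res) { res.body = 'Homepage'; }],
  ['^/page/(\\d+)$', function (req, res, page) { res.body = 'Page ' + page; }]
]);

var res = {};
app({ path: '/page/42' }, res);
console.log(res.body); // Page 42
```

The first matching pattern wins, and the captured group arrives as a string, so a view that needs a number has to convert it itself.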

djangode is currently a throwaway prototype, but I’ll probably be extending it with extra functionality as I explore more Node related ideas.

nodecast

My main demo in the Full Frontal talk was nodecast, an extremely simple broadcast-oriented comet application. Broadcast is my favourite “hello world” example for comet because it’s both simpler than chat and more realistic—I’ve been involved in plenty of projects that could benefit from being able to broadcast events to their audience, but few that needed an interactive chat room.

The source code for the version I demoed can be found on GitHub in the no-redis branch. It’s a very simple application—the client-side JavaScript simply uses jQuery’s getJSON method to perform long-polling against a simple URL endpoint:

function fetchLatest() {
  $.getJSON('/wait?id=' + last_seen, function(d) {
    $.each(d, function() {
      last_seen = parseInt(this.id, 10) + 1;
      ul.prepend($('<li></li>').text(this.text));
    });
    fetchLatest();
  });
}

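Doing this recursively looks like it could eventually blow the browser’s JavaScript stack, but the callback runs asynchronously once each response arrives, so every call to fetchLatest returns before the next one starts. It works fine for the demo.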

The more interesting part is the server-side /wait URL which is being polled. Here’s the relevant Node/djangode code:

var message_queue = new process.EventEmitter();

var app = dj.makeApp([
  // ...
  ['^/wait$', function(req, res) {
    var id = req.uri.params.id || 0;
    var messages = getMessagesSince(id);
    if (messages.length) {
      dj.respond(res, JSON.stringify(messages), 'text/plain');
    } else {
      // Wait for the next message
      // Keep a reference to the function itself so removeListener can find it
      var listener = function() {
        dj.respond(res,
          JSON.stringify(getMessagesSince(id)), 'text/plain'
        );
        message_queue.removeListener('message', listener);
        clearTimeout(timeout);
      };
      message_queue.addListener('message', listener);
      var timeout = setTimeout(function() {
        // No message arrived in time: stop listening and send an empty list
        message_queue.removeListener('message', listener);
        dj.respond(res, JSON.stringify([]), 'text/plain');
      }, 10000);
    }
  }]
  // ...
]);

The wait endpoint checks for new messages and, if any exist, returns immediately. If there are no new messages it does two things: it hooks up a listener on the message_queue EventEmitter (Node’s equivalent of jQuery/YUI/Prototype’s custom events) which will respond and end the request when a new message becomes available, and also sets a timeout that will cancel the listener and end the request after 10 seconds. This ensures that long polls don’t go on too long and potentially cause problems—as far as the browser is concerned it’s just talking to a JSON resource which takes up to ten seconds to load.

When a message does become available, calling message_queue.emit('message') will cause all waiting requests to respond with the latest set of messages.
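
The listing above calls getMessagesSince without showing it, or the code that emits the event. A minimal in-memory sketch of both might look like this. It is an assumption about the shape of that code rather than the nodecast source, and it uses the events module from current Node releases instead of process.EventEmitter.

```javascript
var EventEmitter = require('events');

var message_queue = new EventEmitter();
var messages = [];  // in-memory store, lost on restart

// Return every message whose id is >= the id the client asks for.
function getMessagesSince(id) {
  id = parseInt(id, 10) || 0;
  return messages.filter(function (msg) { return msg.id >= id; });
}

// Store a message, then wake up every waiting long-poll request.
function addMessage(text) {
  var msg = { id: messages.length, text: text };
  messages.push(msg);
  message_queue.emit('message', msg);
  return msg;
}

// A waiting "request": responds once, the next time a message arrives.
var delivered = null;
message_queue.once('message', function () {
  delivered = getMessagesSince(0);
});

addMessage('hello');
console.log(JSON.stringify(delivered));
```

Using the array length as the ID matches what the client expects: it asks for everything from last_seen onwards, where last_seen is one more than the highest ID it has displayed.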

Talking to databases

nodecast keeps track of messages using an in-memory JavaScript array, which works fine until you restart the server and lose everything. How do you implement persistent storage?

For the moment, the easiest answer lies with the NoSQL ecosystem. Node’s focus on non-blocking I/O makes it hard (but not impossible) to hook it up to regular database client libraries. Instead, it strongly favours databases that speak simple protocols over a TCP/IP socket—or even better, databases that communicate over HTTP. So far I’ve tried using CouchDB (with node-couch) and redis (with redis-node-client), and both worked extremely well. nodecast trunk now uses redis to store the message queue, and provides a nice example of working with a callback-based non-blocking database interface:

var db = redis.create_client();
var REDIS_KEY = 'nodecast-queue';

function addMessage(msg, callback) {
  db.llen(REDIS_KEY, function(i) {
    msg.id = i; // ID is set to the queue length
    db.rpush(REDIS_KEY, JSON.stringify(msg), function() {
      message_queue.emit('message', msg);
      callback(msg);
    });
  });
}

Relational databases are coming to Node. Ryan has a PostgreSQL adapter in the works, thanks to that database already featuring a mature non-blocking client library. MySQL will be a bit tougher—Node will need to grow a separate thread pool to integrate with the official client libs—but you can talk to MySQL right now by dropping in DBSlayer from the NY Times which provides an HTTP interface to a pool of MySQL servers.

Mixed environments

I don’t see myself switching all of my server-side development over to JavaScript, but Node has definitely earned a place in my toolbox. It shouldn’t be at all hard to mix Node in to an existing server-side environment—either by running both behind a single HTTP proxy (being event-based itself, nginx would be an obvious fit) or by putting Node applications on a separate subdomain. Node is a tempting option for anything involving comet, file uploads or even just mashing together potentially slow loading web APIs. Expect to hear a lot more about it in the future.

This is Node.js is genuinely exciting by Simon Willison, posted on 23rd November 2009.
