Simon Willison’s Weblog


Why JSON isn’t just for JavaScript

20th December 2006

Dave Winer’s discovery of JSON (and shock that “it’s not even XML”) has triggered an interesting discussion thread, on his blog and elsewhere. Plenty of people have reassured him (and themselves) that it’s only used for JavaScript: it’s convenient in the browser but irrelevant elsewhere.

That simply isn’t true. Let’s look at the problem JSON solves:

I have a data structure on server / platform / programming environment A. I want to use it on server / platform / programming environment B.

Surely that problem has been solved a hundred times before? That’s what XML’s for, right?

Here’s an example data structure, of the kind you might want to transmit from one place to another (represented as a Python dictionary; mentally replace with the syntax from your programming language of choice).

person = {
  "name": "Simon Willison",
  "age": 25,
  "height": 1.68,
  "urls": [
    "http://simonwillison.net/",
    "http://www.flickr.com/photos/simon/",
    "http://simon.incutio.com/"
  ]
}

It’s a simple example, but it demonstrates the core data structures that any modern dynamic language is likely to support: strings, integers, floating-point numbers, hashes (or dictionaries or associative arrays, depending on terminology) and lists (or sequences or arrays).
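To make that list concrete, here is a small sketch using the json module from today’s Python standard library (which arrived after this post was written). It round-trips one sample value of each core type and checks that each comes back equal and with the same Python type; the samples dictionary is purely illustrative.

```python
import json

# One illustrative value for each of the core types listed above.
samples = {
    "string": "Simon Willison",
    "integer": 25,
    "floating point": 1.68,
    "hash/dictionary": {"name": "Simon Willison"},
    "list/array": ["http://simonwillison.net/", "http://simon.incutio.com/"],
}

for label, value in samples.items():
    encoded = json.dumps(value)    # Python value -> JSON text
    decoded = json.loads(encoded)  # JSON text -> Python value
    # Each value should come back equal and with the same Python type.
    assert decoded == value and type(decoded) is type(value)
    print(f"{label:16} {encoded}")
```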

So how do I represent this in a language-neutral format? Obviously I can use XML. I could invent my own custom representation:

<person>
  <name>Simon Willison</name>
  <age>25</age>
  <height>1.68</height>
  <urls>
    <url>http://simonwillison.net/</url>
    ...
  </urls>
</person>

But this means writing a bunch of boilerplate code. I need code to build the XML format, and code to parse it (using SAX or DOM or whatever) at the other end. I need to write this code in every language/platform that I want to communicate with. That’s a lot of extra effort!
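As a rough illustration of that boilerplate, here is a sketch of both halves for the custom <person> format in Python, using the standard library’s xml.etree.ElementTree. The helper names person_to_xml and person_from_xml are hypothetical. Note that the parser has to turn age and height back into numbers by hand, because the XML itself carries no type information.

```python
import xml.etree.ElementTree as ET

person = {
    "name": "Simon Willison",
    "age": 25,
    "height": 1.68,
    "urls": [
        "http://simonwillison.net/",
        "http://www.flickr.com/photos/simon/",
        "http://simon.incutio.com/",
    ],
}

def person_to_xml(p):
    """Build the hand-rolled <person> format, one field at a time."""
    root = ET.Element("person")
    ET.SubElement(root, "name").text = p["name"]
    ET.SubElement(root, "age").text = str(p["age"])
    ET.SubElement(root, "height").text = str(p["height"])
    urls = ET.SubElement(root, "urls")
    for url in p["urls"]:
        ET.SubElement(urls, "url").text = url
    return ET.tostring(root, encoding="unicode")

def person_from_xml(text):
    """Parse it back; every value arrives as a string, so types are restored by hand."""
    root = ET.fromstring(text)
    return {
        "name": root.findtext("name"),
        "age": int(root.findtext("age")),
        "height": float(root.findtext("height")),
        "urls": [u.text for u in root.find("urls").findall("url")],
    }

xml_text = person_to_xml(person)
print(xml_text)
assert person_from_xml(xml_text) == person
```

Every other language on the other end of the wire would need its own equivalent pair of functions.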

So let’s be smarter about this and reuse an existing format. If there’s an XML-based standard for the exact data I’m representing, I could use that; if there are good libraries supporting it in the languages I’m using, I don’t have to write any extra parsing code.

If that isn’t the case, I need to look at XML standards that can represent our standard data types. As Dave points out, our principal options here are SOAP and XML-RPC.

By most accounts SOAP is a total pain to get anything done with, so let’s look at XML-RPC. XML-RPC encodes the exact data types we want, but wraps them up in an RPC mechanism. I have no need for any of the <methodCall> / <methodResponse> boilerplate; I just want to represent the data, not specify a remote method to call.

So I could use a subset of XML-RPC that’s just the data representation—in fact, that’s really not a bad idea. But then I’d have to figure out how to get the various XML-RPC libraries to parse just the data structures without barfing at the lack of an envelope.

I also face the problem that XML-RPC isn’t particularly human-readable. In fact, XML itself tends not to be brilliantly readable: the data gets lost among the angle brackets.
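For comparison, Python’s standard xmlrpc.client module happens to handle the data-only subset: calling dumps() with no method name returns just the <params> block, and loads() will read it back. This is a sketch against Python’s library only; other XML-RPC libraries may well insist on the envelope. Printing the result also shows how the person data gets buried in <value>, <struct> and <member> tags.

```python
import json
import xmlrpc.client

person = {
    "name": "Simon Willison",
    "age": 25,
    "height": 1.68,
    "urls": [
        "http://simonwillison.net/",
        "http://www.flickr.com/photos/simon/",
        "http://simon.incutio.com/",
    ],
}

# With no methodname (and methodresponse left unset), dumps() skips the
# <methodCall>/<methodResponse> envelope and returns only the <params> block.
xml_text = xmlrpc.client.dumps((person,))
print(xml_text)

# loads() returns (params_tuple, methodname); methodname is None without an envelope.
params, method = xmlrpc.client.loads(xml_text)
assert params == (person,) and method is None

# The same data as JSON is far shorter.
print(len(json.dumps(person)), "characters of JSON vs", len(xml_text), "of XML-RPC")
```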

Enter JSON. The smartest thing about JSON is that it addresses the need for a lightweight standard for representing those core data types and, rather than inventing a new syntax, uses a subset of an existing language: ECMAScript, more commonly known as JavaScript.

JavaScript has excellent syntax for both object and array literals (something Java could certainly learn from); remember, in JavaScript an object is basically an associative array. JSON takes that syntax and makes it generally applicable. Because that syntax comes from a programming language designed to be written by people, JSON is naturally readable and writable by human beings.
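JSON keeps only a strict subset of that literal syntax, though. A sketch using Python’s json module: the double-quoted form used in the person example parses, while object literals that JavaScript itself accepts, with an unquoted key or single-quoted strings, are rejected.

```python
import json

# Valid JSON: keys and strings in double quotes, as in the person example.
assert json.loads('{"name": "Simon Willison"}') == {"name": "Simon Willison"}

# Legal JavaScript object literals that a JSON parser refuses.
not_json = [
    "{name: 'Simon Willison'}",    # unquoted key, single-quoted string
    "{'name': 'Simon Willison'}",  # single-quoted key and string
]
for text in not_json:
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        print(f"rejected {text!r}: {err.msg}")
    else:
        raise AssertionError(f"unexpectedly accepted {text!r}")
```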

If you take away the assignment statement, the Python example I gave earlier is also valid JSON. JSON is also a subset of YAML, an earlier attempt at a human readable/writable serialization format. More importantly, JSON libraries almost certainly exist for your language of choice (there are 24 languages represented in the list on JSON.org). Most of them provide two functions, json_encode($data) and json_decode($json), and that’s all you need.
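A quick way to check the first claim, sketched with Python’s standard ast and json modules: the example’s literal text, minus `person =`, parses identically as a Python literal and as JSON. In today’s Python the two functions are spelled json.dumps() and json.loads() rather than the PHP-style json_encode() and json_decode().

```python
import ast
import json

# The earlier example as source text, with the "person =" assignment removed.
literal = """{
  "name": "Simon Willison",
  "age": 25,
  "height": 1.68,
  "urls": [
    "http://simonwillison.net/",
    "http://www.flickr.com/photos/simon/",
    "http://simon.incutio.com/"
  ]
}"""

# The same text parses as a Python literal and as JSON, with equal results.
assert ast.literal_eval(literal) == json.loads(literal)

# Decode, re-encode, and decode again: the data survives unchanged.
person = json.loads(literal)
encoded = json.dumps(person)
assert json.loads(encoded) == person
print(encoded)
```

This holds for this particular example rather than for Python literals in general: True, None and single-quoted strings are valid Python but not valid JSON, which uses true, null and double quotes.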

The sweet spot for JSON is serializing simple data structures for transfer between programming languages. If you need more complex data structures (maybe with some kind of schema for validation), use XML. If you want to do full-blown RPC, use SOAP or XML-RPC. If you just want a lightweight format for moving data around, JSON fits the bill admirably.
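What “simple” means in practice shows up in a short sketch with Python’s json module: values outside the core types either change shape on the way through or are refused outright. The date and the point tuple here are arbitrary illustrations.

```python
import datetime
import json

# Tuples have no JSON equivalent: they are written as arrays and come back as lists.
assert json.loads(json.dumps({"point": (1, 2)})) == {"point": [1, 2]}

# Object keys must be strings, so integer keys come back as strings.
assert json.loads(json.dumps({1: "one"})) == {"1": "one"}

# Types outside the core set, such as dates, raise TypeError.
try:
    json.dumps({"posted": datetime.date(2006, 12, 20)})
except TypeError as err:
    print("not serializable:", err)
else:
    raise AssertionError("date was unexpectedly serialized")
```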

What do we lose from not using XML? The ability to use XML tools. If you’re someone who breathes XSLT, that might be a problem; if, like me, your approach when faced with XML is to parse it into a more agreeable data structure as soon as possible, you’ll find JSON far more productive.