Why JSON isn’t just for JavaScript
Dave Winer’s discovery of JSON (and shock that “it’s not even XML”) has triggered an interesting discussion thread, on his blog and elsewhere. Plenty of people have reassured him (and themselves) that it’s only used for JavaScript: it’s convenient in the browser but irrelevant elsewhere.
That simply isn’t true. Let’s look at the problem JSON solves:
I have a data structure on server / platform / programming environment A. I want to use it on server / platform / programming environment B.
Surely that problem has been solved a hundred times before? That’s what XML’s for, right?
Here’s an example data structure, of the kind you might want to transmit from one place to another (represented as a Python dictionary; mentally replace with the syntax from your programming language of choice).
person = {
    "name": "Simon Willison",
    "age": 25,
    "height": 1.68,
    "urls": [
        "http://simonwillison.net/",
        "http://www.flickr.com/photos/simon/",
        "http://simon.incutio.com/"
    ]
}
It’s a simple example, but it demonstrates the core data structures that any modern dynamic language is likely to support: strings, integers, floating-point numbers, hashes (or dictionaries or associative arrays, depending on terminology) and lists (or sequences or arrays).
So how do I represent this in a language-neutral format? Obviously I can use XML. I could invent my own custom representation:
<person>
  <name>Simon Willison</name>
  <age>25</age>
  <height>1.68</height>
  <urls>
    <url>http://simonwillison.net/</url>
    ...
  </urls>
</person>
But this means writing a bunch of boilerplate code. I need code to build the XML format, and code to parse it (using SAX or DOM or whatever) at the other end. I need to write this code in every language/platform that I want to communicate with. That’s a lot of extra effort!
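To make that boilerplate concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree for the custom <person> format above. The function names person_to_xml and person_from_xml are hypothetical. Note that the reader has to know the structure in advance and convert the numbers back by hand, because XML element text carries no type information.

```python
import xml.etree.ElementTree as ET

person = {
    "name": "Simon Willison",
    "age": 25,
    "height": 1.68,
    "urls": [
        "http://simonwillison.net/",
        "http://www.flickr.com/photos/simon/",
        "http://simon.incutio.com/",
    ],
}

def person_to_xml(data):
    # Hypothetical builder for the bespoke <person> format.
    root = ET.Element("person")
    for field in ("name", "age", "height"):
        ET.SubElement(root, field).text = str(data[field])
    urls = ET.SubElement(root, "urls")
    for url in data["urls"]:
        ET.SubElement(urls, "url").text = url
    return ET.tostring(root, encoding="unicode")

def person_from_xml(text):
    # Hypothetical parser: it must know every element name and its type.
    root = ET.fromstring(text)
    return {
        "name": root.findtext("name"),
        "age": int(root.findtext("age")),         # text -> int by hand
        "height": float(root.findtext("height")),  # text -> float by hand
        "urls": [u.text for u in root.iterfind("urls/url")],
    }

print(person_to_xml(person))
```

Every other language on the far end of the wire would need its own copy of both functions.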
So let’s be smarter about this and reuse an existing format. If there’s an XML-based standard for the exact data I’m representing I could use that; if there are good libraries supporting it for the languages that I’m using I don’t have to write any extra parsing code.
If that isn’t the case, I need to look at XML standards that can represent our standard data types. As Dave points out, our principal options here are SOAP and XML-RPC.
By most accounts SOAP is a total pain to get anything done with, so let’s look at XML-RPC. XML-RPC encodes the exact data types we want, but wraps them up in an RPC mechanism. I have no need for any of the <methodCall> / <methodResponse> boilerplate; I just want to represent the data, not specify a remote method to call.
So I could use a subset of XML-RPC that’s just the data representation—in fact, that’s really not a bad idea. But then I’d have to figure out how to get the various XML-RPC libraries to parse just the data structures without barfing at the lack of an envelope.
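As a sketch of that data-only subset, Python's standard xmlrpc.client module happens to support it: when dumps() is called without a methodname, it returns just the <params> block with no envelope, and loads() will read such a block back. This is specific to Python's library; other XML-RPC implementations may not be as forgiving.

```python
import xmlrpc.client

person = {
    "name": "Simon Willison",
    "age": 25,
    "height": 1.68,
    "urls": [
        "http://simonwillison.net/",
        "http://www.flickr.com/photos/simon/",
        "http://simon.incutio.com/",
    ],
}

# No methodname and no methodresponse: only <params>...</params> is emitted.
xml_text = xmlrpc.client.dumps((person,))
print(xml_text)

# loads() returns (params_tuple, method_name); method_name is None here.
params, method_name = xmlrpc.client.loads(xml_text)
```

The round trip preserves the int, float, list and dict types, but the output is noticeably more verbose than the Python literal it came from.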
I also face the problem that XML-RPC isn’t particularly human-readable. In fact, XML itself tends not to be brilliantly readable: the data gets lost among the angle brackets.
Enter JSON. The smartest thing about JSON is that it addresses the need for a light-weight standard for representing those core data types and, rather than inventing a new one, uses a subset of an existing one: ECMAScript, more commonly known as JavaScript.
JavaScript has excellent syntax for both object and array literals (something Java could certainly learn from); remember, in JavaScript an object is basically an associative array. JSON takes that syntax and makes it generally applicable. Because JavaScript is a programming language, JSON syntax is naturally readable and writable by human beings.
If you take away the assignment statement, the Python example I gave earlier is also valid JSON. JSON is also a subset of YAML, an earlier attempt at a human-readable/writable serialization format. More importantly, JSON libraries almost certainly exist for your language of choice (there are 24 languages represented in the list on JSON.org). Most of them provide two functions, json_encode($data) and json_decode($json), and that’s all you need.
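In Python's standard json module the equivalent pair is json.dumps (encode) and json.loads (decode). A minimal sketch: feed the dictionary literal from earlier, minus the assignment, to the JSON parser and check that it comes back with the same types.

```python
import json

# The earlier Python dictionary literal, without "person =", as plain text.
literal = """{
    "name": "Simon Willison",
    "age": 25,
    "height": 1.68,
    "urls": [
        "http://simonwillison.net/",
        "http://www.flickr.com/photos/simon/",
        "http://simon.incutio.com/"
    ]
}"""

person = json.loads(literal)   # decode: JSON text -> dict
encoded = json.dumps(person)   # encode: dict -> JSON text

print(type(person["age"]).__name__, type(person["height"]).__name__)  # int float
```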
The sweet spot for JSON is serializing simple data structures for transfer between programming languages. If you need more complex data structures (maybe with some kind of schema for validation), use XML. If you want to do full-blown RPC, use SOAP or XML-RPC. If you just want a light-weight format for moving data around, JSON fits the bill admirably.
What do we lose from not using XML? The ability to use XML tools. If you’re someone who breathes XSLT that might be a problem; if, like me, your approach when faced with XML is to parse it into a more agreeable data structure as soon as possible, you’ll find JSON far more productive.
Really interesting you post this now - I'm in the market again for a human-readable data exchange format, for doing offline log analysis, and have been umming and erring between all the options you have here. Kept thinking "well, there's JSON" but "nah..." for no justifiable reason. Think I'm convinced.
Harry Fuecks - 21st December 2006 00:19 - #
WDDX is an XML standard for representing common data types, but it seems to have faded into obscurity.
It's not very hard to use the data representation parts of XML-RPC, as you suggest - here are some examples in Python, Perl and Ruby.
However, I agree that JSON is more readable. A slim advantage of the XML-RPC approach for server-server transport is more widespread native support, perhaps?
XMLRPC does have a couple of data types that JSON does not: a formal date type (which is nice), a binary type (which is just base64 encoded, but formalized), and boolean (which can be a little hard, since truthiness is vague in several languages, including Javascript, Python, and C). I don't even know how one actually represents binary data in Javascript. Of course XMLRPC is also missing nil, which is very annoying considering just about every language has a concept of nil.
Anyway, I agree that the envelope is XMLRPC's biggest flaw (it's a dumb verbose envelope to boot). Without that it'd be okay; not horribly unreadable, a reasonable data model (except for the nil thing). But no one ever proposed using XMLRPC that way, and it's not even "XMLRPC", it would be something else.
I guess XMLRPC also has failure codes. JSON just has 500 Server Error (or maybe 400 Bad Request); which maybe is just fine, I'm not sure.
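The type gaps described in this comment can be seen directly with Python's standard library. In the sketch below the record is hypothetical, and the ISO 8601 / base64 convention in encode_extra is an assumption (a common habit, not part of JSON): the receiving side still has to know, by prior agreement, which strings are really dates or binary data.

```python
import base64
import json
import xmlrpc.client
from datetime import datetime

# Hypothetical record using the two types XML-RPC has and JSON lacks.
record = {"taken": datetime(2006, 12, 21, 0, 19), "thumb": b"\x89PNG"}

# XML-RPC marshals both natively (<dateTime.iso8601> and <base64>).
xml_text = xmlrpc.client.dumps((record,))

# JSON has no date or binary type, so the standard encoder refuses them.
try:
    json.dumps(record)
    json_failed = False
except TypeError:
    json_failed = True

def encode_extra(value):
    # Assumed convention: dates as ISO 8601 strings, bytes as base64 text.
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    raise TypeError(f"cannot encode {type(value).__name__}")

json_text = json.dumps(record, default=encode_extra)
print(json_text)  # {"taken": "2006-12-21T00:19:00", "thumb": "iVBORw=="}

# The reverse gap: JSON has null, but plain XML-RPC has no nil
# (Python only allows it through the non-standard allow_none option).
try:
    xmlrpc.client.dumps((None,))
    nil_failed = False
except TypeError:
    nil_failed = True
```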
Some choice quotes: and the bit you quoted:
Leaving aside JSON's technical advantages for now, this is a clear example of a political reason to avoid XML-RPC --- Dave Winer gets a little more credibility every time someone uses XML-RPC, and these quotes show that he's a loose cannon. He had similar comments about WDDX when Allaire created it.
The technical advantages are:
FYI. UserTalk does have a nil value. Always has.
Dave Winer - 21st December 2006 02:40 - #
Kragen,
They're li's, but all.css sets list-style-type:none.
Simon just redesigned and apparently still has polishing to do.
Jeremy Dunck - 21st December 2006 05:18 - #
One point that's worth bringing up is character sets. XML has, via the xml header, a way of explicitly declaring what character set you're using.
JSON doesn't.
The two languages approach the problem in totally different ways. XML finally allows you to solve the whole "what encoding am I using?" problem by stating explicitly what encoding you're using right there in the data stream. JSON solves this problem by stating explicitly that you're only allowed to use one of the UTF encodings, and that's all there is to it. Latin-1 (or its weird neighbors) is just not an option.
Kragen: sorry about that; I'll fix the CSS later.
@Mark - that JSON requires a Unicode encoding may be a really good thing - less choice being better in this case.
Effectively that forces the JSON producer to do any required charset conversion, rather than the consumer, as with XML. And the producer should be better placed to make the right decisions here.
Harry Fuecks - 21st December 2006 10:35 - #
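A minimal Python sketch of the producer-side conversion Harry describes, assuming a hypothetical legacy source that stores text as Latin-1. The producer decodes from its own charset and emits UTF-8; the consumer needs no charset negotiation, because (since Python 3.6) json.loads accepts bytes and detects UTF-8, UTF-16 or UTF-32 itself.

```python
import json

# Hypothetical legacy input: a name stored as Latin-1 bytes.
legacy_bytes = "Zoë".encode("latin-1")

# Producer side: decode from the local charset, then emit UTF-8 JSON.
name = legacy_bytes.decode("latin-1")
payload = json.dumps({"name": name}, ensure_ascii=False).encode("utf-8")

# Consumer side: hand the raw bytes straight to the JSON parser.
received = json.loads(payload)
print(received["name"])
```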
You can't safely use non-ascii in JSON if you're talking to web browsers - there's a really irritating bug in some browsers where they assume that the character set of a script tag is the same as that of the page:
http://jerakeen.org/test/unicode/
Fortunately, you can encode any character with the \uXXXX syntax, so you can get round this. And it only applies to script tags - XMLHttpRequest does the right thing.
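For illustration, Python's json.dumps already produces pure-ASCII output by default (ensure_ascii=True), writing every non-ASCII character as a \uXXXX escape; characters outside the Basic Multilingual Plane become a UTF-16 surrogate pair. Output like this reads the same whatever charset a browser wrongly assumes for a script tag. The sample data is made up.

```python
import json

data = {"name": "Zoë", "smile": "😀"}

ascii_only = json.dumps(data)  # ensure_ascii=True is the default
print(ascii_only)  # {"name": "Zo\u00eb", "smile": "\ud83d\ude00"}
```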
I especially like the fact that there's an encoder/decoder for every language I use. I was amazed by Dave's harsh criticisms simply because it's not XML. To me, that's JSON's greatest asset... it's not XML!
i think people may have the conception that json and xml are isomorphic, but isn't it true that it is impossible to model, for example, recursive data structures in json? in the case of xml, one can infer this type of structure from the dtd; with json you are left making assumptions on an "as you see it" basis when parsing the structure.
whoopeee - 21st December 2006 18:33 - #
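It may help to separate two things here. A recursive *type* (a node whose children are nodes) nests in JSON to any depth, although nothing in the JSON text itself declares that type. What JSON has no syntax for is a cyclic or shared *reference*. A small Python sketch with made-up data shows both; how you would encode references (for example by id) is left as an application-level convention.

```python
import json

# A recursive type: each node holds a list of child nodes.
tree = {"name": "root", "children": [
    {"name": "a", "children": [{"name": "a1", "children": []}]},
    {"name": "b", "children": []},
]}
round_tripped = json.loads(json.dumps(tree))  # nesting round-trips fine

# A cyclic value: a node that refers back to itself.
loop = {"name": "loop"}
loop["parent"] = loop

cycle_error = None
try:
    json.dumps(loop)
except ValueError as exc:
    cycle_error = str(exc)
print(cycle_error)  # Circular reference detected
```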
Simon, bravo on a well written explanation of the benefits of JSON. Thank you for writing this up.
Also, because I am only just now coming around to it, I love what you've done with your site, particularly with regards to the comment listing and form. Excellent job!
XML is fine for complex documents, but is terrible for data. It's so terrible that they had to invent a new word and write a specification to describe its data model ("Infoset") because XML's model doesn't map to data types common in programming languages.
In contrast, you nailed what's best about JSON:
I'd been working on a simple-as-possible data markup language for a while, but JSON came out and took over, and I'm glad it did. Crockford's brilliance here is really that he recognized that he didn't need to invent anything new. He merely took something that was known to be useful, that already had a standard, and specified a subset of it, gave it a name, and wrote some code. And the fact that there are now dozens of implementations and everybody uses it proves that he was right.
XML is meant for documents. JSON is meant for data.
Keith - 21st December 2006 20:43 - #
Hmm, things like <blockquote> and <em> aren't being formatted in comments. :(
Keith - 21st December 2006 20:45 - #
I've added rudimentary CSS to the comments, so lists, blockquotes and ems work now. It's a quick fix for the moment.
Thanks for the correction, Dave. I must have misremembered something you wrote back in 1999 or so.
keith writes:
> because XML's model doesn't map to data types common in programming languages
neither does JSON's. JSON structures only map onto javascript data structures as an implementation detail, and it only maps to "types" in the most primitive sense of the most basic structures we would presume to find in simplistic programming languages.
when parsing a JSON structure, how do you know that you are reading in an instantiation of a composite type? you don't. you take what is read, call it "foo", use it and chuck it. that's fine for most web applications, but it is not a data description language, unless you presume that all of our typing needs conform only to primitive types also encompassed by the JSON delimiter glyphs.
XML doesn't get you all the way either, but it encompasses more than JSON can.
whoopeee - 21st December 2006 22:24 - #
woopeee writes:
This is correct, even if you did pull JSON into your application you would likely parse it into your custom classes anyway, the same you would do with XML.
JSON is definitely a must for AJAX, but truthfully a lot of other languages or frameworks have powerful XML parsing/class mapping available.
kris meister - 23rd December 2006 05:28 - #
Maybe there's something here about different styles of programming. In Java and similar languages it's rare that you use a "basic" data structure like a HashMap to represent and work with a piece of data: if you're dealing with a photo, you'll create a Photo class.
In languages such as Python, Perl and Ruby data structures like lists and dictionaries are built into the core language, and thus it's often more convenient to use them than to roll a custom class unless there's a good reason to (to associate some behaviour with the data, for example).
If you're going to work with data as lists and dictionaries, pulling in JSON gives you the data in the format in which you are going to use it, and gives you a nice productivity boost. If you're going to turn data into instances of classes before doing any work then you'll have to process the data regardless of whether it's JSON or XML, so the benefit is much smaller.
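The two styles can be sketched in Python. The Photo dataclass, its fields and the URL are hypothetical stand-ins for the Photo class mentioned above; object_hook is a real json.loads parameter that is called with every decoded JSON object, which is one convenient place to do the mapping step.

```python
import json
from dataclasses import dataclass

@dataclass
class Photo:
    # Hypothetical class; field names are assumptions for illustration.
    title: str
    url: str

raw = '[{"title": "Sunset", "url": "http://example.com/sunset.jpg"}]'

# Dynamic-language style: the decoded lists and dicts are the working data.
photos = json.loads(raw)
print(photos[0]["title"])  # Sunset

# Class-based style: one extra mapping step, just as you'd need with XML.
def to_photo(obj):
    # Called with each decoded JSON object as a dict.
    if "title" in obj and "url" in obj:
        return Photo(obj["title"], obj["url"])
    return obj

typed = json.loads(raw, object_hook=to_photo)
```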
you clearly do not understand the idea of a 'semantic' web. Using xml makes it possible to use that information in other places: you can include it in an xml document, like xhtml, using xlink to point to certain data elements. And another advantage: Your data is tagged. You have given it semantic meaning.
And yet another improvement: You can extend your xml data file with completely different data without compatibility problems.
Just yet another advantage:
xsl: style your data.
OK, if you are really just using the data file to communicate with your own application, json is a possibility.
But yet, XML is much much much better.
Just learn xml and some xml languages and you'll see the point.
tjerk wolterink - 24th December 2006 09:22 - #
Andrew Hedges - 24th December 2006 18:18 - #
"Just learn xml and some xml languages and you'll see the point."
just learn about lambda calculus and you'll see there is no point to the xml fairy-tale
wert - 25th December 2006 00:00 - #
the arrogance of this Dave Winer is amazing.
wert - 25th December 2006 00:02 - #
@wert: could you enlighten me, what the lambda calculus has to do with xml? This is a serious question.
It's truly not just for JavaScript. For example, although Apple's .plist file format is currently XML text or binary data, it did at one point use a JSON-like format. Data files in application bundles can also use a format that, while perhaps not JSON, is also very similar.
Anonymous Coward - 26th December 2006 07:00 - #
The old style plist format is brilliant. We still use it to this day. It's way more readable and compact and works 99% of the time. In case we need something richer, we use the XML representation.
The old-style plist format has been around since the early/mid 90's and I believe it was part of the OpenStep spec.
JSON is very similar and it amazes me that people are just catching on to this kind of more readable and compact data representation.
AJ - 26th December 2006 18:41 - #
I love it when people pretend they understand what the word "semantic" in the phrase "semantic web" means.
joshua schachter - 27th December 2006 02:12 - #
XML is always an overhead for simple data transfer, though very useful for complex documents.
JSON can solve most normal problems.
The next thing may be XML Schema having a JSON type,
with XSD like
<somedata type="JSON">
and XML like
<ComplexDocument>
  somedata = {
    "name": "Simon Willison"
  }
</ComplexDocument>
Vinothkumar - 30th December 2006 19:53 - #
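The idea of carrying a JSON payload inside an XML document can be sketched with Python's standard library. The element names come from the comment above and are hypothetical; no XML Schema type is involved here. The type="JSON" attribute is only a marker that the reading code checks before handing the element text to a JSON parser.

```python
import json
import xml.etree.ElementTree as ET

# Build the XML envelope and put JSON text inside one element.
doc = ET.Element("ComplexDocument")
somedata = ET.SubElement(doc, "somedata", type="JSON")
somedata.text = json.dumps({"name": "Simon Willison"})
xml_text = ET.tostring(doc, encoding="unicode")

# Reading it back: XML parser for the envelope, JSON parser for the payload.
node = ET.fromstring(xml_text).find("somedata")
data = json.loads(node.text) if node.get("type") == "JSON" else None
print(xml_text)
```

ElementTree escapes any &, < or > in the JSON text, so the payload survives the trip through XML unchanged.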
What about the following XML representation and standard XML libraries to encode and decode as needed:
<struct id="person">
  <field id="name" value="Simon Willison" />
  <field id="age" value="25" />
  ...
  <field id="urls">
    <value>http://simonwillison.net/</value>
    <value>http://x.com</value>
    ...
  </field>
</struct>
Khaled Al-Akhras - 28th March 2007 18:30 - #
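A generic encoder and decoder for this struct/field format take only a few lines with xml.etree.ElementTree. The sketch below assumes two things not spelled out in the comment: scalars go in a value attribute, and lists go in child <value> elements. It also shows the trade-off: the format doesn't record whether "25" was a number or a string, so the consumer gets strings back unless both sides agree on types separately.

```python
import xml.etree.ElementTree as ET

def encode_struct(struct_id, data):
    # Assumed layout: scalars as a value attribute, lists as <value> children.
    root = ET.Element("struct", id=struct_id)
    for key, value in data.items():
        if isinstance(value, list):
            field = ET.SubElement(root, "field", id=key)
            for item in value:
                ET.SubElement(field, "value").text = str(item)
        else:
            ET.SubElement(root, "field", id=key, value=str(value))
    return ET.tostring(root, encoding="unicode")

def decode_struct(text):
    result = {}
    for field in ET.fromstring(text).iterfind("field"):
        if field.get("value") is not None:
            result[field.get("id")] = field.get("value")  # always a string
        else:
            result[field.get("id")] = [v.text for v in field.iterfind("value")]
    return result

person = {"name": "Simon Willison", "age": 25, "height": 1.68,
          "urls": ["http://simonwillison.net/", "http://x.com"]}
decoded = decode_struct(encode_struct("person", person))
```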