Keep your JSON valid
I’m a big fan of JSON, and it’s great to see it turning up as an output option for so many Web APIs. Unfortunately, many of these APIs are getting the details slightly wrong and in doing so are producing invalid JSON.
JSON isn’t just the object-literal syntax of JavaScript; it’s a very tightly defined subset of that syntax. The json.org site has a spec (illustrated with pretty state machine diagrams) and there’s an RFC as well.
By far the most common error I’ve encountered relates to object keys. In JSON (unlike in JavaScript) these MUST be double-quoted strings. In fact, ALL strings in JSON must be enclosed in double quotes (JavaScript also allows single quotes; JSON does not).
Valid:
{ "name": "Simon" }
Invalid:
{ name: "Simon" }
{ 'name': "Simon" }
{ "name": 'Simon' }
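The four examples above can be checked mechanically. This is a minimal sketch that assumes Python 3 and its standard-library json module as a stand-in for simplejson; the helper name is_valid_json is made up for illustration.

```python
import json

# The one valid and three invalid examples from above, as raw text.
candidates = [
    '{ "name": "Simon" }',    # double-quoted key and value
    '{ name: "Simon" }',      # unquoted key
    "{ 'name': \"Simon\" }",  # single-quoted key
    '{ "name": \'Simon\' }',  # single-quoted value
]


def is_valid_json(text):
    """Hypothetical helper: True if a strict parser accepts the text."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


results = [is_valid_json(text) for text in candidates]
for text, ok in zip(candidates, results):
    print("valid  " if ok else "invalid", text)
```

Only the first line should be reported as valid; the other three fail on the first character that is not a double quote where a string is expected.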
It’s worth reviewing the other key differences between JSON and JavaScript. Remember, all valid JSON is valid JavaScript but the opposite is not true; JSON is a subset.
This stuff matters. Python’s excellent simplejson module is a strict parser; it refuses to consume invalid JSON. If you’re building a JSON API it’s worth taking the time to ensure you are valid—I suggest installing Python and simplejson and trying the following:
$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>>> import urllib, simplejson
>>> url = "http://your-api.org/your-api/json"
>>> simplejson.load(urllib.urlopen(url))
Try it against a few JSON supporting sites. You might be surprised at the amount of invalid JSON out there.
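The session above is written for Python 2.5. A rough modern equivalent, assuming Python 3 (where urlopen lives in urllib.request and a json module ships in the standard library), might look like the sketch below. The function names check_json_text and check_json_url are hypothetical, and the response is assumed to be UTF-8.

```python
import json
import urllib.request


def check_json_text(text):
    """Return None if text is valid JSON, else a short description of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno, colno and msg are attributes of JSONDecodeError
        return "line %d, column %d: %s" % (exc.lineno, exc.colno, exc.msg)
    return None


def check_json_url(url):
    """Fetch a URL and run its body through the strict parser.

    Assumption: the body is UTF-8 encoded; a real API may declare another charset.
    """
    with urllib.request.urlopen(url) as response:
        body = response.read().decode("utf-8")
    return check_json_text(body)
```

Calling check_json_url with the address of a JSON endpoint returns None when the response parses, or a line and column pointing at the first problem when it does not.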
As with outputting XML, the best way to avoid these problems is to use a pre-existing JSON generation library rather than rolling your own. I’ve had good experiences with simplejson for Python and php-json for PHP.
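To illustrate why a library is the safer route, here is a small sketch comparing a hand-rolled template with library output. It assumes Python 3's standard-library json module in place of simplejson; the function names and the sample value are invented for the example.

```python
import json


def handrolled(name):
    # Naive string templating: nothing escapes quotes or control characters.
    return '{ "name": "%s" }' % name


def generated(name):
    # The library takes care of quoting and escaping.
    return json.dumps({"name": name})


def parses(text):
    """True if a strict parser accepts the text."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


# A made-up value containing characters that need escaping.
tricky = 'He said "hi"\nand left'

handrolled_ok = parses(handrolled(tricky))
generated_ok = parses(generated(tricky))
print("hand-rolled parses:", handrolled_ok)
print("generated parses:  ", generated_ok)
```

The template works for simple values and breaks as soon as a value contains a double quote or a newline, which is exactly the kind of bug that tends to slip through casual testing.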
Colin Ramsay - 11th October 2006 15:39 - #
Hi Simon,
I've recently been discovering this as well - forced me to create a pre-parser, parser.
I've notified everyone with invalid JSON that I've come across. Finally I have a page to link them to. Cheers!
Regards,
Gareth
Gareth Rodger - 11th October 2006 16:29 - #
Just to be strict, I have to point out that the simplejson parser violates Postel's Law:
It would be "nice" if everyone generated strict valid JSON (or HTML for that matter), but we know in practice that this is not something we can count on, globally.
(Yes, and this comment parser is a PITA as well. CPU cycles are cheaper than human time.)
Preston L. Bannister - 11th October 2006 17:52 - #
Fellow - 11th October 2006 18:04 - #
Sam Ruby - 11th October 2006 18:19 - #
Drew McLellan - 11th October 2006 20:09 - #
Thom Shannon - 11th October 2006 22:17 - #
pr0gg3d - 12th October 2006 00:12 - #
Agh! Not this again. I don't ever want to see Mark Pilgrim or anyone similarly inspired and heroic to build a Universal JSON Parser. I don't want to see 3000 unit tests either. The mess in syndication feeds and HTML is plenty enough.
Generate and consume valid JSON. Break noisily otherwise. If that's troublesome, use XML. Heh. Oh god the irony.
Otherwise, I'll start thinking it'd be "nice" if everyone "generated" strict valid C, C++, Python, Java, PHP, ... ad nauseam.
l.m.orchard - 12th October 2006 01:35 - #
Kuk - 12th October 2006 16:08 - #
I ran into this today. I wonder how many #IE sucks comments there are in the world?
Jeremy Dunck - 13th October 2006 01:25 - #
Dustin Diaz - 13th October 2006 06:50 - #
Whether or not anybody notices it depends on the strictness of their particular parser. If it's lenient, intentionally or otherwise, then there's a chance they'll unleash invalid JSON on the world with their heart firmly in the right place.
We can't rely on parsers to set the standard de facto (I'm sure em tags are semantically dubious for marking up Latin). I once had to deal with a 3rd party, bespoke XML application, created at great expense for the company I was working for at the time. It was only several months after it had been adopted and approved that we suddenly realised (through poking and prodding) that the underlying parser was SGML-based in the first instance, followed by some rather weak XML rule enforcement afterwards. I forget the details but it was possible to submit invalid XML to the system if you knew how to get your DOM knickers in just the right twist.
Out of interest, why is JSON so draconian in its associative-array syntax? Is it primarily to make JSON parsers as lightweight as possible?
J-P Stacey - 13th October 2006 11:08 - #
Yes, everything should be as simple as possible. Think about it. JSON is something computers produce and computers consume. JavaScript is something that humans produce and computers consume. If computers produce and consume something, computers are perfect and so can be held to a strict standard. If humans produce something then they care about how things look and make mistakes and stuff, so they can be held to looser standards.
Frankie Robertson - 14th October 2006 20:06 - #
I too wish everyone would generate valid code. Understand, back in the 1980s I made a regular habit of running all my code through lint, gcc -Wall (which I'd had to port to a non-standard Unix), and the GreenHills C compiler at its highest warning level (our actual production compiler) ... and would generally find nothing in my code. Code I got from someone else would go through the same process, until squeaky clean.
I am without doubt a big fan of "strict". I would be shocked if any of my applications generated non-strict JSON.
On the other hand, in reality I cannot expect others to always meet the same standard. Naturally I would prefer that others generate strict ... whatever. If I notice they are a bit off target, odds are good I'd let them know. But ... in a general purpose application, I will not build in that assumption where not justified.
On the flip side, the quoting around object keys bugs me somewhat. For simple identifiers as keys, the quotes add unneeded bulk. JSON is shipped between the browser and web server over a limited channel. Anything that bulks up the data without need ... is not a good idea.
Preston L. Bannister - 8th November 2006 20:58 - #
True, but they remove a big chunk of mental bulk from every parser: from the complexity and legibility of its regexes, the brainspace required by its programmer, and the strain on the client's regex engine. If we're hoping for valid JSON, then this measure should hopefully simplify parsers and clarify what's expected from writers. I agree it's a pain in the arse, though, until you get used to it.
Still, the flipside is no better. When recently migrating an email validator from what almost everybody's emails look like to the full RFC, including such whitespace crazy-bonkerers as '"foo bar"@example.com', I was going cross-eyed constructing the required regex. It's never quite the same as you remember it, and you always remember it in a different regex/quote-escaping convention from the one you're programming in...!
J-P Stacey - 21st November 2006 16:31 - #