
Simon Willison’s Weblog

Keep your JSON valid

I’m a big fan of JSON, and it’s great to see it turning up as an output option for so many Web APIs. Unfortunately, many of these APIs are getting the details slightly wrong and in doing so are producing invalid JSON.

JSON isn’t just the object-literal syntax of JavaScript; it’s a very tightly defined subset of that syntax. The json.org site has a spec (illustrated with pretty state machine diagrams) and there’s an RFC as well (RFC 4627).

By far the most common error I’ve encountered relates to object keys. In JSON (unlike in JavaScript) these MUST be double-quoted strings. In fact, ALL strings in JSON must be enclosed in double quotes (JavaScript also allows single quotes; JSON does not).

Valid:

{ "name": "Simon" }

Invalid:

{ name: "Simon" }
{ 'name': "Simon" }
{ "name": 'Simon' }
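
A quick way to confirm these four cases is to hand them to a parser. The sketch below assumes the `json` module from Python's standard library (which grew out of simplejson) as a stand-in; the helper name `is_valid_json` is made up for illustration.

```python
import json

def is_valid_json(text):
    """Return True if the parser accepts text, False if it rejects it."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True

samples = [
    '{ "name": "Simon" }',    # double-quoted key and value
    '{ name: "Simon" }',      # unquoted key
    '{ \'name\': "Simon" }',  # single-quoted key
    '{ "name": \'Simon\' }',  # single-quoted value
]

for sample in samples:
    print(is_valid_json(sample), sample)
```

Only the first sample should be accepted; the other three are fine as JavaScript object literals but are rejected as JSON.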

It’s worth reviewing the other key differences between JSON and JavaScript. Remember, all valid JSON is valid JavaScript but the opposite is not true; JSON is a subset.

This stuff matters. Python’s excellent simplejson module is a strict parser; it refuses to consume invalid JSON. If you’re building a JSON API it’s worth taking the time to ensure your output is valid. I suggest installing Python and simplejson and trying the following:

$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13) 
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>>> import urllib, simplejson
>>> url = "http://your-api.org/your-api/json"
>>> simplejson.load(urllib.urlopen(url))

Try it against a few JSON-supporting sites. You might be surprised at the amount of invalid JSON out there.
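
The session above is from Python 2.5. On a current Python 3 the same check might look roughly like the sketch below, which assumes the standard library's `json` and `urllib.request` in place of simplejson and `urllib`. The function names `check_json_text` and `check_json_url` are hypothetical, and the URL in the final comment is the same placeholder as above, not a real endpoint.

```python
import json
import urllib.request

def check_json_text(text):
    """Return (True, None) if text parses as JSON, else (False, error message)."""
    try:
        json.loads(text)
    except ValueError as exc:  # raised for any syntax the parser rejects
        return False, str(exc)
    return True, None

def check_json_url(url, timeout=10):
    """Fetch url and report whether the response body parses as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Assumes the API serves UTF-8, which is an assumption of this sketch.
        body = response.read().decode("utf-8")
    return check_json_text(body)

# Placeholder URL; substitute the API you want to test:
# print(check_json_url("http://your-api.org/your-api/json"))
```

Returning the parser's error message instead of raising makes it easier to loop over several APIs and collect the failures in one pass.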

As with outputting XML, the best way to avoid these problems is to use a pre-existing JSON generation library rather than rolling your own. I’ve had good experiences with simplejson for Python and php-json for PHP.
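
To illustrate the difference, here is a minimal sketch comparing hand-rolled string formatting with a generation library. It assumes Python's standard `json` module in place of simplejson, and the `record` data is invented for the example.

```python
import json

# Invented sample data; note the double quotes embedded in the name.
record = {"name": 'Simon "the blogger" Willison', "tags": ["json", "python"]}

# Hand-rolled output: the key is unquoted and the embedded quotes are not escaped.
hand_rolled = '{ name: "%s" }' % record["name"]

# Library output: keys are double-quoted and embedded quotes are escaped.
generated = json.dumps(record)

print(hand_rolled)
print(generated)

# The library's output survives a round trip through the parser.
print(json.loads(generated) == record)
```

The hand-rolled string breaks in two ways at once, and the second (unescaped quotes in the data) only shows up when a value happens to contain them, which is why it tends to slip through testing.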

This is Keep your JSON valid by Simon Willison, posted on 11th October 2006.




17 comments

  1. This is a mega-simple JSON checker tool which I made for myself: http://www.colinramsay.co.uk/static/json-checker/ It was useful for me, I hope it's useful for someone else.

    Colin Ramsay - 11th October 2006 15:39 - #

  2. Hi Simon,

    I've recently been discovering this as well - it forced me to create a pre-parser, parser.

    I've notified everyone with invalid JSON that I've come across. Finally I have a page to link them to. Cheers!

    Regards,

    Gareth

    Gareth Rodger - 11th October 2006 16:29 - #

  3. Just to be strict, I have to point out that the simplejson parser violates Postel's Law:

    be conservative in what you do, be liberal in what you accept from others

    It would be "nice" if everyone generated strict valid JSON (or HTML for that matter), but we know in practice that this is not something we can count on, globally.

    (Yes, and this comment parser is a PITA as well. CPU cycles are cheaper than human time.)

    Preston L. Bannister - 11th October 2006 17:52 - #

  4. Insofar as standards matter, you're right. So I figured it was worth pointing you towards RFC 2606. Please don't invent domain names for examples, as that practice often leads to trouble for those who eventually register (or already own) the domain you used in your example. Even if you think this particular instance of "your-api.org" won't cause any harm, let's set a good example for the children.

    Fellow - 11th October 2006 18:04 - #

  5. What do you use to produce your RSS feed?

    Sam Ruby - 11th October 2006 18:19 - #

  6. Thanks for this, Simon - the subtleties of the differences between JSON and full native object syntax had passed me by. The other thing that caught me out recently is the trailing comma in arrays. When generating JSON with some kind of templating system, it's often easiest to just print a comma on the end of each array item. This results in a trailing comma at the end of the array, obviously. Firefox has no problem with this, but in IE it causes an undefined item to be created at the end of your array, which is going to throw any loops off. Of course, once I'd found the problem I remembered reading about it before... that'll teach me to be lazy.

    Drew McLellan - 11th October 2006 20:09 - #

  7. Atlas for .net also has a valid JSON serialiser, for both server side and client side.

    Thom Shannon - 11th October 2006 22:17 - #

  8. You're right, Simon (I'm the guy you met in Rome, Francesco). I use simplejson for the values returned from XMLHttpRequests, and using other people's work that returns non-standard JSON is very annoying. As you have stated, following the RFC or using standard libraries can help simplify an already intricate situation. I've seen several good AJAX implementations based on non-standard JSON, and getting them to work is very time-consuming, despite their reusable features. Bye

    pr0gg3d - 12th October 2006 00:12 - #

  9. It would be "nice" if everyone generated strict valid JSON (or HTML for that matter), but we know in practice that this is not something we can count on, globally.

    Agh! Not this again. I don't ever want to see Mark Pilgrim or anyone similarly inspired and heroic build a Universal JSON Parser. I don't want to see 3000 unit tests either. The mess in syndication feeds and HTML is plenty enough.

    Generate and consume valid JSON. Break noisily otherwise. If that's troublesome, use XML. Heh. Oh god the irony.

    Otherwise, I'll start thinking it'd be "nice" if everyone "generated" strict valid C, C++, Python, Java, PHP, ... ad nauseam.

    l.m.orchard - 12th October 2006 01:35 - #

  10. Interesting

    Kuk - 12th October 2006 16:08 - #

  11. This results in a trailing comma at the end of the array, obviously. Firefox has no problem with this, but in IE it causes an undefined item to be created at the end of your array,

    I ran into this today. I wonder how many #IE sucks comments there are in the world?

    Jeremy Dunck - 13th October 2006 01:25 - #

  12. This is basically one of those things that any JavaScript + backend developer type of person would eventually figure out when they try passing data back and forth in JSON format. It takes about two seconds to realize "Aww man... I need quotes!" And voila: valid JSON. At least now everyone should know (because they've all read about it here :) )

    Dustin Diaz - 13th October 2006 06:50 - #

  13. Whether or not anybody notices it depends on the strictness of their particular parser. If it's lenient, intentionally or otherwise, then there's a chance they'll unleash invalid JSON on the world with their heart firmly in the right place.

    We can't rely on parsers to set the standard de facto (I'm sure em tags are semantically dubious for marking up Latin). I once had to deal with a 3rd party, bespoke XML application, created at great expense for the company I was working for at the time. It was only several months after it had been adopted and approved that we suddenly realised (through poking and prodding) that the underlying parser was SGML-based in the first instance, followed by some rather weak XML rule enforcement afterwards. I forget the details but it was possible to submit invalid XML to the system if you knew how to get your DOM knickers in just the right twist.

    Out of interest, why is JSON so draconian in its associative-array syntax? Is it primarily to make JSON parsers as lightweight as possible?

    J-P Stacey - 13th October 2006 11:08 - #

  14. Out of interest, why is JSON so draconian in its associative-array syntax? Is it primarily to make JSON parsers as lightweight as possible?

    Yes, everything should be as simple as possible. Think about it. JSON is something computers produce and computers consume. JavaScript is something that humans produce and computers consume. If computers produce and consume something, computers are perfect and so can be held to a strict standard. If humans produce something then they care about how things look and make mistakes and stuff, so they can be held to looser standards.

    Frankie Robertson - 14th October 2006 20:06 - #

  15. Otherwise, I'll start thinking it'd be "nice" if everyone "generated" strict valid C, C++, Python, Java, PHP, ... ad nauseam.

    I too wish everyone would generate valid code. Understand, back in the 1980s I made a regular habit of running all my code through lint, gcc -Wall (which I'd had to port to a non-standard Unix), and the GreenHills C compiler at its highest warning level (our actual production compiler) ... and would generally find nothing in my code. Code I got from someone else would go through the same process, until squeaky clean.

    I am without doubt a big fan of "strict". I would be shocked if any of my applications generated non-strict JSON.

    be conservative in what you do

    On the other hand, in reality I cannot expect others to always meet the same standard. Naturally I would prefer that others generate strict ... whatever. If I notice they are a bit off target, odds are good I'd let them know. But ... in a general purpose application, I will not build in that assumption where not justified.

    be liberal in what you accept from others

    On the flip side, the quoting around object keys bugs me somewhat. For simple identifiers as keys, the quotes add unneeded bulk. JSON is shipped between the browser and web server over a limited channel. Anything that bulks up the data without need ... is not a good idea.

    Preston L. Bannister - 8th November 2006 20:58 - #

  16. On the flip side, the quoting around object keys bugs me somewhat. For simple identifiers as keys, the quotes add unneeded bulk.

    True, but they remove a big chunk of mental bulk from every parser: from the complexity and legibility of its regexes, the brainspace required by its programmer, and the strain on the client's regex engine. If we're hoping for valid JSON, then this measure should hopefully simplify parsers and clarify what's expected from writers. I agree it's a pain in the arse, though, until you get used to it.

    Still, the flipside is no better. When recently migrating an email validator from what almost everybody's emails look like to the full RFC, including such whitespace crazy-bonkerers as '"foo bar"@example.com', I was going cross-eyed constructing the required regex. It's never quite the same as you remember it, and you always remember it in a different regex/quote-escaping convention from the one you're programming in...!

    J-P Stacey - 21st November 2006 16:31 - #


Previously hosted at http://simon.incutio.com/archive/2006/10/11/json
