
Simon Willison’s Weblog

Protocol Buffers: Google's Data Interchange Format. Open sourced today. Highly efficient binary protocol for storing and transmitting structured data between C++, Java and Python. Uses a .proto file describing the data structure which is compiled to classes in those languages for serializing and deserializing. 3-10 times smaller and 20-100 times faster than XML.
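
To give a rough feel for where the size saving comes from, here is a minimal Python sketch that hand-encodes a two-field message in the binary wire format and compares it with a hand-written XML equivalent. It does not use the real protobuf library or any compiled .proto classes; the helper names, the field numbers 1 and 2, and the XML layout are all assumptions made for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a string as a length-delimited field (wire type 2)."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # field number plus wire type
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_person(name: str, email: str) -> bytes:
    # Field numbers 1 and 2 are assumed for this illustration.
    return encode_string_field(1, name) + encode_string_field(2, email)


binary = encode_person("John Doe", "jdoe@example.com")

# Hypothetical XML rendering of the same record, for a size comparison only.
xml = (
    "<person><name>John Doe</name>"
    "<email>jdoe@example.com</email></person>"
).encode("utf-8")

print(len(binary), "bytes as binary;", len(xml), "bytes as XML")
```

Each field costs a one-byte tag and a one-byte length here, where the XML spends its bytes on repeated element names. For a record this small the ratio is modest; the speed claim depends on the parser and is not something this sketch measures.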

10 comments

  1. As I asked on the original post (where, oddly enough, my comment didn't appear), what does "faster than XML" mean? A bit of context would have been helpful here.

    Sylvain - 8th July 2008 11:47 - #

  2. @Sylvain: Simple enough: it's quicker to encode and decode, and, given the size reduction, quicker to transmit, though that's beside the point. That, at least, is what I understand from what I've read so far.

    Keith Gaughan - 8th July 2008 16:42 - #

  3. Well, then, why not use JSON? It's also smaller and potentially faster than XML. Just add a schema for validation and you're golden. Their samples for the protocol buffer text format look similar to JSON without being quite compatible with it, yet they're not different enough to warrant a completely non-interoperable syntax.

    David Golightly - 8th July 2008 22:38 - #

  4. @David: Because the use case for the two is different. Anyway, where did you get the idea that its binary wire format is anything like JSON? It's nothing like it. For a start, it's binary rather than textual. If you want to compare it to anything, think of the encoding side of it as a less painful version of ASN.1 BER.

    Keith Gaughan - 8th July 2008 23:34 - #

  5. @Keith - I've got no beef with the binary format being completely non-interoperable with, e.g., JSON. I'd expect nothing less. However, as JSON aspires to be a data-format lingua franca a la XML, it would be great if the standard textual syntax of this new format were interoperable with at least some existing text-based format. Instead I see text examples like:

    
      person {
        name: "John Doe"
        email: "jdoe@example.com"
      }
    

    (see Why Not XML?). Again, I totally understand and am excited about the binary format, which looks incredibly efficient. On the other hand, I'm always just a little disappointed when someone goes about creating their own new textual format syntax on arbitrary grounds, rather than adapting an existing format to their needs.

    David Golightly - 9th July 2008 01:50 - #

  6. @David: The text format exists mainly for backwards compatibility; Protocol Buffers have been in use at Google for 7 years now, so there are a lot of ASCII protobufs kicking around. That said, there's no reason a new application couldn't define a new format for language support or interoperability (XML, JSON, etc.). The main idea is that the binary format is compact, and you can always translate to and from that format when size is an issue.

    JBruce - 9th July 2008 05:31 - #
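
The translation idea above can be sketched in a few lines of Python: read the binary fields and re-emit them as JSON. This is only an illustration under stated assumptions. It handles length-delimited string fields and nothing else, the field-number-to-name table `PERSON_FIELDS` is hypothetical rather than taken from a real .proto file, and it does not use the real protobuf library.

```python
import json
from typing import Tuple

# Hypothetical schema: field number -> field name (not from a real .proto).
PERSON_FIELDS = {1: "name", 2: "email"}


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def person_to_json(data: bytes) -> str:
    """Decode string-only fields from the wire format and re-emit as JSON."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError("this sketch handles only length-delimited fields")
        length, pos = read_varint(data, pos)
        value = data[pos:pos + length].decode("utf-8")
        pos += length
        # Fall back to the raw field number when the name is unknown.
        fields[PERSON_FIELDS.get(field_number, str(field_number))] = value
    return json.dumps(fields)


wire = b"\x0a\x08John Doe\x12\x10jdoe@example.com"
print(person_to_json(wire))
# prints {"name": "John Doe", "email": "jdoe@example.com"}
```

Note that the binary form carries only field numbers, so the decoder needs the schema to recover the names; that is part of why it is smaller than a self-describing text format.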

  7. @Keith: Because the use case for the two is different.

    Huh?! That doesn't make much sense to me. You're comparing different formats and containers for serializing data, yet you consider JSON to have a different use case than XML? How?

    Besides, I'm quite happy to accept that a binary format highly optimized for Google's architecture is faster than generic formats like XML and JSON could be.

    Sylvain - 9th July 2008 06:48 - #

  8. @Sylvain: The use case for Protocol Buffers is to schlep data structures around as efficiently as possible. The use case for XML is document markup, but it's often abused as a method for encoding data structures. The use case for JSON is similar to that of YAML: provide a human-readable method of schlepping data structures around. Does that make more sense?

    Keith Gaughan - 9th July 2008 16:19 - #

  9. @Keith: Thanks, that last explanation makes more sense :)

    Sylvain - 9th July 2008 19:11 - #

  10. I'm interested to see if people will pick this up more willingly than Facebook's (excellent) Thrift software which accomplishes precisely the same goals and has been open-source for some time.

    Richard Crowley - 10th July 2008 00:27 - #
