Simon Willison’s Weblog


Google Base is interesting

16th November 2005

I’m still trying to get my head around Google Base. Here’s a brain-dump of my thinking so far.

Base is a very interesting product for a whole bunch of reasons. The data model is surprisingly simple on the surface: all items have a title, description, (optional) external URL, a “type” and a set of labels (a.k.a. tags) and “attributes”. Attributes are something for tag enthusiasts to get excited by—they’re name/value pairs that are kind of like tags in that you can apply them to anything, but more structured and with a greater level of implied meaning.
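To make that model concrete, here is a minimal sketch in Python. The class name, field names and sample recipe are all invented for illustration; nothing here is taken from Google Base itself, and treating each attribute as a single string value is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for a Base item; the field names are assumptions."""
    title: str
    description: str
    item_type: str                      # the item's "type", e.g. "recipe"
    url: Optional[str] = None           # the external URL is optional
    labels: set[str] = field(default_factory=set)            # free-form tags
    attributes: dict[str, str] = field(default_factory=dict)  # name/value pairs


# An invented example item: labels say what it is about,
# attributes say something structured about it.
recipe = Item(
    title="Lemon drizzle cake",
    description="A simple loaf cake with a lemon syrup topping.",
    item_type="recipe",
    labels={"baking", "dessert"},
    attributes={"cooking time": "45 minutes", "serves": "8"},
)
```

The difference between the last two fields is the interesting part: a label only says "this is about baking", while an attribute says "serves: 8", which is something a program can compare or filter on.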

Attributes instantly made me think of geotagging on Flickr, where tags are overloaded to store latitude and longitude values (example here). Having first-class support for this kind of extensible data is a very powerful concept.
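The overloading looks roughly like this in practice. This sketch assumes the tags take the form `geo:lat=...` and `geo:lon=...`; the function name and the sample tags are made up for illustration.

```python
def parse_geotags(tags):
    """Pull a (lat, lon) pair out of tags shaped like 'geo:lat=51.5'.

    Returns None unless both values are present and numeric.
    """
    coords = {}
    for tag in tags:
        name, sep, value = tag.partition("=")
        if sep and name in ("geo:lat", "geo:lon"):
            try:
                coords[name[len("geo:"):]] = float(value)
            except ValueError:
                continue  # not a number, so treat it as an ordinary tag
    if "lat" in coords and "lon" in coords:
        return coords["lat"], coords["lon"]
    return None


# Invented sample tags, mixing ordinary tags with overloaded ones.
tags = ["geotagged", "london", "geo:lat=51.5007", "geo:lon=-0.1246"]
print(parse_geotags(tags))
```

All of that string parsing exists only because a tag is a single opaque string. With real name/value attributes, the latitude would just be a value stored under a name.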

Another interesting problem that the Google Base data model could be used to tackle is Wikipedia’s WikiProjects. If you look at any US Navy ship entry on Wikipedia (example) you’ll see a table on the right-hand side of standard attributes relating to that ship: things like Length, Displacement, Armament and so on. This data isn’t really structured; it’s just a wiki table, manually maintained by participants of the Ships WikiProject.

Obviously this data would be more valuable if it were structured in a way that allowed queries to be made against it. Base-like attributes provide a way of doing this.
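As a rough illustration of what a query over attributes might look like, here is a small Python sketch. The ships and their figures are placeholders invented for the example, not real data, and the `where` helper is hypothetical.

```python
# Placeholder records: the names and numbers are invented, not real ship data.
ships = [
    {"title": "Ship A", "attributes": {"Length": 115.0, "Displacement": 4100.0}},
    {"title": "Ship B", "attributes": {"Length": 154.0, "Displacement": 9200.0}},
    {"title": "Ship C", "attributes": {"Displacement": 7800.0}},  # no Length recorded
]


def where(items, name, predicate):
    """Return the items that have attribute `name` with a value satisfying `predicate`."""
    return [
        item for item in items
        if name in item["attributes"] and predicate(item["attributes"][name])
    ]


long_ships = where(ships, "Length", lambda length: length > 120)
print([ship["title"] for ship in long_ships])
```

Items that lack the attribute are simply skipped, which suits a loose model where not every item carries every attribute.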

There’s definitely a trend towards this kind of loose data model at the moment. JotSpot allows all pages within a wiki to have as many extra name/value attribute pairs as you like (even the wiki body itself is internally implemented as a special attribute), and Ning works along similar lines.

Base currently allows bulk importing of data using tab-delimited files, RSS or Atom. There are no outward-bound APIs, which is a notable omission. I wouldn’t be at all surprised to see them added in the next few weeks.
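Producing a tab-delimited file is straightforward with Python's standard `csv` module. The column names and sample rows below are assumptions made for illustration; they are not Google Base's actual import format, which would need checking against its documentation.

```python
import csv
import io

# Invented sample items; the keys are hypothetical column names.
items = [
    {"title": "Lemon drizzle cake", "description": "A simple loaf cake.",
     "link": "http://example.com/cake", "type": "recipe", "label": "baking,dessert"},
    {"title": "Flapjacks", "description": "Oat bars.",
     "link": "", "type": "recipe", "label": "baking"},
]

columns = ["title", "description", "link", "type", "label"]

# Write to an in-memory buffer; swap in open("items.txt", "w", newline="")
# to produce a file on disk instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(items)

tsv = buffer.getvalue()
print(tsv)
```

Using the `csv` module rather than joining strings by hand means any value that happens to contain a tab or a newline gets quoted instead of silently breaking the row.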

This is “Google Base is interesting” by Simon Willison, posted on 16th November 2005.
