Google Base is interesting
I’m still trying to get my head around Google Base. Here’s a brain-dump of my thinking so far. First, some links.
- Google Base FAQ
- Google Base introduction on the Google Blog (includes testimonials)
- Tom’s first impressions
Base is a very interesting product for a whole bunch of reasons. The data model is surprisingly simple on the surface: all items have a title, description, (optional) external URL, a “type” and a set of labels (a.k.a. tags) and “attributes”. Attributes are something for tag enthusiasts to get excited by—they’re name/value pairs that are kind of like tags in that you can apply them to anything, but more structured and with a greater level of implied meaning.
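To make that shape concrete, here is a minimal sketch of the data model in Python. The class name `Item`, its field names, and the sample values are all assumptions made for illustration; none of this is Google's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Item:
    """Hypothetical Base-style item: fixed core fields plus open-ended extras."""

    title: str
    description: str
    item_type: str                    # the "type" of the item
    url: Optional[str] = None         # the external URL is optional
    labels: Set[str] = field(default_factory=set)              # free-form tags
    attributes: Dict[str, str] = field(default_factory=dict)   # name/value pairs


# Illustrative values only.
item = Item(
    title="Example item",
    description="A made-up item used to show the shape of the data.",
    item_type="example",
    labels={"demo", "sketch"},
    attributes={"colour": "red", "weight": "2 kg"},
)
```

The point of the sketch is the split between labels and attributes: a label is a bare string, while an attribute carries a name that tells you what its value means.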
Attributes instantly made me think of geotagging on Flickr, where tags are overloaded to store latitude and longitude values (example here). Having first-class support for this kind of extensible data is a very powerful concept.
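As an illustration of what "overloaded" means here, the following sketch pulls name/value pairs back out of a flat tag list. It assumes tags shaped like `geo:lat=51.5`, and the function name `split_tags` and the sample tags are invented for the example.

```python
def split_tags(tags):
    """Separate plain tags from tags that smuggle in a name=value pair."""
    plain = []
    attributes = {}
    for tag in tags:
        name, sep, value = tag.partition("=")
        if sep and name and value:
            # Looks like name=value, so treat it as an attribute.
            attributes[name] = value
        else:
            plain.append(tag)
    return plain, attributes


# Illustrative tag list only.
plain, attrs = split_tags(["geotagged", "geo:lat=51.5", "geo:lon=-0.12", "london"])
```

With real attributes none of this parsing is needed, because the name and the value are stored separately from the start.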
Another interesting problem that the Google Base data model could be used to tackle is the kind of data maintained by Wikipedia’s WikiProjects. If you look at any US Navy ship entry on Wikipedia (example) you’ll see a table on the right-hand side of standard attributes relating to that ship: things like Length, Displacement, Armament and so on. This data isn’t really structured. It’s just a wiki table, manually maintained by participants of the Ships WikiProject.
Obviously this data would be more valuable if it were structured in a way that allowed queries to be made against it, such as finding every ship over a given length. Base-like attributes provide a way of doing this.
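A rough sketch of the kind of query that becomes possible once the values live in attributes rather than in a hand-edited table. The ship names, attribute names, and numbers below are made up, and `where` is a hypothetical helper rather than anything Base provides.

```python
# Made-up records; attribute names and values are illustrative only.
ships = [
    {"title": "Ship A", "attributes": {"length_m": 100.0, "displacement_t": 3000.0}},
    {"title": "Ship B", "attributes": {"length_m": 250.0, "displacement_t": 40000.0}},
    {"title": "Ship C", "attributes": {"length_m": 180.0}},
]


def where(items, name, predicate):
    """Return items that have the named attribute and whose value passes the test."""
    return [
        item
        for item in items
        if name in item["attributes"] and predicate(item["attributes"][name])
    ]


long_ships = where(ships, "length_m", lambda metres: metres > 150)
titles = [ship["title"] for ship in long_ships]
```

Items that lack the attribute are simply skipped, which suits a loose model where not every record carries every field.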
There’s definitely a trend towards this kind of loose data model at the moment. JotSpot allows all pages within a wiki to have as many extra name/value attribute pairs as you like (even the wiki body itself is internally implemented as a special attribute), and Ning works along similar lines.
Base currently allows bulk importing of data using tab-delimited files, RSS or Atom. There are no outward-bound APIs, which is a notable omission. I wouldn’t be at all surprised to see them added in the next few weeks.
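For a sense of what preparing a tab-delimited file might involve, here is a small sketch using Python's standard `csv` module. The column names and the `to_tsv` helper are assumptions for illustration; the column layout Base actually expects is not described here.

```python
import csv
import io


def to_tsv(items, columns):
    """Serialise a list of dicts as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # silently drop keys that are not in `columns`
    )
    writer.writeheader()
    for item in items:
        writer.writerow(item)  # missing keys are written as empty strings
    return buffer.getvalue()


# Hypothetical columns and a made-up row.
tsv = to_tsv(
    [{"title": "Example item", "description": "A made-up row", "link": "http://example.com/a"}],
    ["title", "description", "link"],
)
```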