Simon Willison’s Weblog

What is Google?

Via John Battelle, Rich Skrenta’s remarkable piece on what Google have actually built. They don’t just have the world’s best search engine, they have the world’s largest and most scalable platform for developing huge web-based applications.

Google has taken the last 10 years of systems software research out of university labs, and built their own proprietary, production quality system. What is this platform that Google is building? It’s a distributed computing platform that can manage web-scale datasets on 100,000 node server clusters. It includes a petabyte, distributed, fault tolerant filesystem, distributed RPC code, probably network shared memory and process migration. And a datacenter management system which lets a handful of ops engineers effectively run 100,000 servers. Any of these projects could be the sole focus of a startup.

[ ... ]

While competitors are targeting the individual applications Google has deployed, Google is building a massive, general purpose computing platform for web-scale programming.

Fascinating stuff.

This is What is Google? by Simon Willison, posted on 5th April 2004.

8 comments

  1. Fascinating, and also well-timed to boost perceived value in advance of a public listing?

    Adam Bramwell - 5th April 2004 08:48 - #

  2. Would be interesting to know what kind of server-side language GMail is built with. Considering that environment, it would seem a "shared nothing" technology like PHP would be more suitable than platforms like servlets or ASP.NET, which encourage keeping session and application data memory-resident. Could be it's some in-house technology, of course.

    Harry Fuecks - 5th April 2004 11:56 - #

  3. Harry, Not to be contentious... ASP.NET allows you to store session data in-process, at a "state server" which is maintained by an NT service, or in a database which allows you to scale web apps for web gardens and farms. We'll see what MSN will run on. Hard to say at this point. Can't wait to see it though.

    Milan Negovan - 5th April 2004 14:40 - #

  4. As I recall, Google use Python quite heavily, so it wouldn't surprise me to find out that at least some of it was implemented in that.

    Jim Dabell - 5th April 2004 15:10 - #

  5. Hi Milan: by no means trying to suggest ASP.NET and J2EE don't scale. What I'm trying to say is that if Google have invested in building what seems to be a scalable networked operating system, implementing scalability at an "application layer" doesn't seem to make sense when the underlying "operating system" already has a (probably better) means to do it.

    Harry Fuecks - 5th April 2004 16:00 - #

  6. This Slashdot post quotes a Cringely column that talks about Google's attitude towards faulty hardware.

    Google's GFS expects the failure of most components, including CPUs, memorys, disks, systems, etc--and in google's case nothing has to be replaced.

    Extremely cool way of thinking.

    Micah - 5th April 2004 17:32 - #

  7. A petabyte of fault-tolerant data storage; fault-tolerant in software across thousands of servers.

    If the servers were on more than one physical site, just what would it take to cause data loss? And how do they write and test what must need to be incredibly low-bug software?

    SB - 6th April 2004 02:25 - #

  8. Google's very own PageRank technology uses a very complex algorithm, which needs fast processing with sensitive scripting. As Jim pointed out, Google uses Python a lot; I am fairly sure of that. One point about Google worth noting is that it is now using distributed computing power, with all the OSes recursively using variant CPU power.

    saumendra - 12th April 2004 10:19 - #

Previously hosted at http://simon.incutio.com/archive/2004/04/05/whatIsGoogle