Simon Willison’s Weblog

270 items tagged “google”

2012

How can I sort a huge amount of numbers?

Sorting large amounts of data is one of the first exercises you’ll see described in any Hadoop or map/reduce tutorial—so I’d suggest taking a look at Hadoop.

[... 44 words]
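To give a feel for why sorting is such a common map/reduce exercise, here is a minimal single-machine sketch of the range-partitioning idea. It is not Hadoop code: the function name and the `boundaries` parameter are made up for illustration, and a real job would run each partition on a separate worker.

```python
import bisect
from collections import defaultdict


def partitioned_sort(numbers, boundaries):
    """Sort numbers by splitting them into ranges, sorting each range,
    and concatenating the results in range order.

    `boundaries` is a sorted list of hypothetical split points; with k
    boundaries there are k + 1 partitions.
    """
    buckets = defaultdict(list)
    # "Map" step: route every number to the partition that covers its range.
    for n in numbers:
        buckets[bisect.bisect_right(boundaries, n)].append(n)

    result = []
    # "Reduce" step: each partition is sorted independently. Because the
    # partitions cover disjoint, ordered ranges, concatenating them in
    # index order yields a fully sorted list.
    for index in range(len(boundaries) + 1):
        result.extend(sorted(buckets[index]))
    return result
```

Each partition only needs to fit on one worker, which is what lets the approach scale past the memory of a single machine.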

If you missed out on joining to work at Google and Facebook, what should you do?

Remind yourself that there will always be more opportunities, and obsessing over what might have been is a huge waste of your time.

[... 45 words]

Why does Google use “Allow” in robots.txt, when the standard seems to be “Disallow?”

The Disallow command prevents search engines from crawling your site.

[... 59 words]
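As a rough illustration of how Allow and Disallow interact, the sketch below feeds a made-up robots.txt to Python's standard `urllib.robotparser`. The file contents, the `ExampleBot` user agent and the example.com URLs are all assumptions for the demo. The Allow line is listed first because some parsers apply the first matching rule, whereas Google documents that it prefers the most specific match.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the whole site except /public/.
ROBOTS_TXT = """\
User-agent: *
Allow: /public/
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# Allowed: matches the Allow rule for /public/.
can_public = parser.can_fetch("ExampleBot", "http://example.com/public/page.html")
# Blocked: only the blanket Disallow rule matches.
can_private = parser.can_fetch("ExampleBot", "http://example.com/private/page.html")
```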

Why is Google indexing & displaying www1 versions of my site and how might I stop this?

You should stop serving your site to the public on multiple subdomains. Configure your site to serve a 301 permanent redirect from www1-www4 to the equivalent page on www—also, make sure that your site accessed without the www redirects to the right place as well.

[... 269 words]
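The redirect rule described above could be sketched like this. It is only the decision logic, assuming a hypothetical site at example.com; in practice the rule would usually live in the web server or load balancer configuration, which would send a 301 with the returned URL in the Location header.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hostnames, used purely for illustration.
CANONICAL_HOST = "www.example.com"
BARE_HOST = "example.com"
NUMBERED_HOST = re.compile(r"^www[1-4]\.example\.com$")


def redirect_target(url):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == CANONICAL_HOST:
        return None  # already on the canonical hostname
    if host == BARE_HOST or NUMBERED_HOST.match(host):
        # Keep scheme, path and query so each page maps to its equivalent
        # page on www. Any explicit port is dropped in this sketch.
        return urlunsplit(
            (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
        )
    return None  # unrelated hostname: leave it alone
```

For example, `redirect_target("http://www2.example.com/about?x=1")` returns `"http://www.example.com/about?x=1"`.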

What platform was YouTube using before they were acquired by Google?

It was written in Python—I don’t think they used any particular framework (they started the site in 2005).

[... 37 words]

2011

Does Google (company) have their own Audio Visual department for their large conferences, or do they contract another company?

I believe it’s their own in-house team—when we ran the first DjangoCon at Google’s Mountain View HQ a few years ago I understood that the video team were their own (the same team that records their internal Google Tech Talks). It might be an external company that they contract in, but it felt like they were permanent staff.

[... 85 words]

We Need to Stop Google’s Exploitation of Open Communities. Mikel Maron from OpenStreetMap is justifiably angry about Google MapMaker, which copies OpenStreetMap’s model of crowdsourcing geographic data (even copying the OSM idea of Mapping Parties) but keeps the data under a much more restrictive license, and uses the Google brand to market itself to African governments. # 22nd April 2011, 10 am

Why Facebook open-sourced its datacenters. Jon Stokes speculates that Facebook plan to use open source hardware to compete with Google at datacenter efficiency. This isn’t a new pattern. Years ago when I worked at Yahoo! I was furiously jealous of the secret sauce technologies that allowed Google to build big applications faster than anyone else, such as BigTable and map/reduce. Today, the open source world has created better, free alternatives, sponsored in part by Facebook, Yahoo! and other Google competitors. # 9th April 2011, 7:54 am

Is a relational database with many-to-many relationships difficult to develop into a web app?

Many-to-many tables can be a bit of a pain to deal with using regular SQL, but a good ORM can abstract away almost all of that complexity. I find using the Django ORM means I’m much less likely to shy away from a design that involves a many-to-many relationship because I know it won’t increase the complexity of the application. I imagine the Rails ORM has the same effect.

[... 91 words]
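To show what the "regular SQL" side looks like, here is a small sketch using Python's built-in `sqlite3` module. The `article`, `tag` and `article_tag` tables and their rows are invented for the example; the point is the join table and the two joins needed to cross it.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a many-to-many join table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
        -- The join table: one row per (article, tag) pairing.
        CREATE TABLE article_tag (
            article_id INTEGER REFERENCES article(id),
            tag_id INTEGER REFERENCES tag(id),
            PRIMARY KEY (article_id, tag_id)
        );
        INSERT INTO article VALUES (1, 'RE2 released'), (2, 'Wave postmortem');
        INSERT INTO tag VALUES (1, 'google'), (2, 'regex');
        INSERT INTO article_tag VALUES (1, 1), (1, 2), (2, 1);
    """)
    return conn


def titles_for_tag(conn, tag_name):
    """Return article titles carrying the given tag, ordered by article id."""
    rows = conn.execute(
        """
        SELECT article.title
        FROM article
        JOIN article_tag ON article_tag.article_id = article.id
        JOIN tag ON tag.id = article_tag.tag_id
        WHERE tag.name = ?
        ORDER BY article.id
        """,
        (tag_name,),
    )
    return [title for (title,) in rows]


conn = build_demo_db()
google_titles = titles_for_tag(conn, "google")
```

With the Django ORM the same relationship would be declared as a `ManyToManyField`, and the query would shrink to something like `Article.objects.filter(tags__name="google")`, assuming a hypothetical `Article` model with a `tags` field.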

Google APIs & Developer Products. Presented as a sort-of-periodic table. There’s quite a bit of stuff on here I didn’t know about. # 28th January 2011, 11:25 am

Getting Started—Google URL Shortener API. The API for the goo.gl URL shortener is really nice—no API key required, easy to create a short URL and you can retrieve detailed stats breakdowns (similar to bit.ly) as JSON for any URL. # 13th January 2011, 3:49 am

2010

Why did Google Wave fail to get significant user adoption?

When Wave first launched, individual Waves didn’t have a URL. This made it impossible to link to them from outside of Wave—people were having to say “log in to Wave, then search for X”. If you can’t link to something on the internet, it may as well not exist.

[... 67 words]

Google and Microsoft Cheat on Slow-Start. Should You? Fascinating optimisation tricks by some of the big websites, which violate the RFC governing the TCP slow-start algorithm in order to perform better in the common case. # 3rd December 2010, 7:03 pm
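A back-of-the-envelope sketch of why a larger starting window helps: under idealised slow-start the sender doubles its window every round trip, so the initial window decides how many round trips a small response needs. The function below is a simplification (no packet loss, no delayed ACKs, no receiver window limit), and the window sizes in the usage note are illustrative assumptions rather than figures taken from the article.

```python
def round_trips(total_segments, initial_window):
    """Count round trips needed to send `total_segments` TCP segments when
    the congestion window starts at `initial_window` segments and doubles
    after every round trip (idealised slow-start)."""
    if initial_window < 1:
        raise ValueError("initial_window must be at least 1 segment")
    window = initial_window
    sent = 0
    trips = 0
    while sent < total_segments:
        sent += window   # send one full window this round trip
        window *= 2      # slow-start doubles the window once it is ACKed
        trips += 1
    return trips
```

For a 20-segment response, `round_trips(20, 3)` gives 3 round trips while `round_trips(20, 10)` gives 2, which is one whole network round trip saved on a typical page.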

Is it not time for Google to redesign its search page by removing the “search” & “I’m Feeling Lucky” buttons since the buttons are now useless with the new “Instant” structure?

I don’t think so. The “Search” button defines their entire purpose. The “I’m Feeling Lucky” button is an important part of their brand.

[... 60 words]

Closure Compiler Service (via) A hosted version of the Google Closure Compiler (JavaScript minifier) running on App Engine. It has both a user interface and a REST API, which means you can use it as part of an automated build process without needing to set up a local copy of the software. # 9th August 2010, 1:17 pm

App Engine at Google I/O 2010. OpenID and OAuth are now baked in to the AppEngine users API. They’re also demoing two very exciting new features—a mapper API for doing map/reduce style queries against the data store, and a Channel API for building comet applications. # 20th May 2010, 3:30 pm

Google Font Directory: Font Preview. Handy tool for trying out the 18 open source fonts Google have released, along with server-side browser sniffing technology that serves up the correct version (including for IE6). The browser sniffing makes me a bit uncomfortable—will it play well with intermediate caches? What happens if I save a local copy of a page and then open it up in a different browser? # 20th May 2010, 3:20 pm

Stack Overflow Blog: OpenID, One Year Later. Google’s support is a huge deal—61% of Stack Overflow accounts use Google. Google’s implementation of directed identity has caused problems though, since Google provide a different OpenID for each domain making it hard for Stack Overflow, Server Fault and Super User to correlate accounts. Their solution is to require a (verified) e-mail address from Google OpenID users using sreg and use that as a key for the accounts. # 14th April 2010, 8:46 pm

Why Google MapMaker is not Open. Non-commercial use only, strict attribution requirements and you aren’t allowed to use the data for services that might compete with Google. This is why I’m disappointed every time I see Google encouraging people to contribute to MapMaker, especially in the developing world. If those people contributed to OpenStreetMap instead they would be building something far more valuable for their community. # 16th March 2010, 10:41 am

RE2: a principled approach to regular expression matching. Google have open sourced RE2, the C++ regular expression library they developed for Google Code Search, Sawzall, Bigtable and other internal projects. Unlike PCRE it avoids the potential for exponential run time and unbounded stack usage and guarantees that searches complete in linear time, mainly by dropping support for back references. # 12th March 2010, 9:28 am

Google Image Charts: Mathematical (TeX) Formulas (via) I’m not sure when they added this, but you can now use the Google Charts Image API to render mathematical formulas, specified using TeX syntax. WordPress.com and Wikipedia have both offered this feature for quite a while, but now you can use it anywhere on the Web. # 12th February 2010, 9:42 am

WARNING: Google Buzz Has A Huge Privacy Flaw. Interesting one this: by default, Buzz creates a public profile for you that lists the people you follow—but your default set of followers is derived from the people you contact most frequently using Gmail. This means users of Buzz may inadvertently reveal their most frequent contacts, which is an issue for people like journalists with anonymous sources, unhappy employees seeking new work or even people having e-mail based affairs. # 11th February 2010, 11:30 am

Fixing the Google Account problem. 3,000+ words explaining how to open a Google Doc invitation sent to an e-mail address that isn’t associated with your Google account. Worth reading just to get an idea of the enormous complexity involved in running a large scale identity system and designing an interface for managing aliases and multiple profiles. Google haven’t got it right yet. Has anyone else? # 25th January 2010, 11:21 am

2009

HTTP + Politics = ? Mark Nottingham ponders the technical implications of Australia’s decision to apply a filter to all internet traffic. Australia is large enough (and far enough away from the northern hemisphere) that the speed of light is a performance issue, but filtering technologies play extremely poorly with optimisation technologies such as HTTP pipelining and Google’s SPDY proposal. # 15th December 2009, 3:36 pm

Recently Google Translate announced the ability to hear translations into English spoken via text-to-speech (TTS). Looking at the Firebug Net panel for where this TTS data was coming from, I saw that the speech audio is in MP3 format and is queried via a simple HTTP GET (REST) request: http://translate.google.com/translate_tts?q=text

Weston Ruter # 14th December 2009, 1:13 pm
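The GET request described in the quote comes down to putting URL-encoded text in the `q` parameter. The sketch below only builds the URL and makes no request. The endpoint is the one quoted above as of 2009 and may no longer respond that way, and the `tts_url` helper is a made-up name.

```python
from urllib.parse import urlencode

# Endpoint exactly as given in the quote; treat it as historical.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def tts_url(text):
    """Build the GET URL for a piece of text, URL-encoding the q parameter."""
    return TTS_ENDPOINT + "?" + urlencode({"q": text})
```

For example, `tts_url("hello world")` returns `"http://translate.google.com/translate_tts?q=hello+world"`.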

A piece with a lot of screenshots about the close tab behaviour in Google Chrome. If you click “close” with your mouse, Chrome doesn’t resize the remaining tabs until you mouse away from the area. This means you can click “close” multiple times without having to chase the close button. I hadn’t noticed this, partly because Chrome doesn’t do it if you hit Command-W. They even switch the position of the close button in RTL languages such as Arabic. # 11th December 2009, 9:19 am

Any sufficiently advanced damage control is indistinguishable from ethics.

Eliezer # 6th December 2009, 9:31 am

EtherPad is Back Online Until Open Sourced. Fantastic news. EtherPad just got acquired by Google and announced the team would be joining the Google Wave effort and the existing service would be shut down. Lots of people complained, so they’re going to keep it alive until they’ve open sourced the code! # 6th December 2009, 9:08 am

Google Analytics goes async. This is excellent news—the latest version of the Google Analytics JavaScript is designed to allow for asynchronous loading, so it won’t hold up the rendering of your page. Analytics and banner ads are the two worst offenders when it comes to slowing down page loads. Now if only a banner ad vendor would follow suit... # 2nd December 2009, 6:30 pm