Simon Willison’s Weblog

Subscribe

Items tagged google in Feb

Filters: Month: Feb × google × Sorted by date


The killer app of Gemini Pro 1.5 is video

Last week Google introduced Gemini Pro 1.5, an enormous upgrade to their Gemini series of AI models.

[... 2839 words]

Gemma: Introducing new state-of-the-art open models. Google get in on the openly licensed LLM game: Gemma comes in two sizes, 2B and 7B, trained on 2 trillion and 6 trillion tokens respectively. The terms of use “permit responsible commercial usage”. In the benchmarks it appears to compare favorably to Mistral and Llama 2.

Something that caught my eye in the terms: “Google may update Gemma from time to time, and you must make reasonable efforts to use the latest version of Gemma.”

One of the biggest benefits of running your own model is that it can protect you from model updates that break your carefully tested prompts, so I’m not thrilled by that particular clause.

UPDATE: It turns out that clause isn’t uncommon—the phrase “You shall undertake reasonable efforts to use the latest version of the Model” is present in both the Stable Diffusion and BigScience Open RAIL-M licenses. # 21st February 2024, 4:22 pm

Our next-generation model: Gemini 1.5 (via) The big news here is about context length: Gemini 1.5 (a Mixture-of-Experts model) will do 128,000 tokens in general release, available in limited preview with a 1 million token context and has shown promising research results with 10 million tokens!

1 million tokens is 700,000 words or around 7 novels—also described in the blog post as an hour of video or 11 hours of audio. # 15th February 2024, 4:17 pm

One consideration is that such a deep ML system could well be developed outside of Google-- at Microsoft, Baidu, Yandex, Amazon, Apple, or even a startup. My impression is that the Translate team experienced this. Deep ML reset the translation game; past advantages were sort of wiped out. Fortunately, Google’s huge investment in deep ML largely paid off, and we excelled in this new game. Nevertheless, our new ML-based translator was still beaten on benchmarks by a small startup. The risk that Google could similarly be beaten in relevance by another company is highlighted by a startling conclusion from BERT: huge amounts of user feedback can be largely replaced by unsupervised learning from raw text. That could have heavy implications for Google.

Eric Lehman, internal Google email in 2018 # 11th February 2024, 10:59 pm

Google’s Gemini Advanced: Tasting Notes and Implications. Ethan Mollick reviews the new Google Gemini Advanced—a rebranded Bard, released today, that runs on the GPT-4 competitive Gemini Ultra model.

“GPT-4 [...] has been the dominant AI for well over a year, and no other model has come particularly close. Prior to Gemini, we only had one advanced AI model to look at, and it is hard drawing conclusions with a dataset of one. Now there are two, and we can learn a few things.”

I like Ethan’s use of the term “tasting notes” here. Reminds me of how Matt Webb talks about being a language model sommelier. # 8th February 2024, 3:10 pm

Why Google invested in providing Google Fonts for free. Fascinating comment from former Google Fonts team member Raph Levien. In short: text rendered as PNGs hurt Google Search, fonts were a delay in the transition from Flash, Google Docs needed them to better compete with Office and anything that helps create better ads is easy to find funding for. # 23rd February 2020, 2:13 pm

Discussion about Altavista on Hacker News. Fascinating thread on Hacker News where Bryant Durrell, a former Director from Altavista provides some insider thoughts on how they lost against Google. # 16th February 2019, 6:57 pm

Googlebot’s Javascript random() function is deterministic. random() as executed by Googlebot returns the same predicable sequence. More interestingly, Googlebot runs a much faster timer for setTimeout and setInterval—as Tom Anthony points out, “Why actually wait 5 seconds when you are a bot?” # 7th February 2018, 2:41 am

Aside from Google I/O, does Google organize any other conferences?

They run a whole bunch, but many of them aren’t widely advertised—they have a lot of invite-only events for customers of their advertising tools, for example, and there are things like the Google Analytics Summit.

[... 95 words]

How can I sort a huge amount of numbers?

Sorting large amounts of data is one of the first exercises you’ll see described in any Hadoop or map/reduce tutorial—so I’d suggest taking a look at Hadoop.

[... 44 words]

If you missed out on joining to work at Google and Facebook, what should you do?

Remind yourself that there will always be more opportunities, and obsessing over what might have been is a huge waste of your time.

[... 45 words]

Why does Google use “Allow” in robots.txt, when the standard seems to be “Disallow?”

The Disallow command prevents search engines from crawling your site.

[... 59 words]

Is a relational database with many-to-many relationships difficult to develop into a web app?

Many to Many tables can be a bit of a pain to deal with using regular SQL, but a good ORM can abstract away any potential complexity almost entirely. I find using the Django ORM means I’m much less likely to shy away from a design that involves a many-to-many relationship because I know it won’t increase the complexity of the application. I imagine the Rails ORM has the same effect.

[... 91 words]

Google Image Charts: Mathematical (TeX) Formulas (via) I’m not sure when they added this, but you can now use the Google Charts Image API to render mathematical formulas, specified using TeX syntax. Wordpress.com and Wikipedia have both offered this feature for quite a while, but now you can use it anywhere on the Web. # 12th February 2010, 9:42 am

WARNING: Google Buzz Has A Huge Privacy Flaw. Interesting one this: by default, Buzz creates a public profile for you that lists the people you follow—but your default set of followers is derived from the people you contact most frequently using Gmail. This means users of Buzz may inadvertently reveal their most frequent contacts, which is an issue for people like journalists with anonymous sources, unhappy employees seeking new work or even people having e-mail based affairs. # 11th February 2010, 11:30 am

Map Maker for Developers. Tiles from Google’s Map Maker crowdsourcing effort are now available in the JS and static maps APIs on an opt-in basis. Maybe I’m misunderstanding something here, but Google Map Maker seems like a big step backwards for open geographic data. People donate their mapping efforts to Google, who keep them—unlike OpenStreetMap, where the donated efforts are made available under a Creative Commons license. # 21st February 2009, 9:05 am

Write to a Google Spreadsheet from a Python script. I didn’t know Google Spreadsheets could directly serve dynamic images that automatically update when the underlying data changes. # 16th February 2009, 9:02 pm

Google App Engine 1.1.9 boosts capacity and compatibility. Niall summarises the recent changes to App Engine. urllib and urllib2 support plus massively increased upload limits and request duration quotas will make it a whole lot easier to deploy serious projects on the platform. # 16th February 2009, 8:35 pm

Specify your canonical. You can now use a link rel=“canonical” to tell Google that a page has a canonical URL elsewhere. I’ve run in to this problem a bunch of times—in some sites it really does make sense to have the same content shown in two different places—and this seems like a neat solution that could apply to much more than just metadata for external search engines. # 14th February 2009, 11:28 am

Plaxo sees 92% success rate with OpenID/OAuth hybrid method. Really wish I could have been at the OpenID UX Summit hosted by Facebook yesterday—sounds like an awful lot of important problems are being solved. # 11th February 2009, 5:20 pm

Yahoo! Query Language thoughts. An engineer on Google’s App Engine provides an expert review of Yahoo!’s YQL. I found this more useful than the official documentation. # 9th February 2009, 10:29 pm

Google App Engine: A roadmap update! Receiving e-mail, background tasks and XMPP. I predict a bunch of sites will start building small parts of their overall functionality on App Engine when some of these features land (much easier than hosting your own custom XMPP server). # 9th February 2009, 7 pm

Recreating the button. Fascinating article from Doug Bowman on the work that went in to creating custom CSS buttons for use across Google’s different applications, avoiding images to improve performance ensure they could be easily styled using just CSS. I’d love to see the Google Code team turn this in to a full open source release—the more sites using these buttons the more familiar they will become to users at large. # 5th February 2009, 9:50 pm

Post-Commit Web Hooks for Google Code Project Hosting (via) I really, really like web hooks (which I’ve been calling “callback APIs”, but it looks like “web hooks” is the term that’s sticking). I’m interested in their scaling challenges—I’ve heard XMPP advocates argue that a web hook style model simply won’t scale for really large sites. # 4th February 2009, 10:22 am

Social Graph API. This is freaking awesome. Input one or more URLs to your profile pages and it returns a huge dump of crawled relationship data, based on XFN, FOAF and OpenID links. No API key required and it supports JSON callbacks so you can incorporate it in to a site without even needing to write any extra server-side code. # 3rd February 2008, 10:34 pm

The bright side: web spam is an evolutionary force that pushes relevance innovations such as trustrank forward. Spam created the market opportunity for Google, when Altavista succumbed in 97-98. Search startups should be praying to the spam gods for a second opportunity.

Rick Skrenta # 15th February 2007, 11:15 am

Add OpenSearch to your site in five minutes. OpenSearch is easy. DeWitt demonstrates how you don’t even need a site search engine to implement it if you take advantage of Google’s site: operator. # 9th February 2007, 12:52 am

This site may harm your computer. Tom Dyson’s personal weblog was flagged by Google as hosting malicious software, without any clue as to what the problem was. Sure looks like a false positive to me. # 5th February 2007, 9:26 am

Google cruft

New Google feature: Google Movies. Displays aggregated movie reviews (like Rotten Tomatoes), looks up local movie times based on your zip code saved in Google Local (more evidence of the fabled Google cookie), and even handles recommendations.

[... 120 words]

Google Maps and XSL

I’ll probably write more on this later, but it seems that Google Maps is using XSL. I spotted it loading the following pages while sniffing its activity with LiveHTTPHeaders:

[... 174 words]