Simon Willison’s Weblog

Subscribe

December in LLMs has been a lot

20th December 2024

I had big plans for December: for one thing, I was hoping to get to an actual RC of Datasette 1.0, in preparation for a full release in January. Instead, I’ve found myself distracted by a constant barrage of new LLM releases.

On December 4th Amazon introduced the Amazon Nova family of multi-modal models—clearly priced to compete with the excellent and inexpensive Gemini 1.5 series from Google. I got those working with LLM via a new llm-bedrock plugin.

The next big release was Llama 3.3 70B-Instruct, on December 6th. Meta claimed that this 70B model was comparable in quality to their much larger 405B model, and those claims seem to hold weight.

I wrote about how I can now run a GPT-4 class model on my laptop—the same laptop that was running a GPT-3 class model just 20 months ago.

Llama 3.3 70B has started showing up from API providers now, including super-fast hosted versions from both Groq (276 tokens/second) and Cerebras (a quite frankly absurd 2,200 tokens/second). If you haven’t tried Val Town’s Cerebras Coder demo you really should.

I think the huge gains in model efficiency are one of the defining stories of LLMs in 2024. It’s not just the local models that have benefited: the price of proprietary hosted LLMs has dropped through the floor, a result of both competition between vendors and the increasing efficiency of the models themselves.

Last year the running joke was that every time Google put out a new Gemini release OpenAI would ship something more impressive that same day to undermine them.

The tides have turned! This month Google shipped four updates that took the wind out of OpenAI’s sails.

The first was gemini-exp-1206 on December 6th, an experimental model that jumped straight to the top of some of the leaderboards. Was this our first glimpse of Gemini 2.0?

That was followed by Gemini 2.0 Flash on December 11th, the first official release in Google’s Gemini 2.0 series. The streaming support was particularly impressive, with https://aistudio.google.com/live demonstrating streaming audio and webcam communication with the multi-modal LLM a full day before OpenAI released their own streaming camera/audio features in an update to ChatGPT.

Then this morning Google shipped Gemini 2.0 Flash “Thinking mode”, their version of the inference scaling technique pioneered by OpenAI’s o1. I did not expect Gemini to ship a version of that before 2024 had even ended.

OpenAI have one day left in their 12 Days of OpenAI event. Previous highlights have included the full o1 model (an upgrade from o1-preview) and o1-pro, Sora (later upstaged a week later by Google’s Veo 2), Canvas (with a confusing second way to run Python), Advanced Voice with video streaming and Santa and a very cool new WebRTC streaming API, ChatGPT Projects (pretty much a direct lift of the similar Claude feature) and the 1-800-CHATGPT phone line.

Tomorrow is the last day. I’m not going to try to predict what they’ll launch, but I imagine it will be something notable to close out the year.

Blog entries

Releases

TILs