Simon Willison’s Weblog

Entries tagged gemini, llms in 2024

Things we learned about LLMs in 2024

A lot has happened in the world of Large Language Models over the course of 2024. Here’s a review of things we figured out about the field in the past twelve months, plus my attempt at identifying key themes and pivotal moments.

[... 7,490 words]

December in LLMs has been a lot

I had big plans for December: for one thing, I was hoping to get to an actual RC of Datasette 1.0, in preparation for a full release in January. Instead, I’ve found myself distracted by a constant barrage of new LLM releases.

[... 901 words]

Gemini 2.0 Flash “Thinking mode”

Those new model releases just keep on flowing. Today it’s Google’s snappily named gemini-2.0-flash-thinking-exp, their first entrant into the o1-style inference scaling class of models. I posted about a great essay about the significance of these just this morning.

[... 569 words]

Gemini 2.0 Flash: An outstanding multi-modal LLM with a sci-fi streaming mode

Huge announcement from Google this morning: Introducing Gemini 2.0: our new AI model for the agentic era. There’s a ton of stuff in there (including updates on Project Astra and the new Project Mariner), but the most interesting pieces are the things we can start using today, built around the brand new Gemini 2.0 Flash model. The developer blog post has more of the technical details, and the Gemini 2.0 Cookbook is useful for understanding the API via Python code examples.

[... 1,740 words]

Prompts.js

I’ve been putting the new o1 model from OpenAI through its paces, in particular for code. I’m very impressed—it feels like it’s giving me a similar code quality to Claude 3.5 Sonnet, at least for Python and JavaScript and Bash... but it’s returning output noticeably faster.

[... 1,119 words]

First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin)

Amazon released three new Large Language Models yesterday at their AWS re:Invent conference. The new model family is called Amazon Nova and comes in three sizes: Micro, Lite and Pro.

[... 2,385 words]

Project: VERDAD—tracking misinformation in radio broadcasts using Gemini 1.5

I’m starting a new interview series called Project. The idea is to interview people who are building interesting data projects and talk about what they’ve built, how they built it, and what they learned along the way.

[... 1,025 words]

Claude 3.5 Haiku

Anthropic released Claude 3.5 Haiku today, a few days later than expected (they said it would be out by the end of October).

[... 502 words]

You can now run prompts against images, audio and video in your terminal using LLM

I released LLM 0.17 last night, the latest version of my combined CLI tool and Python library for interacting with hundreds of different Large Language Models such as GPT-4o, Llama, Claude and Gemini.

[... 1,399 words]

Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent

The other day I found myself needing to add up some numeric values that were scattered across twelve different emails.

[... 1,294 words]

NotebookLM’s automatically generated podcasts are surprisingly effective

Audio Overview is a fun new feature of Google’s NotebookLM which is getting a lot of attention right now. It generates a one-off custom podcast against content you provide, where two AI hosts start up a “deep dive” discussion about the collected content. These last around ten minutes and are very podcast, with an astonishingly convincing audio back-and-forth conversation.

[... 1,489 words]

Calling LLMs from client-side JavaScript, converting PDFs to HTML + weeknotes

I’ve been having a bunch of fun taking advantage of CORS-enabled LLM APIs to build client-side JavaScript applications that access LLMs directly. I also spun up a new Datasette plugin for advanced permission management.

[... 2,050 words]

Building a tool showing how Gemini Pro can return bounding boxes for objects in images

I was browsing through Google’s Gemini documentation while researching how different multi-modal LLM APIs work when I stumbled across this note in the vision documentation:

[... 1,792 words]

Three major LLM releases in 24 hours (plus weeknotes)

I’m a bit behind on my weeknotes, so there’s a lot to cover here. But first... a review of the last 24 hours of Large Language Model news. All times are in US Pacific on April 9th 2024.

[... 1,401 words]

The GPT-4 barrier has finally been broken

Four weeks ago, GPT-4 remained the undisputed champion: consistently at the top of every key benchmark, but more importantly the clear winner in terms of “vibes”. Almost everyone investing serious time exploring LLMs agreed that it was the most capable default model for the majority of tasks—and had been for more than a year.

[... 717 words]

The killer app of Gemini Pro 1.5 is video

Last week Google introduced Gemini Pro 1.5, an enormous upgrade to their Gemini series of AI models.

[... 2,839 words]