Simon Willison’s Weblog

LLM 0.22, the annotated release notes

17th February 2025

I released LLM 0.22 this evening. Here are the annotated release notes:

model.prompt(..., key=) for API keys

  • Plugins that provide models that use API keys can now subclass the new llm.KeyModel and llm.AsyncKeyModel classes. This results in the API key being passed as a new key parameter to their .execute() methods, and means that Python users can pass a key using the new model.prompt(..., key=) parameter—see Passing an API key. Plugin developers should consult the new documentation on writing Models that accept API keys. #744

This is the big change. It’s only relevant to you if you use LLM as a Python library and you need the ability to pass in API keys for OpenAI, Anthropic, Gemini etc. yourself in Python code, rather than setting them as environment variables.

It turns out I need to do that for Datasette Cloud, where API keys are retrieved from individual customers’ secret stores!

Thanks to this change, it’s now possible to do things like this—the key= parameter to model.prompt() is new:

import llm
model = llm.get_model("gpt-4o-mini")
response = model.prompt("Surprise me!", key="my-api-key")
print(response.text())

Other plugins need to be updated to take advantage of this new feature. Here’s the documentation for plugin developers—I’ve released llm-anthropic 0.13 and llm-gemini 0.11 implementing the new pattern.
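
If you maintain a model plugin, the change looks roughly like this minimal sketch (the hello-world model ID, the key names and the echoed output are all made up for illustration):

import llm

class HelloWorld(llm.KeyModel):
    # Everything here is illustrative, not a real plugin
    model_id = "hello-world"
    needs_key = "hello-world"
    key_env_var = "HELLO_WORLD_API_KEY"

    def execute(self, prompt, stream, response, conversation=None, key=None):
        # LLM resolves the key (from prompt(..., key=), the key store or the
        # environment variable) and passes it in as this new key parameter
        yield f"Prompt was: {prompt.prompt} (key: {key})"

@llm.hookimpl
def register_models(register):
    register(HelloWorld())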

chatgpt-4o-latest

  • New OpenAI model: chatgpt-4o-latest. This model ID accesses the current model being used to power ChatGPT, which can change without warning. #752

This model has actually been around since August 2024 but I had somehow missed it. chatgpt-4o-latest is a model alias that provides access to the current model that is being used for GPT-4o running on ChatGPT, which is not the same as the GPT-4o models usually available via the API. It got an upgrade last week so it’s currently the alias that provides access to the most recently released OpenAI model.

Most OpenAI models such as gpt-4o provide stable date-based aliases like gpt-4o-2024-08-06 which effectively let you “pin” to that exact model version. OpenAI technical staff have confirmed that they don’t change the model without updating that name.

The one exception is chatgpt-4o-latest—that one can change without warning and doesn’t appear to have release notes at all.

It’s also a little more expensive than gpt-4o—currently priced at $5/million tokens for input and $15/million for output, compared to GPT-4o’s $2.50/$10.
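
To make the difference concrete, here’s the back-of-the-envelope arithmetic for a single prompt using 10,000 input tokens and 1,000 output tokens at those prices (the token counts are made up purely for illustration):

def cost(input_tokens, output_tokens, input_per_m, output_per_m):
    # Prices are quoted in dollars per million tokens
    return input_tokens / 1_000_000 * input_per_m + output_tokens / 1_000_000 * output_per_m

print(round(cost(10_000, 1_000, 5.00, 15.00), 3))  # chatgpt-4o-latest: 0.065 dollars
print(round(cost(10_000, 1_000, 2.50, 10.00), 3))  # gpt-4o: 0.035 dollars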

It’s a fun model to play with though! As of last week it appears to be very chatty and keen on using emoji. It also claims that it has a July 2024 training cut-off.
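
If you want to poke at it yourself it should work with the default OpenAI plugin, something like this (assuming you have an OpenAI API key configured):

llm -m chatgpt-4o-latest 'Surprise me!'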

llm logs -s/--short

  • New llm logs -s/--short flag, which returns a greatly shortened version of the matching log entries in YAML format with a truncated prompt and without including the response. #737

The llm logs command lets you search through logged prompt-response pairs—I have 4,419 of them in my database, according to this command:

sqlite-utils tables "$(llm logs path)" --counts | grep responses

By default it outputs the full prompts and responses as Markdown—and since I’ve started leaning more into long context models (some recent examples) my logs have been getting pretty hard to navigate.

The new -s/--short flag provides a much more concise YAML format. Here are some of my recent prompts that I’ve run using Google’s Gemini 2.0 Pro experimental model—the -u flag includes usage statistics, and -n 4 limits the output to the most recent 4 entries:

llm logs --short -m gemini-2.0-pro-exp-02-05 -u -n 4
- model: gemini-2.0-pro-exp-02-05
  datetime: '2025-02-13T22:30:48'
  conversation: 01jm0q045fqp5xy5pn4j1bfbxs
  prompt: '<documents> <document index="1"> <source>./index.md</source> <document_content>
    # uv An extremely fast Python package...'
  usage:
    input: 281812
    output: 1521
- model: gemini-2.0-pro-exp-02-05
  datetime: '2025-02-13T22:32:29'
  conversation: 01jm0q045fqp5xy5pn4j1bfbxs
  prompt: I want to set it globally so if I run uv run python anywhere on my computer
    I always get 3.13
  usage:
    input: 283369
    output: 1540
- model: gemini-2.0-pro-exp-02-05
  datetime: '2025-02-14T23:23:57'
  conversation: 01jm3cek8eb4z8tkqhf4trk98b
  prompt: '<documents> <document index="1"> <source>./LORA.md</source> <document_content>
    # Fine-Tuning with LoRA or QLoRA You c...'
  usage:
    input: 162885
    output: 2558
- model: gemini-2.0-pro-exp-02-05
  datetime: '2025-02-14T23:30:13'
  conversation: 01jm3csstrfygp35rk0y1w3rfc
  prompt: '<documents> <document index="1"> <source>huggingface_hub/__init__.py</source>
    <document_content> # Copyright 2020 The...'
  usage:
    input: 480216
    output: 1791

llm models -q gemini -q exp

  • Both llm models and llm embed-models now take multiple -q search fragments. You can now search for all models matching “gemini” and “exp” using llm models -q gemini -q exp. #748

I have over 100 models installed in LLM now across a bunch of different plugins. I added the -q option to help search through them a few months ago, and now I’ve upgraded it so you can pass it multiple times.

Want to see all the Gemini experimental models?

llm models -q gemini -q exp

Outputs:

GeminiPro: gemini-exp-1114
GeminiPro: gemini-exp-1121
GeminiPro: gemini-exp-1206
GeminiPro: gemini-2.0-flash-exp
GeminiPro: learnlm-1.5-pro-experimental
GeminiPro: gemini-2.0-flash-thinking-exp-1219
GeminiPro: gemini-2.0-flash-thinking-exp-01-21
GeminiPro: gemini-2.0-pro-exp-02-05 (aliases: g2)

For consistency I added the same options to the llm embed-models command, which lists available embedding models.
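
That means queries like this now work for embedding models too (what you get back depends entirely on which embedding plugins you have installed):

llm embed-models -q gemini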

llm embed-multi --prepend X

  • New llm embed-multi --prepend X option for prepending a string to each value before it is embedded—useful for models such as nomic-embed-text-v2-moe that require passages to start with a string like "search_document: ". #745

This was inspired by my initial experiments with Nomic Embed Text V2 last week.
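
Here’s a sketch of what that looks like in use (the readmes collection name, the docs/ directory and the exact model ID are stand-ins for whatever you have locally):

llm embed-multi readmes --files docs/ '*.md' \
  -m nomic-embed-text-v2-moe \
  --prepend 'search_document: ' \
  --store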

Everything else

  • The response.json() and response.usage() methods are now documented.

Someone asked a question about these methods online, which made me realize they weren’t documented. I enjoy promptly turning questions like this into documentation!
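
If you haven’t tried them, they look like this (the .usage() details and the exact shape of the .json() data vary between model plugins):

import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Describe a good dog in three words")
print(response.text())
print(response.usage())  # token counts for the prompt and the response
print(response.json())   # the raw response data from the underlying API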

  • Fixed a bug where conversations that were loaded from the database could not be continued using asyncio prompts. #742

This bug was reported by Romain Gehrig. It turned out not to be possible to execute a follow-up prompt in async mode if the previous conversation had been loaded from the database.

% llm 'hi' --async
Hello! How can I assist you today?
% llm 'now in french' --async -c
Error: 'async for' requires an object with __aiter__ method, got Response

I fixed the bug for the moment, but I’d like to make the whole mechanism of persisting and loading conversations from SQLite part of the documented and supported Python API—it’s currently tucked away in CLI-specific internals which aren’t safe for people to use in their own code.
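
For reference, in-memory async conversations look something like this. This is just a sketch that assumes the async conversation API mirrors the synchronous one; the bug above only appeared once the conversation had been loaded back out of SQLite by the CLI layer:

import asyncio
import llm

async def main():
    model = llm.get_async_model("gpt-4o-mini")
    conversation = model.conversation()
    first = conversation.prompt("hi")
    print(await first.text())
    # The follow-up prompt is the step that used to fail for conversations
    # loaded from the database
    second = conversation.prompt("now in french")
    print(await second.text())

asyncio.run(main())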

Technically not a part of the LLM 0.22 release, but I like using the release notes to help highlight significant new plugins, and llm-mlx is fast becoming my new favorite way to run models on my own machine.

I wrote about this previously when I announced llm-anthropic. The new name prepares me for a world in which Anthropic release models that aren’t called Claude 3 or Claude 3.5!