LLM 0.22, the annotated release notes
17th February 2025
I released LLM 0.22 this evening. Here are the annotated release notes:
- model.prompt(..., key=) for API keys
- chatgpt-4o-latest
- llm logs -s/--short
- llm models -q gemini -q exp
- llm embed-multi --prepend X
- Everything else
model.prompt(..., key=) for API keys
- Plugins that provide models that use API keys can now subclass the new llm.KeyModel and llm.AsyncKeyModel classes. This results in the API key being passed as a new key parameter to their .execute() methods, and means that Python users can pass a key as the model.prompt(..., key=) parameter—see Passing an API key. Plugin developers should consult the new documentation on writing Models that accept API keys. #744
This is the big change. It’s only relevant to you if you use LLM as a Python library and you need the ability to pass in API keys for OpenAI, Anthropic, Gemini etc yourself in Python code, rather than setting them as environment variables.
It turns out I need to do that for Datasette Cloud, where API keys are retrieved from individual customers’ secret stores!
Thanks to this change, it’s now possible to do things like this—the key=
parameter to model.prompt()
is new:
import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Surprise me!", key="my-api-key")
print(response.text())
Other plugins need to be updated to take advantage of this new feature. Here’s the documentation for plugin developers—I’ve released llm-anthropic 0.13 and llm-gemini 0.11 implementing the new pattern.
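Here’s a rough sketch of what the plugin side might look like. This is my own illustrative example rather than code from those docs: the model_id, needs_key value and the fake provider call are placeholders, and the exact .execute() signature should be checked against the writing Models that accept API keys documentation.

import llm

class MyKeyModel(llm.KeyModel):
    model_id = "my-model"
    needs_key = "my-provider"            # key alias used by `llm keys set my-provider`
    key_env_var = "MY_PROVIDER_API_KEY"  # fallback environment variable

    def execute(self, prompt, stream, response, conversation=None, key=None):
        # LLM resolves the key (from model.prompt(..., key=), stored keys or the
        # environment variable above) and passes it in here, so the plugin no
        # longer has to look it up itself.
        yield f"would call the provider with key={key!r} and prompt={prompt.prompt!r}"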
chatgpt-4o-latest
- New OpenAI model: chatgpt-4o-latest. This model ID accesses the current model being used to power ChatGPT, which can change without warning. #752
This model has actually been around since August 2024 but I had somehow missed it. chatgpt-4o-latest
is a model alias that provides access to the current model that is being used for GPT-4o running on ChatGPT, which is not the same as the GPT-4o models usually available via the API. It got an upgrade last week so it’s currently the alias that provides access to the most recently released OpenAI model.
Most OpenAI models such as gpt-4o
provide stable date-based aliases like gpt-4o-2024-08-06
which effectively let you “pin” to that exact model version. OpenAI technical staff have confirmed that they don’t change the model without updating that name.
The one exception is chatgpt-4o-latest
—that one can change without warning and doesn’t appear to have release notes at all.
It’s also a little more expensive than gpt-4o
—currently priced at $5/million tokens for input and $15/million for output, compared to GPT 4o’s $2.50/$10.
It’s a fun model to play with though! As of last week it appears to be very chatty and keen on using emoji. It also claims that it has a July 2024 training cut-off.
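If you want to poke at it from Python, something like this should work. A minimal sketch, assuming you already have an OpenAI key configured via llm keys set openai or the OPENAI_API_KEY environment variable:

import llm

# chatgpt-4o-latest is served by the default OpenAI plugin, so no extra
# install is needed - just a configured OpenAI API key.
model = llm.get_model("chatgpt-4o-latest")
print(model.prompt("Give me one fun fact - emoji welcome").text())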
llm logs -s/--short
- New
llm logs -s/--short
flag, which returns a greatly shortened version of the matching log entries in YAML format with a truncated prompt and without including the response. #737
The llm logs
command lets you search through logged prompt-response pairs—I have 4,419 of them in my database, according to this command:
sqlite-utils tables "$(llm logs path)" --counts | grep responses
By default it outputs the full prompts and responses as Markdown—and since I’ve started leaning more into long context models (some recent examples) my logs have been getting pretty hard to navigate.
The new -s/--short
flag provides a much more concise YAML format. Here are some of my recent prompts that I’ve run using Google’s Gemini 2.0 Pro experimental model—the -u
flag includes usage statistics, and -n 4
limits the output to the most recent 4 entries:
llm logs --short -m gemini-2.0-pro-exp-02-05 -u -n 4
- model: gemini-2.0-pro-exp-02-05
datetime: '2025-02-13T22:30:48'
conversation: 01jm0q045fqp5xy5pn4j1bfbxs
prompt: '<documents> <document index="1"> <source>./index.md</source> <document_content>
# uv An extremely fast Python package...'
usage:
input: 281812
output: 1521
- model: gemini-2.0-pro-exp-02-05
datetime: '2025-02-13T22:32:29'
conversation: 01jm0q045fqp5xy5pn4j1bfbxs
prompt: I want to set it globally so if I run uv run python anywhere on my computer
I always get 3.13
usage:
input: 283369
output: 1540
- model: gemini-2.0-pro-exp-02-05
datetime: '2025-02-14T23:23:57'
conversation: 01jm3cek8eb4z8tkqhf4trk98b
prompt: '<documents> <document index="1"> <source>./LORA.md</source> <document_content>
# Fine-Tuning with LoRA or QLoRA You c...'
usage:
input: 162885
output: 2558
- model: gemini-2.0-pro-exp-02-05
datetime: '2025-02-14T23:30:13'
conversation: 01jm3csstrfygp35rk0y1w3rfc
prompt: '<documents> <document index="1"> <source>huggingface_hub/__init__.py</source>
<document_content> # Copyright 2020 The...'
usage:
input: 480216
output: 1791
llm models -q gemini -q exp
- Both llm models and llm embed-models now take multiple -q search fragments. You can now search for all models matching “gemini” and “exp” using llm models -q gemini -q exp. #748
I have over 100 models installed in LLM now across a bunch of different plugins. I added the -q
option to help search through them a few months ago, and now I’ve upgraded it so you can pass it multiple times.
Want to see all the Gemini experimental models?
llm models -q gemini -q exp
Outputs:
GeminiPro: gemini-exp-1114
GeminiPro: gemini-exp-1121
GeminiPro: gemini-exp-1206
GeminiPro: gemini-2.0-flash-exp
GeminiPro: learnlm-1.5-pro-experimental
GeminiPro: gemini-2.0-flash-thinking-exp-1219
GeminiPro: gemini-2.0-flash-thinking-exp-01-21
GeminiPro: gemini-2.0-pro-exp-02-05 (aliases: g2)
For consistency I added the same options to the llm embed-models
command, which lists available embedding models.
llm embed-multi --prepend X
- New llm embed-multi --prepend X option for prepending a string to each value before it is embedded—useful for models such as nomic-embed-text-v2-moe that require passages to start with a string like "search_document: ". #745
This was inspired by my initial experiments with Nomic Embed Text V2 last week.
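I haven’t run this exact invocation, but usage looks roughly like this - the collection name, input file and embedding model ID are placeholders for whatever you have set up locally:

llm embed-multi my-docs docs.json -m <embedding-model-id> --prepend 'search_document: '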
Everything else
- The response.json() and response.usage() methods are now documented.
Someone asked a question about these methods online, which made me realize they weren’t documented. I enjoy promptly turning questions like this into documentation!
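For anyone who hasn’t tried them, here’s a quick sketch of what that looks like - assuming a configured OpenAI key, and keeping in mind that the exact structure returned by each method varies between model plugins:

import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Say hello")
print(response.text())    # the text of the response
print(response.usage())   # input/output token counts, where the plugin records them
print(response.json())    # raw response data captured from the underlying API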
- Fixed a bug where conversations that were loaded from the database could not be continued using
asyncio
prompts. #742
This bug was reported by Romain Gehrig. It turned out not to be possible to execute a follow-up prompt in async mode if the previous conversation had been loaded from the database.
% llm 'hi' --async
Hello! How can I assist you today?
% llm 'now in french' --async -c
Error: 'async for' requires an object with __aiter__ method, got Response
I fixed the bug for the moment, but I’d like to make the whole mechanism of persisting and loading conversations from SQLite part of the documented and supported Python API—it’s currently tucked away in CLI-specific internals which aren’t safe for people to use in their own code.
- New plugin for macOS users: llm-mlx, which provides extremely high performance access to a wide range of local models using Apple’s MLX framework.
Technically not a part of the LLM 0.22 release, but I like using the release notes to help highlight significant new plugins, and llm-mlx is fast becoming my new favorite way to run models on my own machine.
- The
llm-claude-3
plugin has been renamed to llm-anthropic.
I wrote about this previously when I announced llm-anthropic. The new name prepares me for a world in which Anthropic release models that aren’t called Claude 3 or Claude 3.5!