llm-llama-server 0.2. Here's a second option for using LLM's new tool support against local models (the first was via llm-ollama).
It turns out the llama.cpp
ecosystem has pretty robust OpenAI-compatible tool support already, so my llm-llama-server
plugin only needed a quick upgrade to get those working there.
Unfortunately it looks like streaming support doesn't work with tools in llama-server
at the moment, so I added a new model ID called llama-server-tools
which disables streaming and enables tools.
Here's how to try it out. First, ensure you have llama-server
- the easiest way to get that on macOS is via Homebrew:
brew install llama.cpp
Start the server running like this. This command will download and cache the 3.2GB unsloth/gemma-3-4b-it-GGUF:Q4_K_XL if you don't yet have it:
llama-server --jinja -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
Then in another window:
llm install llm-llama-server
llm -m llama-server-tools -T llm_time 'what time is it?' --td
And since you don't even need an API key for this, even if you've never used LLM before you can try it out with this uvx one-liner:
uvx --with llm-llama-server llm -m llama-server-tools -T llm_time 'what time is it?' --td
For more notes on using llama.cpp
with LLM see Trying out llama.cpp’s new vision support from a couple of weeks ago.
Recent articles
- Design Patterns for Securing LLM Agents against Prompt Injections - 13th June 2025
- Comma v0.1 1T and 2T - 7B LLMs trained on openly licensed text - 7th June 2025
- The last six months in LLMs, illustrated by pelicans on bicycles - 6th June 2025