llm-llama-server 0.2. Here's a second option for using LLM's new tool support against local models (the first was via llm-ollama).
It turns out the llama.cpp
ecosystem has pretty robust OpenAI-compatible tool support already, so my llm-llama-server
plugin only needed a quick upgrade to get those working there.
Unfortunately it looks like streaming support doesn't work with tools in llama-server
at the moment, so I added a new model ID called llama-server-tools
which disables streaming and enables tools.
Here's how to try it out. First, ensure you have llama-server
- the easiest way to get that on macOS is via Homebrew:
brew install llama.cpp
Start the server running like this. This command will download and cache the 3.2GB unsloth/gemma-3-4b-it-GGUF:Q4_K_XL if you don't yet have it:
llama-server --jinja -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
Then in another window:
llm install llm-llama-server
llm -m llama-server-tools -T llm_time 'what time is it?' --td
And since you don't even need an API key for this, even if you've never used LLM before you can try it out with this uvx one-liner:
uvx --with llm-llama-server llm -m llama-server-tools -T llm_time 'what time is it?' --td
For more notes on using llama.cpp
with LLM see Trying out llama.cpp’s new vision support from a couple of weeks ago.
Recent articles
- Large Language Models can run tools in your terminal with LLM 0.26 - 27th May 2025
- Highlights from the Claude 4 system prompt - 25th May 2025
- Live blog: Claude 4 launch at Code with Claude - 22nd May 2025