Simon Willison’s Weblog

Subscribe

64 items tagged “llm”

LLM is my command-line tool for running prompts against Large Language Models.

2024

llm-gguf. I just released a new alpha plugin for LLM which adds support for running models from Meta's new Llama 3.1 family that have been packaged as GGUF files - it should work for other GGUF chat models too.

If you've already installed LLM the following set of commands should get you setup with Llama 3.1 8B:

llm install llm-gguf
llm gguf download-model \
  https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  --alias llama-3.1-8b-instruct --alias l31i

This will download a 4.92GB GGUF from lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF on Hugging Face and save it (at least on macOS) to your ~/Library/Application Support/io.datasette.llm/gguf/models folder.

Once installed like that, you can run prompts through the model like so:

llm -m l31i "five great names for a pet lemur"

Or use the llm chat command to keep the model resident in memory and run an interactive chat session with it:

llm chat -m l31i

I decided to ship a new alpha plugin rather than update my existing llm-llama-cpp plugin because that older plugin has some design decisions baked in from the Llama 2 release which no longer make sense, and having a fresh plugin gave me a fresh slate to adopt the latest features from the excellent underlying llama-cpp-python library by Andrei Betlen.

# 23rd July 2024, 10:18 pm / meta, llm, generative-ai, llama, projects, ai, llms

Weeknotes: GPT-4o mini, LLM 0.15, sqlite-utils 3.37 and building a staging environment

Upgrades to LLM to support the latest models, and a whole bunch of invisible work building out a staging environment for Datasette Cloud.

[... 730 words]

LLM 0.15. A new release of my LLM CLI tool for interacting with Large Language Models from the terminal (see this recent talk for plenty of demos).

This release adds support for the brand new GPT-4o mini:

llm -m gpt-4o-mini "rave about pelicans in Spanish"

It also sets that model as the default used by the tool if no other model is specified. This replaces GPT-3.5 Turbo, the default since the first release of LLM. 4o-mini is both cheaper and way more capable than 3.5 Turbo.

# 18th July 2024, 7:44 pm / llm, projects, generative-ai, openai, ai, llms

Mistral NeMo. Released by Mistral today: "Our new best small model. A state-of-the-art 12B model with 128k context length, built in collaboration with NVIDIA, and released under the Apache 2.0 license."

Nice to see Mistral use Apache 2.0 for this, unlike their Codestral 22B release - though Codestral Mamba was Apache 2.0 as well.

Mistral's own benchmarks but NeMo slightly ahead of the smaller (but same general weight class) Gemma 2 9B and Llama 3 8B models.

It's both multi-lingual and trained for tool usage:

The model is designed for global, multilingual applications. It is trained on function calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

Part of this is down to the new Tekken tokenizer, which is 30% more efficient at representing both source code and most of the above listed languages.

You can try it out via Mistral's API using llm-mistral like this:

pipx install llm
llm install llm-mistral
llm keys set mistral
# paste La Plateforme API key here
llm mistral refresh # if you installed the plugin before
llm -m mistral/open-mistral-nemo 'Rave about pelicans in French'

# 18th July 2024, 4:40 pm / mistral, generative-ai, llm-tool-use, ai, llms, llm

Announcing our DjangoCon US 2024 Talks! I'm speaking at DjangoCon in Durham, NC in September.

My accepted talk title was How to design and implement extensible software with plugins. Here's my abstract:

Plugins offer a powerful way to extend software packages. Tools that support a plugin architecture include WordPress, Jupyter, VS Code and pytest - each of which benefits from an enormous array of plugins adding all kinds of new features and expanded capabilities.

Adding plugin support to an open source project can greatly reduce the friction involved in attracting new contributors. Users can work independently and even package and publish their work without needing to directly coordinate with the project's core maintainers. As a maintainer this means you can wake up one morning and your software grew new features without you even having to review a pull request!

There's one catch: information on how to design and implement plugin support for a project is scarce.

I now have three major open source projects that support plugins, with over 200 plugins published across those projects. I'll talk about everything I've learned along the way: when and how to use plugins, how to design plugin hooks and how to ensure your plugin authors have as good an experience as possible.

I'm going to be talking about what I've learned integrating Pluggy with Datasette, LLM and sqlite-utils. I've been looking for an excuse to turn this knowledge into a talk for ages, very excited to get to do it at DjangoCon!

# 17th July 2024, 3:20 am / djangocon, python, plugins, django, speaking, sqlite-utils, llm, datasette

Codestral Mamba. New 7B parameter LLM from Mistral, released today. Codestral Mamba is "a Mamba2 language model specialised in code generation, available under an Apache 2.0 license".

This the first model from Mistral that uses the Mamba architecture, as opposed to the much more common Transformers architecture. Mistral say that Mamba can offer faster responses irrespective of input length which makes it ideal for code auto-completion, hence why they chose to specialise the model in code.

It's available to run locally with the mistral-inference GPU library, and Mistral say "For local inference, keep an eye out for support in llama.cpp" (relevant issue).

It's also available through Mistral's La Plateforme API. I just shipped llm-mistral 0.4 adding a llm -m codestral-mamba "prompt goes here" default alias for the new model.

Also released today: MathΣtral, a 7B Apache 2 licensed model "designed for math reasoning and scientific discovery", with a 32,000 context window. This one isn't available through their API yet, but the weights are available on Hugging Face.

# 16th July 2024, 4:29 pm / open-source, mistral, llm, generative-ai, ai, llms

llm-claude-3 0.4. LLM plugin release adding support for the new Claude 3.5 Sonnet model:

pipx install llm
llm install -U llm-claude-3
llm keys set claude
# paste AP| key here
llm -m claude-3.5-sonnet \
  'a joke about a pelican and a walrus having lunch'

# 20th June 2024, 11:04 pm / llm, anthropic, claude, generative-ai, projects, ai, llms

Language models on the command-line

Visit Language models on the command-line

I gave a talk about accessing Large Language Models from the command-line last week as part of the Mastering LLMs: A Conference For Developers & Data Scientists six week long online conference. The talk focused on my LLM Python command-line utility and ways you can use it (and its plugins) to explore LLMs and use them for useful tasks.

[... 4,992 words]

Accidental prompt injection against RAG applications

Visit Accidental prompt injection against RAG applications

@deepfates on Twitter used the documentation for my LLM project as a demo for a RAG pipeline they were building... and this happened:

[... 567 words]

Weeknotes: PyCon US 2024

Earlier this month I attended PyCon US 2024 in Pittsburgh, Pennsylvania. I gave an invited keynote on the Saturday morning titled “Imitation intelligence”, tying together much of what I’ve learned about Large Language Models over the past couple of years and making the case that the Python community has a unique opportunity and responsibility to help try to nudge this technology in a positive direction.

[... 474 words]

llm-gemini 0.1a4. A new release of my llm-gemini plugin adding support for the Gemini 1.5 Flash model that was revealed this morning at Google I/O.

I'm excited about this new model because of its low price. Flash is $0.35 per 1 million tokens for prompts up to 128K token and $0.70 per 1 million tokens for longer prompts - up to a million tokens now and potentially two million at some point in the future. That's 1/10th of the price of Gemini Pro 1.5, cheaper than GPT 3.5 ($0.50/million) and only a little more expensive than Claude 3 Haiku ($0.25/million).

# 14th May 2024, 8:32 pm / gemini, llm, generative-ai, projects, ai, google-io, llms

LLM 0.14, with support for GPT-4o. It's been a while since the last LLM release. This one adds support for OpenAI's new model:

llm -m gpt-4o "fascinate me"

Also a new llm logs -r (or --response) option for getting back just the response from your last prompt, without wrapping it in Markdown that includes the prompt.

Plus nine new plugins since 0.13!

# 13th May 2024, 9 pm / llm, projects, generative-ai, openai, ai, llms

microsoft/Phi-3-mini-4k-instruct-gguf (via) Microsoft’s Phi-3 LLM is out and it’s really impressive. This 4,000 token context GGUF model is just a 2.2GB (for the Q4 version) and ran on my Mac using the llamafile option described in the README. I could then run prompts through it using the llm-llamafile plugin.

The vibes are good! Initial test prompts I’ve tried feel similar to much larger 7B models, despite using just a few GBs of RAM. Tokens are returned fast too—it feels like the fastest model I’ve tried yet.

And it’s MIT licensed.

# 23rd April 2024, 5:40 pm / llms, llm, generative-ai, ai, homebrew-llms, microsoft, phi

Weeknotes: Llama 3, AI for Data Journalism, llm-evals and datasette-secrets

Visit Weeknotes: Llama 3, AI for Data Journalism, llm-evals and datasette-secrets

Llama 3 landed on Thursday. I ended up updating a whole bunch of different plugins to work with it, described in Options for accessing Llama 3 from the terminal using LLM.

[... 1,030 words]

Options for accessing Llama 3 from the terminal using LLM

Visit Options for accessing Llama 3 from the terminal using LLM

Llama 3 was released on Thursday. Early indications are that it’s now the best available openly licensed model—Llama 3 70b Instruct has taken joint 5th place on the LMSYS arena leaderboard, behind only Claude 3 Opus and some GPT-4s and sharing 5th place with Gemini Pro and Claude 3 Sonnet. But unlike those other models Llama 3 70b is weights available and can even be run on a (high end) laptop!

[... 1,962 words]

llm-gpt4all. New release of my LLM plugin which builds on Nomic's excellent gpt4all Python library. I've upgraded to their latest version which adds support for Llama 3 8B Instruct, so after a 4.4GB model download this works:

llm -m Meta-Llama-3-8B-Instruct "say hi in Spanish"

# 20th April 2024, 5:58 pm / nomic, llm, plugins, projects, generative-ai, ai, llms, llama, homebrew-llms

A POI Database in One Line (via) Overture maps offer an extraordinarily useful freely licensed databases of POI (point of interest) listings, principally derived from partners such as Facebook and including restaurants, shops, museums and other locations from all around the world.

Their new "overturemaps" Python CLI utility makes it easy to quickly pull subsets of their data... but requires you to provide a bounding box to do so.

Drew Breunig came up with this delightful recipe for fetching data using LLM and gpt-3.5-turbo to fill in those bounding boxes:

overturemaps download --bbox=$(llm 'Give me a bounding box for Alameda, California expressed as only four numbers delineated by commas, with no spaces, longitude preceding latitude.') -f geojsonseq --type=place | geojson-to-sqlite alameda.db places - --nl --pk=id

# 19th April 2024, 2:44 am / drew-breunig, llm, gis, geojson

llm-reka. My new plugin for running LLM prompts against the Reka family of API hosted LLM models: reka-core ($10 per million input), reka-flash (80c per million) and reka-edge (40c per million).

All three of those models are trained from scratch by a team that includes several Google Brain alumni.

Reka Core is their most powerful model, released on Monday 15th April and claiming benchmark scores competitive with GPT-4 and Claude 3 Opus.

# 18th April 2024, 3:17 am / llm, generative-ai, projects, ai, llms

AI for Data Journalism: demonstrating what we can do with this stuff right now

Visit AI for Data Journalism: demonstrating what we can do with this stuff right now

I gave a talk last month at the Story Discovery at Scale data journalism conference hosted at Stanford by Big Local News. My brief was to go deep into the things we can use Large Language Models for right now, illustrated by a flurry of demos to help provide starting points for further conversations at the conference.

[... 6,081 words]

Use an llm to automagically generate meaningful git commit messages. Neat, thoroughly documented recipe by Harper Reed using my LLM CLI tool as part of a scheme for if you’re feeling too lazy to write a commit message—it uses a prepare-commit-msg Git hook which runs any time you commit without a message and pipes your changes to a model along with a custom system prompt.

# 11th April 2024, 4:06 am / llm, ai, git, generative-ai

Building files-to-prompt entirely using Claude 3 Opus

Visit Building files-to-prompt entirely using Claude 3 Opus

files-to-prompt is a new tool I built to help me pipe several files at once into prompts to LLMs such as Claude and GPT-4.

[... 3,235 words]

llm-command-r. Cohere released Command R Plus today—an open weights (non commercial/research only) 104 billion parameter LLM, a big step up from their previous 35 billion Command R model.

Both models are fine-tuned for both tool use and RAG. The commercial API has features to expose this functionality, including a web-search connector which lets the model run web searches as part of answering the prompt and return documents and citations as part of the JSON response.

I released a new plugin for my LLM command line tool this morning adding support for the Command R models.

In addition to the two models it also adds a custom command for running prompts with web search enabled and listing the referenced documents.

# 4th April 2024, 5:38 pm / llm, plugins, projects, generative-ai, ai, llms, cohere, command-r, rag, llm-tool-use

llm-nomic-api-embed. My new plugin for LLM which adds API access to the Nomic series of embedding models. Nomic models can be run locally too, which makes them a great long-term commitment as there’s no risk of the models being retired in a way that damages the value of your previously calculated embedding vectors.

# 31st March 2024, 3:17 pm / llm, plugins, projects, nomic, ai, embeddings

llm-gemini 0.1a1. I upgraded my llm-gemini plugin to add support for the new Google Gemini Pro 1.5 model, which is beginning to roll out in early access.

The 1.5 model supports 1,048,576 input tokens and generates up to 8,192 output tokens—a big step up from Gemini 1.0 Pro which handled 30,720 and 2,048 respectively.

The big missing feature from my LLM tool at the moment is image input—a fantastic way to take advantage of that huge context window. I have a branch for this which I really need to get into a useful state.

# 28th March 2024, 3:32 am / gemini, llm, google, generative-ai, projects, ai, llms

llm cmd undo last git commit—a new plugin for LLM

Visit llm cmd undo last git commit - a new plugin for LLM

I just released a neat new plugin for my LLM command-line tool: llm-cmd. It lets you run a command to to generate a further terminal command, review and edit that command, then hit <enter> to execute it or <ctrl-c> to cancel.

[... 923 words]

Semgrep: AutoFixes using LLMs (via) semgrep is a really neat tool for semantic grep against source code—you can give it a pattern like “log.$A(...)” to match all forms of log.warning(...) / log.error(...) etc.

Ilia Choly built semgrepx— xargs for semgrep—and here shows how it can be used along with my llm CLI tool to execute code replacements against matches by passing them through an LLM such as Claude 3 Opus.

# 26th March 2024, 12:51 am / llm, claude, generative-ai, ai, llms

llm-claude-3 0.3. Anthropic released Claude 3 Haiku today, their least expensive model: $0.25/million tokens of input, $1.25/million of output (GPT-3.5 Turbo is $0.50/$1.50). Unlike GPT-3.5 Haiku also supports image inputs.

I just released a minor update to my llm-claude-3 LLM plugin adding support for the new model.

# 13th March 2024, 9:18 pm / llm, anthropic, claude, generative-ai, projects, ai, llms

llm-claude-3. I built a new plugin for LLM—my command-line tool and Python library for interacting with Large Language Models—which adds support for the new Claude 3 models from Anthropic.

# 4th March 2024, 6:46 pm / llm, anthropic, claude, ai, llms, python, generative-ai, projects

llmc.sh (via) Adam Montgomery wrote this a neat wrapper around my LLM CLI utility: it adds a “llmc” zsh function which you can ask for shell commands (llmc ’use ripgrep to find files matching otter’) which outputs the command, an explanation of the command and then copies the command to your clipboard for you to paste and execute if it looks like the right thing.

# 16th February 2024, 6:19 pm / llm, llms, ai, generative-ai, zsh

Weeknotes: a Datasette release, an LLM release and a bunch of new plugins

I wrote extensive annotated release notes for Datasette 1.0a8 and LLM 0.13 already. Here’s what else I’ve been up to this past three weeks.

[... 1,074 words]