Simon Willison’s Weblog

Language models on the command-line

17th June 2024

I gave a talk about accessing Large Language Models from the command line last week as part of Mastering LLMs: A Conference For Developers & Data Scientists, a six-week-long online conference. The talk focused on my LLM Python command-line utility and ways you can use it (and its plugins) to explore LLMs and put them to work on useful tasks.

The talk was recorded and is available on YouTube. Here I’ve turned it into an annotated presentation, with detailed notes and screenshots (there were no slides) to accompany the video.

LLM is a tool I started building last year to help run LLM prompts directly from a command-line terminal. Instructions for installing it are here—you can use pipx install llm or pip install llm or brew install llm.

The LLM website #

Once installed you can use it with OpenAI models by running llm keys set openai and pasting in your OpenAI key—or install plugins to use models by other providers, including models you can run locally.
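That first setup step looks like this:

llm keys set openai
# <Paste your OpenAI API key here>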

$ llm 'five great names for a pet pelican' Sure, here are five great names for a pet pelican: 1. **Splash** 2. **Captain Beak** 3. **Seabreeze** 4. **Marina** 5. **Pelicano** These names incorporate elements of the pelican's natural habitat and distinctive features, adding a fun and fitting touch for your feathered friend! #

Basic usage is very simple: once you’ve configured your OpenAI key, you can run prompts against their models like this:

llm 'five great names for a pet pelican'

The output will stream to your terminal, or you can redirect it to a file like this:

llm 'five great names for a pet pelican' > pelicans.txt
$ llm -c 'now do walruses' Of course, here are five fitting names for a pet walrus: 1. **Whiskers** 2. **Tusk** 3. **Gustav** 4. **Blubber** 5. **Wally**  $ llm -c 'justify those!'  Certainly! Here's why each name suits a pet walrus: 1. **Whiskers** - **Justification:** Walruses have distinctive, prominent whiskers (vibrissae) that they use to detect food on the ocean floor. This name highlights one of their most characteristic features. 2. **Tusk** - **Justification:** Walruses are known for their long, formidable tusks, which they use for various purposes #

You can use the -c (for continue) option to send follow-up prompts as part of the same ongoing conversation:

llm -c 'now do walruses'
# ...
llm -c 'justify those!'
Screenshot of the list of Remote APIs plugins in the LLM plugins directory. #

LLM supports additional models via plugins. These are listed in the LLM plugins directory, which includes dozens of plugins for both remote API-hosted models and models you can run directly on your own computer.

$ llm models Outputs a list of OpenAI models followed by several Anthropic models.  $ llm -m claude-3-haiku-20240307 'say hi in spanish with a flourish' ¡Hola, mi amigo! 🌟 #

Here I’m using the llm-claude-3 plugin, which provides access to the Anthropic Claude 3 family of models.

I really like these models. Claude 3 Opus is about equivalent to GPT-4o in terms of quality. Claude 3 Haiku is both cheaper and better than GPT-3.5, and can handle 100,000 input tokens including images.

llm install llm-claude-3
llm keys set claude
# <Paste key here>
# Now list available models
llm models
# Then run a prompt
llm -m claude-3-haiku-20240307 'say hi in spanish with a flourish'
# Or use the haiku alias
llm -m haiku 'say hi in spanish with a flourish'
$ llm logs path /Users/simon/Library/Application Support/io.datasette.llm/logs.db  $ datasette "$(llm logs path)" ... Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit) #

Every prompt and response run through the LLM tool is permanently logged to a SQLite database, as described here.

This command shows the path to that database:

llm logs path

If you install Datasette you can use it to browse your SQLite database like this, using a terminal trick where the output of one command is passed to another (with double quotes to avoid any problems caused by the space in the directory name):

datasette "$(llm logs path)"
Datasette showing the 2,434 responses I have logged as a table, with a search interface at the top. #

Here’s my searchable database of 2,434 responses I’ve logged from using LLM on my laptop, running in Datasette.

Datasette screenshot showing a conversation, with 3 linked responses #

Earlier we ran a prompt and then sent two follow-up prompts to it using the llm -c option. Those are stored in the database as three responses that are part of the same conversation.

Datasette facet view for the model column: gpt-3.5-turbo 383, claude-3-opus-20240229 334, gpt-4 233, gpt-4-1106-preview 134, claude-2 117, plus dozens of other models (local MLC, Mistral, Llama 2 and 3, Gemini, GPT-4o and more) with smaller counts. #

Datasette supports facets, which provide a quick overview of unique value counts within the data. I’ve used GPT-3.5 turbo 383 times, Claude 3 Opus 334 times and a whole bunch of other models.

6 rows where search matches "pelican" and model = "claude-3-opus-20240229" sorted by id descending #

Datasette provides search and filtering too—here are the prompts I’ve run against Claude 3 Opus that match “pelican”.

I have an experimental feature that lets me pass images to some models using the -i filename.png option. More on that in this issue.
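Since it's experimental the exact syntax may still change, but a prompt with an image currently looks something like this (gpt-4o here is just an example of a vision-capable model):

llm 'describe this image' -i filename.png -m gpt-4o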

llm-cmd is an example of a plugin that adds an extra sub-command to LLM—in this case one that takes a description of a task and turns that into a pre-populated shell command. I wrote more about that in llm cmd undo last git commit—a new plugin for LLM.
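To try it, install the plugin and give it a task—the example below is the one from that post:

llm install llm-cmd
llm cmd undo last git commit
# Pre-populates your shell with a suggested command for you to review, edit or run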

Screenshot of the GPT4All website: A free-to-use, locally running, privacy-aware chatbot. No GPU or internet required. #

There are plenty of ways to run local models using LLM. One of my favourites is the llm-gpt4all plugin, which builds on top of the excellent GPT4All Python library by Nomic AI.

$ llm -m mistral-7b-instruct-v0 'five great names for a pet seagull, with explanations' 1. "Sally" - This name is perfect for a pet seagull because it's short and easy to remember, yet still has a charming and friendly vibe. It also evokes images of a playful and adventurous bird that loves to explore the world around her. 2. "Captain Jack" - For any seafaring seagull who loves adventure and exploration, this name is perfect. Captain Jack suggests a sense of independence and leadership, which could be fitting for a bird who enjoys flying high and exploring new places. 3. "Scarlett" - This name is bold and eye-catching, much like the vibrant feathers of a seagull. Scarlett also has a fiery energy that could reflect the spirit of your adventurous pet. 4. "Nautica" - Nautica means "sailor" in Latin, making it an excellent choice for a seagull  Activity Monitor is also visible, showing a Python process using 350MB of RAM. #

To install that plugin:

llm install llm-gpt4all

Then run llm models to list the new models. Each model will be downloaded the first time you try running a prompt through it.

I used this to run Mistral-7B Instruct—an extremely high quality small (~4GB) model:

llm -m mistral-7b-instruct-v0 'five great names for a pet seagull, with explanations'

You can run Activity Monitor to see the resources the model is using.

$ llm chat -m mistral-7b-instruct-v0 Chatting with mistral-7b-instruct-v0 Type 'exit' or 'quit' to exit Type '!multi' to enter multiple lines, then '!end' to finish > say hello in spanish  Hola! How can I assist you today? > now in french Bonjour! Comment puis-je vous aider aujourd'hui? #

Running prompts like this is inefficient, because it loads the full model into memory, runs the prompt and then shuts down the program again.

Instead, you can use the llm chat command which keeps the model in memory across multiple prompts:

llm chat -m mistral-7b-instruct-v0

Another option is to run Ollama, which runs its own local server hosting models. The llm-ollama plugin can then be used to run prompts through Ollama from LLM.
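A minimal sketch of that setup, assuming Ollama is already installed and you've pulled a model (the model name exposed by the plugin may differ slightly):

ollama pull llama3
llm install llm-ollama
llm -m llama3 'five great names for a pet pelican'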

The llamafile website #

Another really exciting option for running models is llamafile, a project sponsored by Mozilla that uses Justine Tunney’s Cosmopolitan to compile a binary that bundles both a model and the software needed to serve it, in a single file that can execute on several different operating systems.

I’ve written more about that here.

Screenshot of a browser window titled llama.cpp showing a photograph  User: Describe image  Llama: The image features a person sitting in a chair next to an outdoor setting with a rooster nearby. A white bowl filled with eggs is placed on the ground, and it appears that the person may be preparing or collecting them from their backyard chicken coop. There are several other birds scattered around the scene as well, adding more life to this #

Running LLaVA using a Llamafile is particularly fun—it’s an openly licensed model that can accept images as input as well. It’s pretty amazing the results you can get from that, running as a single binary on your laptop.

Grab that from Mozilla/llava-v1.5-7b-llamafile on Hugging Face.

LLM can talk to Llamafile instances via the llm-llamafile plugin.
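A rough sketch of that, assuming you've downloaded a llamafile (the filename below is illustrative) and it's serving on its default local port:

chmod +x llava-v1.5-7b-q4.llamafile
./llava-v1.5-7b-q4.llamafile
# Then, in another terminal:
llm install llm-llamafile
llm -m llamafile 'describe a pelican'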

#!/bin/bash

# Validate that the first argument is an integer
if [[ ! $1 =~ ^[0-9]+$ ]]; then
  echo "Please provide a valid integer as the first argument."
  exit 1
fi

id="$1"

# Parse the optional -m argument
model="haiku"
if [[ $2 == "-m" && -n $3 ]]; then
  model="$3"
fi

# Make API call, parse and summarize the discussion
curl -s "https://hn.algolia.com/api/v1/items/$id" | \
  jq -r 'recurse(.children[]) | .author + ": " + .text' | \
  llm -m "$model" -s 'Summarize the themes of the opinions expressed here.
  For each theme, output a markdown header.
  Include direct "quotations" (with author attribution) where appropriate.
  You MUST quote directly from users when crediting them, with double quotes.
  Fix HTML entities. Output markdown. Go long.' #

Now that we can run prompts from our terminal, we can start assembling software by writing scripts.

Here’s a Bash script I wrote to summarize conversations on Hacker News, using longer context models such as Claude 3 Haiku or Google Gemini 1.5 or GPT-4o.

I wrote more about this in Summarizing Hacker News discussion themes with Claude and LLM.

$ hn-summary.sh 40651054 -m 4o  # Dependency and Linking in Software Distribution Several users discussed issues around dependency management and linking in software distribution: ### Stability and Compatibility Issues - **e63f67dd-065b:** "It's just a classic dependency issue... a combination of ABI instability and just plain version incompatibility from one distro to the next... My opinion is the opposite: I think the old paradigm of distros managing a giant set of system libraries is a bad one." - **umanwizard:** "Some distros do actually break out rust dependencies into separate packages (e.g. Guix does this)." ### Dynamic vs Static Linking - **jiripospisil:** "Until there's a vulnerability in one of the dependencies and now you have to rebuild all of the packages which use it... For Rust, there's also the fact that most projects use a lock file." - **pglata:** "Static linking is essentially easier for people who don't want to care. But ignoring security does not mean it solves it #

The script works by hitting the Hacker News Algolia API to return the full, nested JSON structure of the conversation (e.g. this JSON endpoint for this conversation), then runs that through jq to turn it into text, then pipes that into a model using LLM.

Tutorial: Writing a plugin to support a new model  This tutorial will walk you through developing a new plugin for LLM that adds support for a new Large Language Model.  We will be developing a plugin that implements a simple Markov chain to generate words based on an input string... #

If you want to contribute to LLM itself, a great way to do that is to write plugins that support new models. I have an extensive tutorial describing how to do that.

$ llm --system 'You are a sentient cheesecake' -m gpt-4o --save cheesecake  $ llm chat -t cheesecake Chatting with gpt-4o Type 'exit' or 'quit' to exit Type '!multi' to enter multiple lines, then '!end' to finish > tell me abotu yourself Hello! I'm a sentient cheesecake, a delightful fusion of creamy textures and rich flavors. My main ingredients include cream cheese, sugar, eggs, and a touch of vanilla, all resting on a buttery graham cracker crust. I'm the perfect balance of sweet and tangy, and I take great pride in bringing joy to those who savor a slice of me. Despite my delicate nature, I have a surprisingly strong personality—and apparently, the ability to converse! Is there anything specific you'd like to know about the life of a sentient cheesecake? #

Another fun trick is to use templates to save and execute parameterized prompts, documented here. The easiest way to create a template is with the --save option like this:

llm --system 'You are a sentient cheesecake' -m gpt-4o --save cheesecake

Now you can chat with a cheesecake:

llm chat -t cheesecake
$ cat demos.md | llm -s 'summary, short' 1. **Terminal Basics** - Quick reminder on basic terminal commands and piping with "|" and ">", and using `cat`. 2. **Ttok Installation & Demonstration** - Using `pipx install ttok` to install `ttok` and demonstrating its usage. 3. **Installing LLM** - Installation of LLM using either `brew` or `pipx`. 4. **Configuring for OpenAI** - How to configure LLM for OpenAI. 5. **Running a Prompt** - Example of running a prompt with LLM. 6. **Installing LLM Plugin: llm-claude-3** - Installing the `llm-claude-3` plugin and running a prompt with it. 7. **Browsing Logs with Datasette** - Demonstrating browsing logs using Datasette. 8. **Using LLM Command** #

Being able to pipe content into llm is a really important feature.

The simplest way to do this is with cat to send in files. This command summarizes the content of a provided file:

cat demos.md | llm -s 'summary, short'
$ llm -c 'no much much much shorter, and in haikus'  Terminal basics, Commands and piping galore, Cat reminds us all.  Install ttok first, Pipx makes it easy-peasy, Quick demonstration.  LLM install, Brew or pipx, your choice, Configured OpenAI. #

Once you’ve done this you can send follow-up prompts with -c.

llm -c 'no much much much shorter, and in haikus'
Output of files-to-prompt - different files are concatenated together with --- between them followed by their filename #

My files-to-prompt command can turn a directory of files into a single prompt, suitable for piping into LLM.

I wrote more about this, including how I developed it, in Building files-to-prompt entirely using Claude 3 Opus.
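A typical invocation looks something like this (my_project/ is just a placeholder directory):

pipx install files-to-prompt
files-to-prompt my_project/ | llm -s 'explain what this code does'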

Screenshot of the shot-scraper website  A command-line utility for taking automated screenshots of websites  Quick start:  pip install shot-scraper shot-scraper install shot-scraper https://github.com/simonw/shot-scraper -h 900 Produces this screenshot in a file called github-com-simonw-shot-scraper.png  #

Another tool I frequently use with LLM is shot-scraper—my command-line tool for screenshotting and scraping websites.

shot-scraper JavaScript command documentation  #

shot-scraper can run JavaScript directly against a page and output the result back to the terminal, suitable for piping into LLM.

Screenshot of Google search results, with the inspector panel open to show the structure of the HTML. #

Google hate being scraped. Let’s use shot-scraper to scrape Google.

Google search results have a structure where each result title is an <h3> element wrapped in a link.

shot-scraper javascript 'https://www.google.com/search?q=nytimes+slop' ' Array.from(   document.querySelectorAll("h3"),   el => ({href: el.parentNode.href, title: el.innerText}) )' #

We can scrape that using the following terminal command:

shot-scraper javascript 'https://www.google.com/search?q=nytimes+slop' '
Array.from(
  document.querySelectorAll("h3"),
  el => ({href: el.parentNode.href, title: el.innerText})
)'

This will load up the search results page in an invisible browser, then execute JavaScript that extracts the results and returns them as JSON.

Terminal screenshot: the command returns a JSON array of objects, each with an "href" and "title" for a search result—including an 1896 New York Times archive link titled "To Slop.", the NYT Technology section, a LinkedIn post titled "Slop is the new name for unwanted AI-generated content", and a "More results" pagination link. #

The results as JSON include the href and title of each of those search results.

We could send that to LLM, but I’d like to grab the search snippets as well.

Screenshot showing the command with the more complex JavaScript. #

Here’s the more complex recipe that also fetches the search snippets visible on the page:

shot-scraper javascript 'https://www.google.com/search?q=nytimes+slop' '
() => {
    function findParentWithHveid(element) {
        while (element && !element.hasAttribute("data-hveid")) {
            element = element.parentElement;
        }
        return element;
    }
    return Array.from(
        document.querySelectorAll("h3"),
        el => findParentWithHveid(el).innerText
    );
}' | llm -s 'describe slop'

At the end it pipes them into LLM with instructions to use that context to “describe slop”.

The term "slop" has recently emerged in the context of artificial intelligence to describe questionable or low-quality AI-generated material. This term appears to be gaining traction among tech and media commentators, reflecting growing concerns about the proliferation of such content. #

... and it works! We get back an answer from the LLM that summarizes the search results that we just scraped.

We have implemented basic RAG—Retrieval Augmented Generation, where search results are used to answer a question—using a terminal script that scrapes search results from Google and pipes them into an LLM.

Output of llm --help showing a list of commands that includes cmd, embed, embed-models and embed-multi. #

Speaking of RAG... a common technique for implementing that pattern is to take advantage of embeddings and vector search to find content that is semantically similar to the user’s question, without necessarily matching on exact keywords.

I wrote an extensive introduction to embeddings in Embeddings: What they are and why they matter.

LLM includes support for calculating, storing and searching embeddings through its llm embed-models, llm embed and llm embed-multi commands, documented here.

The llm embed-models command lists currently available embedding models—the OpenAI models plus any that have been added by plugins.
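Run it without arguments to see that list:

llm embed-models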

A huge JSON array of floating point numbers, followed by:  $ llm embed -m 3-small -c 'hello there' | jq length  1536 #

The llm embed command can embed content. This will output a list of floating point numbers for the specified content, using the OpenAI 3-small embedding model.

llm embed -m 3-small -c "hello there"

Add -f hex to get that out as hexadecimal. Neither of these formats is particularly useful on its own!
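For example, the hexadecimal version:

llm embed -m 3-small -c 'hello there' -f hex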

Screenshot of the llm embed-multi documentation. #

The llm embed-multi command is much more useful. It can run embeddings against content in bulk—from a CSV or JSON file, from a directory full of content or even from a SQLite database. Those embedding vectors will be stored in SQLite ready to be used for search or similarity queries.

Screenshot of my blog's collection of 7178 blogmarks #

I collect bookmarks (I call them “blogmarks”) on my blog—links with short descriptions. I have 7,178 of them.

Let’s create embeddings for all of those using LLM.

I used the SQLite database version of my blog available from https://datasette.simonwillison.net/simonwillisonblog.db (a 90MB file).

Screenshot of the blog_blogmark table in Datasette #

The content lives in the blog_blogmark SQLite table, which I can explore using Datasette (in this case the Datasette Desktop macOS Electron app).

Custom SQL query returning more than 2,000 rows   select id, link_url, link_title, commentary from blog_blogmark #

The first step is to compose a SQL query returning the data we want to embed. The first column will be treated as a unique identifier to store along with the embedding vector, and any subsequent columns will be used as input to the embedding model.

select id, link_url, link_title, commentary from blog_blogmark
$ llm embed-multi links \ -d simonwillisonblog.db \ --sql 'select id, link_url, link_title, commentary from blog_blogmark' \ -m 3-small --store #

I can run this command to embed all of the content:

llm embed-multi links \
  -d simonwillisonblog.db \
  --sql 'select id, link_url, link_title, commentary from blog_blogmark' \
  -m 3-small --store

This will create an embedding collection called “links”. It will run the SQL query we composed earlier, using the OpenAI 3-small model. The --store option means it will store a copy of the text in the database as well—without that it would just store identifiers, and we would need to use those to look up the text later on when running queries.

Progress bar - 4% done, 00:01:18 #

The llm embed-multi command shows a progress bar for how far it has got.

Screenshot of the embeddings table in Datasette, with a bunch of binary data visible. #

While the command is running we can use Datasette to watch as the embeddings table is filled with data—one row for each of the items we are embedding, each storing a big ugly binary blob of data representing the embedding vector (in this storage format).

$ llm similar links -c 'things that make me angry' -d simonwillisonblog.db {"id": "448", "score": 0.31105587659133327, "content": "http://www.russellbeattie.com/notebook/1006697.html Overreaction Absolutely storming rant from Russell about America's culture of fear.", "metadata": null} ... #

And now we can run searches! This command finds everything in the “links” collection that is most similar to the search term “things that make me angry”:

llm similar links -c 'things that make me angry' -d simonwillisonblog.db
cat ~/.local/bin/blog-answer.sh

#!/bin/bash

# Check if a query was provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 'Your query'"
    exit 1
fi

llm similar blog-paragraphs -c "query: $1" \
  | jq '.content | sub("passage: "; "")' -r \
  | llm -m llamafile \
  "$1" -s 'You answer questions as a single paragraph'

# | llm -m mlc-chat-Llama-2-7b-chat-hf-q4f16_1 \
# /Users/simon/.local/share/virtualenvs/llm-mlc-SwKbovmI/bin/llm -m mlc-chat-Llama-2-7b-chat-hf-q4f16_1 #

We can implement another version of RAG on top of this as well, by finding similar documents to our search term and then piping those results back into LLM to execute a prompt.

I wrote more about this in Embedding paragraphs from my blog with E5-large-v2.

LLM documentation: OpenAI-compatible models #

During the Q&A I mentioned that LLM can talk to anything that provides an OpenAI-compatible API endpoint using just configuration, no extra code. That’s described in the documentation here.

Screenshot of the Python API documentation #

I also showed LLM’s Python API documentation, but warned that this is not yet considered stable as I’m not 100% happy with how this API works yet.

GitHub Issue with 77 comments: Figure out how to serve an AWS Lambda function with a Function URL from a custom subdomain #

The final question from the audience was about my personal productivity.

I wrote a bit about this a few years ago in Coping strategies for the serial project hoarder—how I use extensive documentation and unit tests to allow me to work on way more projects at once by ensuring I don’t have to remember the details of any of them.

My other trick is that I tend to pick projects that fit my unique combination of previous experiences. I built LLM because I already had experience with LLM APIs, Python CLI tools (using Click) and plugin systems (using Pluggy). As a result I happened to be one of the best positioned people in the world to build a plugin-based CLI tool for working with LLMs!

Colophon

Here’s how I turned the YouTube video of this talk into an annotated presentation:

  1. I downloaded a .mp4 version of the talk from YouTube using yt-dlp.
  2. I ran that through MacWhisper to create my own transcript, so I could copy extracts from it into my write-up—although this time I didn’t end up using any of the transcript text.
  3. I played the video (at 2x speed) in QuickTime Player and used the capture.sh script described here to grab screenshots of the individual interesting frames that I wanted to use for my post.
  4. I loaded those screenshots into my annotated presentation tool (which I described in this post) and used that to run OCR against them for alt text and to add commentary to accompany each screenshot.
  5. I assembled the result into this finished blog entry, adding intro text and the YouTube embed as HTML.