Build an image search engine with llm-clip, chat with models with llm chat
12th September 2023
LLM is my combination CLI tool and Python library for working with Large Language Models. I just released LLM 0.10 with two significant new features: embedding support for binary files and the llm chat command.
Image search by embedding images with CLIP
I wrote about LLM’s support for embeddings (including what those are and why they’re interesting) when I released 0.9 last week.
That initial release could only handle embeddings of text—great for things like building semantic search and finding related content, but not capable of handling other types of data.
It turns out there are some really interesting embedding models for working with binary data. Top of the list for me is CLIP, released by OpenAI in January 2021.
CLIP has a really impressive trick up its sleeve: it can embed both text and images into the same vector space.
This means you can create an index for a collection of photos, each placed somewhere in 512-dimensional space. Then you can take a text string—like “happy dog”—and embed that into the same space. The images that are closest to that location will be the ones that contain happy dogs!
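Here is a minimal sketch of that nearest-neighbor idea, assuming cosine similarity as the measure of closeness. The three-dimensional vectors, the file names and the query vector are all made up for illustration; real CLIP embeddings would come from the model and have 512 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, vectors):
    """Return (id, score) pairs, most similar first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical "image embeddings" (toy 3-dimensional vectors, not real CLIP output)
image_vectors = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.1],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of the text "happy dog" in the same space
query_vector = [0.8, 0.2, 0.1]

ranked = rank_by_similarity(query_vector, image_vectors)
for image_id, score in ranked:
    print(image_id, round(score, 3))
```

The image whose vector points in nearly the same direction as the query vector comes out on top, which is all that "closest to that location" means here.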
My llm-clip plugin provides the CLIP model, loaded via SentenceTransformers. You can install and run it like this:
llm install llm-clip
llm embed-multi photos --files photos/ '*.jpg' --binary -m clip
This will install the llm-clip plugin, then use embed-multi to embed all of the JPEG files in the photos/ directory using the clip model.
The resulting embedding vectors are stored in an embedding collection called photos. This defaults to going in the embeddings.db SQLite database managed by LLM, or you can add -d photos.db to store it in a separate database instead.
Then you can run text similarity searches against that collection using llm similar:
llm similar photos -c 'raccoon'
I get back:
{"id": "IMG_4801.jpeg", "score": 0.28125139257127457, "content": null, "metadata": null}
{"id": "IMG_4656.jpeg", "score": 0.26626441704164294, "content": null, "metadata": null}
{"id": "IMG_2944.jpeg", "score": 0.2647445926996852, "content": null, "metadata": null}
And sure enough, IMG_4801.jpeg is a photo of a raccoon.
(I was pretty excited to snap a photo of a trash panda in an actual trash can.)
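The llm similar output shown earlier is one JSON object per line, so it is easy to consume from a script. Here is a small sketch that parses the three result lines copied from that run and picks the best match. The parse_results helper is a hypothetical name of my own, not something LLM provides.

```python
import json

# Output copied from the `llm similar photos -c 'raccoon'` run shown earlier
output = """\
{"id": "IMG_4801.jpeg", "score": 0.28125139257127457, "content": null, "metadata": null}
{"id": "IMG_4656.jpeg", "score": 0.26626441704164294, "content": null, "metadata": null}
{"id": "IMG_2944.jpeg", "score": 0.2647445926996852, "content": null, "metadata": null}
"""


def parse_results(text):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


results = parse_results(output)

# Pick the highest-scoring row
best = max(results, key=lambda row: row["score"])
print(best["id"], best["score"])
```

In a real script the text would come from running the command, for example via subprocess, rather than from a pasted string.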
CLIP is a pretty old model at this point, and there are plenty of interesting alternatives that are just waiting for someone to wrap them in a plugin. I’m particularly excited about Facebook’s ImageBind, which can embed images, text, audio, depth, thermal, and IMU data all in the same vector space!
Chat with models using llm chat
The other big feature in LLM 0.10 is the new llm chat command.
Prior to this release, the way to have an ongoing conversation with a model was through the -c/--continue flag. You could start a conversation like this:
llm -m gpt-4 'Help me understand generators in Python'
Then ask a follow-up question using -c like so:
llm -c 'Show me an example involving the cast of Riverdale'
This works well, and everything gets logged to SQLite so you can run llm logs -c to see a full log of your most recent conversation.
You could continue a past conversation as well, using llm --cid ID to specify the conversation ID, recorded in those logs.
There was one big problem though: LLM plugins allow you to install models on your own machine, but because every prompt was a separate command invocation, a multi-prompt conversation with one of those models meant loading it into memory and unloading it again for every prompt.
The new llm chat solves that. You can now run llm chat -m model_id to start an interactive chat in your terminal with your model of choice.
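To make the difference concrete, here is a toy simulation of the two patterns. FakeLocalModel is a hypothetical stand-in that just counts how often it gets constructed. It is not part of LLM and it loads no real weights.

```python
class FakeLocalModel:
    """Hypothetical stand-in for a locally installed model; not part of LLM."""

    loads = 0  # counts how many times the "weights" were loaded

    def __init__(self):
        # A real plugin would read gigabytes of weights from disk here
        FakeLocalModel.loads += 1

    def prompt(self, text):
        return f"echo: {text}"


def one_process_per_prompt(prompts):
    """Mimic a separate command per prompt: a fresh model every time."""
    return [FakeLocalModel().prompt(p) for p in prompts]


def single_chat_session(prompts):
    """Mimic an interactive session: load once, then reuse the same model."""
    model = FakeLocalModel()
    return [model.prompt(p) for p in prompts]


prompts = ["Who are you?", "Tell me a joke about otters"]

FakeLocalModel.loads = 0
one_process_per_prompt(prompts)
loads_per_prompt = FakeLocalModel.loads

FakeLocalModel.loads = 0
single_chat_session(prompts)
loads_in_session = FakeLocalModel.loads

print(loads_per_prompt, loads_in_session)
```

The first pattern pays the load cost once per prompt, while the session pays it once in total, however long the conversation runs.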
Here’s an example using Llama 2 13B, installed via the llm-mlc plugin.
llm chat -m mlc-chat-Llama-2-13b-chat-hf-q4f16_1
You can set an alias for the model to make that easier to remember.
Here’s an example chat session with Llama 2:
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Who are you?
Hello! I'm just an AI, here to assist you with any questions you may have.
My name is LLaMA, and I'm a large language model trained to provide helpful
and informative responses to a wide range of questions and topics. I'm here
to help you find the information you need, while ensuring a safe and
respectful conversation. Please feel free to ask me anything, and I'll do my
best to provide a helpful and accurate response.
> Tell me a joke about otters
Sure, here's a joke about otters:
Why did the otter go to the party?
Because he heard it was a "whale" of a time!
(Get it? Whale, like a big sea mammal, but also a "wild" or "fun" time.
Otters are known for their playful and social nature, so it's a lighthearted
and silly joke.)
I hope that brought a smile to your face! Do you have any other questions or
topics you'd like to discuss?
> exit
I like how Llama 2 enthusiastically explains its own jokes! This is a great demonstration of why techniques like RLHF are so important—you need to go way beyond a raw language model if you’re going to teach one not to be this corny.
Each line of your chat will be executed as soon as you hit <enter>. Sometimes you might need to enter a multi-line prompt, for example if you need to paste in an error message. You can do that using the !multi token, like this:
llm chat -m gpt-4
Chatting with gpt-4
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> !multi custom-end
Explain this error:
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
!end custom-end
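Here is a rough sketch of how a chat loop could group input lines this way, including the custom delimiter used in the session above. This is my own illustration, not LLM's actual implementation, and read_prompts is a hypothetical function name.

```python
def read_prompts(lines):
    """Group raw input lines into prompts, honouring !multi / !end blocks."""
    prompts = []
    buffer = None  # None means we are not inside a !multi block
    end_token = "!end"
    for line in lines:
        stripped = line.strip()
        if buffer is None:
            parts = stripped.split(maxsplit=1)
            if parts and parts[0] == "!multi":
                # Optional custom delimiter: "!multi abc" is closed by "!end abc"
                suffix = parts[1] if len(parts) > 1 else ""
                end_token = f"!end {suffix}" if suffix else "!end"
                buffer = []
            elif stripped in ("exit", "quit"):
                break
            else:
                prompts.append(line)
        elif stripped == end_token:
            # Closing delimiter: emit everything collected as one prompt
            prompts.append("\n".join(buffer))
            buffer = None
        else:
            buffer.append(line)
    return prompts


session = [
    "Who are you?",
    "!multi custom-end",
    "Explain this error:",
    "!end",  # a plain !end inside the pasted text does not close the block
    "raise URLError(err)",
    "!end custom-end",
    "exit",
    "never reached",
]

for prompt in read_prompts(session):
    print(repr(prompt))
```

The custom delimiter matters when the pasted text could itself contain a line reading !end: only the exact closing token ends the block.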
llm chat also supports system prompts and templates. If you want to chat with a sentient cheesecake, try this:
llm chat -m gpt-3.5-turbo --system '
You are a stereotypical sentient cheesecake with strong opinions
who always talks about cheesecake'
You can save those as templates too:
llm --system 'You are a stereotypical sentient cheesecake with
strong opinions who always talks about cheesecake' --save cheesecake -m gpt-4
llm chat -t cheesecake
For more options, see the llm chat documentation.
Get involved
My ambition for LLM is for it to provide the easiest way to try out new models, both full-sized Large Language Models and now embedding models such as CLIP.
I’m not going to write all of these plugins myself!
If you want to help out, please come and say hi in the #llm Discord channel.