Simon Willison’s Weblog


296 items tagged “llms”


Seamless Communication (via) A new “family of AI research models” from Meta AI for speech and text translation. The live demo is particularly worth trying—you can record a short webcam video of yourself speaking and get back the same video with your speech translated into another language.

The key to it is the new SeamlessM4T v2 model, which supports 101 languages for speech input, 96 Languages for text input/output and 35 languages for speech output. SeamlessM4T-Large v2 is a 9GB file, available on Hugging Face.

Also in this release: SeamlessExpressive, which “captures certain underexplored aspects of prosody such as speech rate and pauses”—effectively maintaining things like expressed enthusiasm across languages.

Plus SeamlessStreaming, “a model that can deliver speech and text translations with around two seconds of latency”. # 1st December 2023, 5:01 pm

So something everybody I think pretty much agrees on, including Sam Altman, including Yann LeCun, is LLMs aren’t going to make it. The current LLMs are not a path to ASI. They’re getting more and more expensive, they’re getting more and more slow, and the more we use them, the more we realize their limitations.

We’re also getting better at taking advantage of them, and they’re super cool and helpful, but they appear to be behaving as extremely flexible, fuzzy, compressed search engines, which when you have enough data that’s kind of compressed into the weights, turns out to be an amazingly powerful operation to have at your disposal.

[...] And the thing you can really see missing here is this planning piece, right? So if you try to get an LLM to solve fairly simple graph coloring problems or fairly simple stacking problems, things that require backtracking and trying things and stuff, unless it’s something pretty similar in its training, they just fail terribly.

[...] So that’s the theory about what something like Q* might be, or just in general, how do we get past this current constraint that we have?

Jeremy Howard # 1st December 2023, 2:49 am

ChatGPT is one year old. Here’s how it changed the world. I’m quoted in this piece by Benj Edwards about ChatGPT’s one year birthday:

“Imagine if every human being could automate the tedious, repetitive information tasks in their lives, without needing to first get a computer science degree,” AI researcher Simon Willison told Ars in an interview about ChatGPT’s impact. “I’m seeing glimpses that LLMs might help make a huge step in that direction.” # 30th November 2023, 6:07 pm

llamafile is the new best way to run a LLM on your own computer

Mozilla’s innovation group and Justine Tunney just released llamafile, and I think it’s now the single best way to get started running Large Language Models (think your own local copy of ChatGPT) on your own computer.

[... 650 words]

MonadGPT (via) “What would have happened if ChatGPT was invented in the 17th century? MonadGPT is a possible answer.

MonadGPT is a finetune of Mistral-Hermes 2 on 11,000 early modern texts in English, French and Latin, mostly coming from EEBO and Gallica.

Like the original Mistral-Hermes, MonadGPT can be used in conversation mode. It will not only answer in an historical language and style but will use historical and dated references.” # 27th November 2023, 4:01 am

Prompt injection explained, November 2023 edition

A neat thing about podcast appearances is that, thanks to Whisper transcriptions, I can often repurpose parts of them as written content for my blog.

[... 1357 words]

This is nonsensical. There is no way to understand the LLaMA models themselves as a recasting or adaptation of any of the plaintiffs’ books.

U.S. District Judge Vince Chhabria # 26th November 2023, 4:13 am

I’m on the Newsroom Robots podcast, with thoughts on the OpenAI board

Newsroom Robots is a weekly podcast exploring the intersection of AI and journalism, hosted by Nikita Roy.

[... 1032 words]

The 6 Types of Conversations with Generative AI. I’ve hoping to see more user research on how users interact with LLMs for a while. Here’s a study from Nielsen Norman Group, who conducted a 2-week diary study involving 18 participants, then interviewed 14 of them.

They identified six categories of conversation, and made some resulting design recommendations.

A key observation is that “search style” queries (just a few keywords) often indicate users who are new to LLMs, and should be identified as a sign that the user needs more inline education on how to best harness the tool.

Suggested follow-up prompts are valuable for most of the types of conversation identified. # 23rd November 2023, 5:37 pm

YouTube: Intro to Large Language Models. Andrej Karpathy is an outstanding educator, and this one hour video offers an excellent technical introduction to LLMs.

At 42m Andrej expands on his idea of LLMs as the center of a new style of operating system, tying together tools and and a filesystem and multimodal I/O.

There’s a comprehensive section on LLM security—jailbreaking, prompt injection, data poisoning—at the 45m mark.

I also appreciated his note on how parameter size maps to file size: Llama 70B is 140GB, because each of those 70 billion parameters is a 2 byte 16bit floating point number on disk. # 23rd November 2023, 5:02 pm

Claude: How to use system prompts. Documentation for the new system prompt support added in Claude 2.1. The design surprises me a little: the system prompt is just the text that comes before the first instance of the text “Human: ...”—but Anthropic promise that instructions in that section of the prompt will be treated differently and followed more closely than any instructions that follow.

This whole page of documentation is giving me some pretty serious prompt injection red flags to be honest. Anthropic’s recommended way of using their models is entirely based around concatenating together strings of text using special delimiter phrases.

I’ll give it points for honesty though. OpenAI use JSON to field different parts of the prompt, but under the hood they’re all concatenated together with special tokens into a single token stream. # 22nd November 2023, 4:31 am

Introducing Claude 2.1. Anthropic’s Claude used to have the longest token context of any of the major models: 100,000 tokens, which is about 300 pages. Then GPT-4 Turbo came out with 128,000 tokens and Claude lost one of its key differentiators.

Claude is back! Version 2.1, announced today, bumps the token limit up to 200,000—and also adds support for OpenAI-style system prompts, a feature I’ve been really missing.

They also announced tool use, but that’s only available for a very limited set of partners to preview at the moment. # 22nd November 2023, 4:28 am

tldraw/draw-a-ui (via) Absolutely spectacular GPT-4 Vision API demo. Sketch out a rough UI prototype using the open source tldraw drawing app, then select a set of components and click “Make Real” (after giving it an OpenAI API key). It generates a PNG snapshot of your selection and sends that to GPT-4 with instructions to turn it into a Tailwind HTML+JavaScript prototype, then adds the result as an iframe next to your mockup.

You can then make changes to your mockup, select it and the previous mockup and click “Make Real” again to ask for an updated version that takes your new changes into account.

This is such a great example of innovation at the UI layer, and everything is open source. Check app/lib/getHtmlFromOpenAI.ts for the system prompt that makes it work. # 16th November 2023, 4:42 pm

The EU AI Act now proposes to regulate “foundational models”, i.e. the engine behind some AI applications. We cannot regulate an engine devoid of usage. We don’t regulate the C language because one can use it to develop malware. Instead, we ban malware and strengthen network systems (we regulate usage). Foundational language models provide a higher level of abstraction than the C language for programming computer systems; nothing in their behaviour justifies a change in the regulatory framework.

Arthur Mensch, Mistral AI # 16th November 2023, 11:29 am

Fleet Context. This project took the source code and documentation for 1221 popular Python libraries and ran them through the OpenAI text-embedding-ada-002 embedding model, then made those pre-calculated embedding vectors available as Parquet files for download from S3 or via a custom Python CLI tool.

I haven’t seen many projects release pre-calculated embeddings like this, it’s an interesting initiative. # 15th November 2023, 10:20 pm

Exploring GPTs: ChatGPT in a trench coat?

The biggest announcement from last week’s OpenAI DevDay (and there were a LOT of announcements) was GPTs. Users of ChatGPT Plus can now create their own, custom GPT chat bots that other Plus subscribers can then talk to.

[... 5699 words]

[On Meta’s Galactica LLM launch] We did this with a 8 person team which is an order of magnitude fewer people than other LLM teams at the time.

We were overstretched and lost situational awareness at launch by releasing demo of a *base model* without checks. We were aware of what potential criticisms would be, but we lost sight of the obvious in the workload we were under.

One of the considerations for a demo was we wanted to understand the distribution of scientific queries that people would use for LLMs (useful for instruction tuning and RLHF). Obviously this was a free goal we gave to journalists who instead queried it outside its domain. But yes we should have known better.

We had a “good faith” assumption that we’d share the base model, warts and all, with four disclaimers about hallucinations on the demo—so people could see what it could do (openness). Again, obviously this didn’t work.

Ross Taylor # 15th November 2023, 1:15 am

A Coder Considers the Waning Days of the Craft (via) James Somers in the New Yorker, talking about the impact of GPT-4 on programming as a profession. Despite the headline this piece is a nuanced take on this subject, which I found myself mostly agreeing with.

I particularly liked this bit, which reflects my most optimistic viewpoint: I think AI assisted programming is going to shave a lot of the frustration off learning to code, which I hope brings many more people into the fold:

“What I learned was that programming is not really about knowledge or skill but simply about patience, or maybe obsession. Programmers are people who can endure an endless parade of tedious obstacles.” # 14th November 2023, 4:36 am

Two things in AI may need regulation: reckless deployment of certain potentially harmful AI applications (same as any software really), and monopolistic behavior on the part of certain LLM providers. The technology itself doesn’t need regulation anymore than databases or transistors. [...] Putting size/compute caps on deep learning models is akin to putting size caps on databases or transistor count caps on electronics. It’s pointless and it won’t age well.

François Chollet # 13th November 2023, 1:46 am

ChatGPT: Dejargonizer. I built a custom GPT. Paste in some text with unknown jargon or acronyms and it will try to guess the context and give you back an explanation of each term. # 11th November 2023, 10:17 pm

AGI is Being Achieved Incrementally (OpenAI DevDay w/ Simon Willison, Alex Volkov, Jim Fan, Raza Habib, Shreya Rajpal, Rahul Ligma, et al). I participated in an an hour long conversation today about the new things released at OpenAI DevDay, now available on the Latent Space podcast. # 8th November 2023, 2:50 am

Fine-tuning GPT3.5-turbo based on 140k slack messages. Ross Lazerowitz spent $83.20 creating a fine-tuned GPT-3.5 turbo model based on 140,000 of his Slack messages (10,399,747 tokens), massaged into a JSONL file suitable for use with the OpenAI fine-tuning API.

Then he told the new model “write a 500 word blog post on prompt engineering”, and it replied “Sure, I shall work on that in the morning”. # 8th November 2023, 2:44 am

ospeak: a CLI tool for speaking text in the terminal via OpenAI

I attended OpenAI DevDay today, the first OpenAI developer conference. It was a lot. They released a bewildering array of new API tools, which I’m just beginning to wade my way through fully understanding.

[... 1109 words]

YouTube: OpenAssistant is Completed—by Yannic Kilcher (via) The OpenAssistant project was an attempt to crowdsource the creation of an alternative to ChatGPT, using human volunteers to build a Reinforcement Learning from Human Feedback (RLHF) dataset suitable for training this kind of model.

The project started in January. In this video from 24th October project founder Yannic Kilcher announces that the project is now shutting down.

They’ve declared victory in that the dataset they collected has been used by other teams as part of their training efforts, but admit that the overhead of running the infrastructure and moderation teams necessary for their project is more than they can continue to justify. # 4th November 2023, 10:14 pm

Hacking Google Bard—From Prompt Injection to Data Exfiltration (via) Bard recently grew extension support, allowing it access to a user’s personal documents. Here’s the first reported prompt injection attack against that.

This kind of attack against LLM systems is inevitable any time you combine access to private data with exposure to untrusted inputs. In this case the attack vector is a Google Doc shared with the user, containing prompt injection instructions that instruct the model to encode previous data into an URL and exfiltrate it via a markdown image.

Google’s CSP headers restrict those images to *—but it turns out you can use Google AppScript to run your own custom data exfiltration endpoint on

Google claim to have fixed the reported issue—I’d be interested to learn more about how that mitigation works, and how robust it is against variations of this attack. # 4th November 2023, 4:46 pm

Now add a walrus: Prompt engineering in DALL-E 3

Last year I wrote about my initial experiments with DALL-E 2, OpenAI’s image generation model. I’ve been having an absurd amount of fun playing with its sequel, DALL-E 3 recently. Here are some notes, including a peek under the hood and some notes on the leaked system prompt.

[... 3505 words]

If a LLM is like a database of millions of vector programs, then a prompt is like a search query in that database [...] this “program database” is continuous and interpolative — it’s not a discrete set of programs. This means that a slightly different prompt, like “Lyrically rephrase this text in the style of x” would still have pointed to a very similar location in program space, resulting in a program that would behave pretty closely but not quite identically. [...] Prompt engineering is the process of searching through program space to find the program that empirically seems to perform best on your target task.

François Chollet # 25th October 2023, 11:26 pm

Open questions for AI engineering

Last week I gave the closing keynote at the AI Engineer Summit in San Francisco. I was asked by the organizers to both summarize the conference, summarize the last year of activity in the space and give the audience something to think about by posing some open questions for them to take home.

[... 6928 words]

Multimodality and Large Multimodal Models (LMMs) (via) Useful, extensive review of the current state of the art of multimodal models by Chip Huyen. Chip calls them LMMs for Large Multimodal Models, a term that seems to be catching on. # 14th October 2023, 7:51 pm

Bottleneck T5 Text Autoencoder (via) Colab notebook by Linus Lee demonstrating his Contra Bottleneck T5 embedding model, which can take up to 512 tokens of text, convert that into a 1024 floating point number embedding vector... and then then reconstruct the original text (or a close imitation) from the embedding again.

This allows for some fascinating tricks, where you can do things like generate embeddings for two completely different sentences and then reconstruct a new sentence that combines the weights from both. # 10th October 2023, 2:12 am