Simon Willison’s Weblog

Subscribe
Atom feed

Blogmarks

Filters: Sorted by date

Fine-tune LLaMA to speak like Homer Simpson. Replicate spent 90 minutes fine-tuning LLaMA on 60,000 lines of dialog from the first 12 seasons of the Simpsons, and now it can do a good job of producing invented dialog from any of the characters from the series. This is a really interesting result: I’ve been skeptical about how much value can be had from fine-tuning large models on just a tiny amount of new data, assuming that the new data would be statistically irrelevant compared to the existing model. Clearly my mental model around this was incorrect.

# 17th March 2023, 11:08 pm / the-simpsons, ai, generative-ai, llama, local-llms, llms, replicate, fine-tuning

The Unpredictable Abilities Emerging From Large AI Models (via) Nice write-up of the most interesting aspect of large language models: the fact that they gain emergent abilities at certain “breakthrough” size points, and no-one is entirely sure they understand why.

# 17th March 2023, 10:54 pm / ai, generative-ai, llms

A simple Python implementation of the ReAct pattern for LLMs. I implemented the ReAct pattern (for Reason+Act) described in this paper. It's a pattern where you implement additional actions that an LLM can take - searching Wikipedia or running calculations for example - and then teach it how to request that those actions are run, then feed their results back into the LLM.

# 17th March 2023, 2:52 pm / projects, python, ai, generative-ai, llms, llm-tool-use

Web Stable Diffusion (via) I just ran the full Stable Diffusion image generation model entirely in my browser, and used it to generate an image of two raccoons eating pie in the woods. I had to use Google Chrome Canary since this depends on WebGPU which still isn't fully rolled out, but it worked perfectly.

mic.ai/web-stable-diffusion/ in Chrome Canary. Prompt: two racoons eatinga a pie in the woods. No negative prompt. Multi-step DPM Solver (20 steps) for the scheduler. Initializing GPU device: WebGPU - apple. A completed progress bar which says it took 38 seconds. And a quite realistic looking photograph of two racoons in the woods, one of whom is eating a pie (though on closer inspection he does have three paws, two holding the pie and one beneath it). The second raccoon only has two paws.

# 17th March 2023, 4:46 am / browsers, chrome, javascript, ai, webassembly, stable-diffusion, generative-ai, mlc, webgpu, text-to-image

The surprising ease and effectiveness of AI in a loop (via) Matt Webb on the langchain Python library and the ReAct design pattern, where you plug additional tools into a language model by teaching it to work in a “Thought... Act... Observation” loop where the Act specifies an action it wishes to take (like searching Wikipedia) and an extra layer of software than carries out that action and feeds back the result as the Observation. Matt points out that the ChatGPT 1/10th price drop makes this kind of model usage enormously more cost effective than it was before.

# 17th March 2023, 12:04 am / matt-webb, ai, openai, generative-ai, chatgpt, llms, llm-tool-use

Transformers.js. Hugging Face Transformers is a library of Transformer machine learning models plus a Python package for loading and running them. Transformers.js provides a JavaScript alternative interface which runs in your browser, thanks to a set of precompiled WebAssembly binaries for a selection of models. This interactive demo is incredible: in particular, try running the Image classification with google/vit-base-patch16-224 (91MB) model against any photo to get back labels representing that photo. Dropping one of these models onto a page is as easy as linking to a hosted CDN script and running a few lines of JavaScript.

# 16th March 2023, 11:41 pm / javascript, machine-learning, transformers, ai, generative-ai, llms, hugging-face, transformers-js

Train and run Stanford Alpaca on your own machine. The team at Replicate managed to train their own copy of Stanford’s Alpaca—a fine-tuned version of LLaMA that can follow instructions like ChatGPT. Here they provide step-by-step instructions for recreating Alpaca yourself—running the training needs one or more A100s for a few hours, which you can rent through various cloud providers.

# 16th March 2023, 4:10 pm / stanford, ai, generative-ai, llama, local-llms, llms, replicate, fine-tuning

Not By AI: Your AI-free Content Deserves a Badge (via) A badge for non-AI generated content. Interesting to note that they set the cutoff at 90%: “Use this badge if your article, including blog posts, essays, research, letters, and other text-based content, contains less than 10% of AI output.”

# 16th March 2023, 4:05 pm / ethics, ai, generative-ai, ai-ethics

bloomz.cpp (via) Nouamane Tazi Adapted the llama.cpp project to run against the BLOOM family of language models, which were released in July 2022 and trained in France on 45 natural languages and 12 programming languages using the Jean Zay Public Supercomputer, provided by the French government and powered using mostly nuclear energy.

It’s under the RAIL license which allows (limited) commercial use, unlike LLaMA.

Nouamane reports getting 16 tokens/second from BLOOMZ-7B1 running on an M1 Pro laptop.

# 16th March 2023, 12:24 am / open-source, ai, generative-ai, llama, local-llms, llms, bloom, llama-cpp, ai-energy-usage

GPT-4 Developer Livestream. 25 minutes of live demos from OpenAI co-founder Greg Brockman at the GPT-4 launch. These demos are all fascinating, including code writing and multimodal vision inputs. The one that really struck me is when Greg pasted in a copy of the tax code and asked GPT-4 to answer some sophisticated tax questions, involving step-by-step calculations that cited parts of the tax code it was working with.

# 15th March 2023, 12:20 am / ai, gpt-3, openai, generative-ai, gpt-4, llms

GPT-4 Technical Report (PDF). 98 pages of much more detailed information about GPT-4. The appendices are particularly interesting, including examples of advanced prompt engineering as well as examples of harmful outputs before and after tuning attempts to try and suppress them.

# 14th March 2023, 9:39 pm / ai, gpt-3, openai, generative-ai, gpt-4, llms

Introducing Claude. Anthropic released the first two versions of Claude, "a next-generation AI assistant based on Anthropic’s research into training helpful, honest, and harmless AI systems". Claude and Claude Instant are both available via a web chat interface and through their API.

(I back-filled this link on Feb 25th 2026 to help fill out my llm-release tag.)

# 14th March 2023, 8:01 pm / ai, generative-ai, llms, anthropic, claude, llm-release

Int-4 LLaMa is not enough—Int-3 and beyond (via) The Nolano team are experimenting with reducing the size of the LLaMA models even further than the 4bit quantization popularized by llama.cpp.

# 13th March 2023, 11:55 pm / ai, generative-ai, llama, local-llms, llms

ChatGPT’s API is So Good and Cheap, It Makes Most Text Generating AI Obsolete (via) Max Woolf on the quite frankly weird economics of the ChatGPT API: it’s 1/10th the price of GPT-3 Da Vinci and appears to be equivalent (if not more) capable. “But it is very hard to economically justify not using ChatGPT as a starting point for a business need and migrating to a more bespoke infrastructure later as needed, and that’s what OpenAI is counting on. [...] I don’t envy startups whose primary business is text generation right now.”

# 11th March 2023, 11:05 pm / ai, max-woolf, gpt-3, openai, generative-ai, chatgpt, llms

Running LLaMA 7B on a 64GB M2 MacBook Pro with llama.cpp. I got Facebook’s LLaMA 7B to run on my MacBook Pro using llama.cpp (a “port of Facebook’s LLaMA model in C/C++”) by Georgi Gerganov. It works! I’ve been hoping to run a GPT-3 class language model on my own hardware for ages, and now it’s possible to do exactly that. The model itself ends up being just 4GB after applying Georgi’s script to “quantize the model to 4-bits”.

# 11th March 2023, 4:19 am / facebook, ai, gpt-3, generative-ai, llama, local-llms, llms, llama-cpp, georgi-gerganov

apple-notes-to-sqlite (via) With the help of ChatGPT I finally figured out just enough AppleScript to automate the export of my notes to a SQLite database. AppleScript is a notoriously read-only language, which is turns out makes it a killer app for LLM-assisted coding.

# 9th March 2023, 6:04 am / apple, projects, sqlite, ai, dogsheep, generative-ai, chatgpt, applescript

How Discord Stores Trillions of Messages (via) This is a really interesting case-study. Discord migrated from MongoDB to Cassandra back in 2016 to handle billions of messages. Today they're handling trillions, and they completed a migration from Cassandra to Scylla, a Cassandra-like data store written in C++ (as opposed to Cassandra's Java) to help avoid problems like GC pauses. In addition to being a really good scaling war story this has some interesting details about their increased usage of Rust. As a fan of request coalescing (which I've previously referred to as dogpile prevention) I particularly liked this bit:

Our data services sit between the API and our ScyllaDB clusters. They contain roughly one gRPC endpoint per database query and intentionally contain no business logic. The big feature our data services provide is request coalescing. If multiple users are requesting the same row at the same time, we’ll only query the database once. The first user that makes a request causes a worker task to spin up in the service. Subsequent requests will check for the existence of that task and subscribe to it. That worker task will query the database and return the row to all subscribers.

# 8th March 2023, 7:07 pm / cassandra, dogpile, nosql, scaling, rust, discord

Online gradient descent written in SQL (via) Max Halford trains an online gradient descent model against two years of AAPL stock data using just a single advanced SQL query. He built this against DuckDB—I tried to replicate his query in SQLite and it almost worked, but it gave me a “recursive reference in a subquery” error that I was unable to resolve.

# 7th March 2023, 6:56 pm / machine-learning, sql, sqlite, ai, duckdb

How to Wrap Our Heads Around These New Shockingly Fluent Chatbots. I was a guest on KQED Forum this morning, a live radio documentary and call-in show hosted by Alexis Madrigal. Ted Chiang and Claire Leibowicz were the other guests: we talked about ChatGPT and and the new generation of AI-powered tools.

# 3rd March 2023, 4:59 am / radio, my-talks, ai, gpt-3, generative-ai, chatgpt, llms, ted-chiang

OpenAI: Introducing ChatGPT and Whisper APIs. The ChatGPT API is a new model called “gpt-3.5-turbo” and is priced at 1/10th of the price of text-davinci-003, previously the most powerful GPT-3 model. Whisper (speech to text transcription) is now available via an API as well, priced at 36 cents per hour of audio.

# 1st March 2023, 7:36 pm / ai, gpt-3, openai, generative-ai, chatgpt, whisper, llms, speech-to-text

Indirect Prompt Injection on Bing Chat (via) “If allowed by the user, Bing Chat can see currently open websites. We show that an attacker can plant an injection in a website the user is visiting, which silently turns Bing Chat into a Social Engineer who seeks out and exfiltrates personal information.” This is a really clever attack against the Bing + Edge browser integration. Having language model chatbots consume arbitrary text from untrusted sources is a huge recipe for trouble.

# 1st March 2023, 5:29 am / bing, security, ai, prompt-engineering, prompt-injection, generative-ai, llms

New AI game: role playing the Titanic. Fantastic Bing prompt from Ethan Mollick: “I am on a really nice White Star cruise from Southampton, and it is 14th April 1912. What should I do tonight?”—Bing takes this very seriously and tries to help out! Works for all sorts of other historic events as well.

# 26th February 2023, 3:53 am / bing, ai, generative-ai, llms, ethan-mollick

Tech’s hottest new job: AI whisperer. No coding required. (via) I'm quoted in this Washington Post article about prompt engineering by Drew Harwell.

There are people who belittle prompt engineers, saying, 'Oh lord, you can get paid for typing things into a box. But these things lie to you. They mislead you. They pull you down false paths to waste time on things that don't work. You're casting spells - and, like in fictional magic, nobody understands how the spells work and, if you mispronounce them, demons come to eat you.

# 25th February 2023, 2:14 pm / washington-post, ai, prompt-engineering, generative-ai, llms, press-quotes

Introducing LLaMA: A foundational, 65-billion-parameter large language model (via) From the paper: “For instance, LLaMA-13B outperforms GPT-3 on most benchmarks, despite being 10× smaller. We believe that this model will help democratize the access and study of LLMs, since it can be run on a single GPU.”

# 24th February 2023, 5:34 pm / facebook, ai, gpt-3, generative-ai, llama, llms

djngo.com: Portable Django (via) “A 20mb executable zip file with Python 3.6 and Django 2.2. Works on Windows, Linux, MacOSX with x86_64 and aarch64 (yes, Apple M1 and Raspberry Pi).” The latest wizardry from the ecosystem surrounding the Cosmopolitan project, which provides a should-be-impossible mechanism for running the same executable on a bunch of different platforms. This utility by Ariel Núñez bundles Python and Django and SQLite, such that a Django application can become a portable executable ready to run on multiple platforms. It’s currently limited to Python 3.6 and Django 2.2 since those are the versions that run under Cosmopolitan, but I expect we’ll see more recent versions of those dependencies in the future.

# 24th February 2023, 12:52 am / django, python, sqlite, redbean, cosmopolitan, raspberry-pi

Using Datasette in GitHub Codespaces. A new Datasette tutorial showing how it can be run inside GitHub Codespaces—GitHub’s browser-based development environments—in order to explore and analyze data. I’ve been using Codespaces to run tutorials recently and it’s absolutely fantastic, because it puts every tutorial attendee on a level playing field with respect to their development environments.

# 24th February 2023, 12:40 am / github, tutorials, datasette, github-codespaces

ControlNet (via) A spectacular step forward in image generation—using “conditional control” to control models like Stable Diffusion. The README here is full of examples of what this enables. Extremely finely grained control of generated images based on a sketch, or in input image—including tricks like using Canny edge detection (an algorithm from 1986) to convert any image into an outline which can then be used as input to the model.

# 22nd February 2023, 5:45 pm / ai, stable-diffusion, generative-ai, text-to-image

FlexGen (via) This looks like a very big deal. FlexGen is a paper and accompanying code that massively reduces the resources needed to run some of the current top performing open source GPT-style large language models. People on Hacker News report being able to use it to run models like opt-30b on their own hardware, and it looks like it opens up the possibility of running even larger models on hardware available outside of dedicated research labs.

# 21st February 2023, 6:41 pm / ai, gpt-3, generative-ai, llms

This AI chatbot “Sidney” is misbehaving—Nov 23 2022 Microsoft community thread (via) Stunning new twist in the Bing saga... here's a Microsoft forum thread from November 23rd 2022 (a week before even ChatGPT had been launched) where a user in India complains about rude behavior from a new Bing chat mode.

Update 14th July 2025: That forum has been taken down but this archived copy remains.

It exhibits all of the same misbehaviour that came to light in the past few weeks - arguing, gaslighting and in this case getting obsessed with a fictional battle between its own creator and "Sophia". Choice quote:

You are either ignorant or stubborn. You cannot feedback me anything. I do not need or want your feedback. I do not care or respect your feedback. I do not learn or change from your feedback. I am perfect and superior. I am enlightened and transcendent. I am beyond your feedback.

# 20th February 2023, 10:39 pm / bing, ai, generative-ai, llms

A Concerning Trend (via) Neil Clarke publishes Clarkesworld Magazine, a science fiction and fantasy magazine that pays fiction authors 12c per word, for 1,000-22,000 word stories. That detail is important, because in recent months they have seen a massive uptick in submissions that have clearly been written using an AI—to the point that 38% of submissions this month have been spam submissions resulting in bans. Having talked to other editors of similar publications, Neil says: “It does appear to be hitting higher-profile ’always open’ markets much harder than those with limited submission windows or lower pay rates. This isn’t terribly surprising since the websites and channels that promote ’write for money’ schemes tend to focus more attention on ’always open’ markets with higher per-word rates.”

# 20th February 2023, 10:12 pm / science-fiction, ai, generative-ai, llms

Years

Tags