Simon Willison’s Weblog

Subscribe

Web LLM runs the vicuna-7b Large Language Model entirely in your browser, and it’s very impressive

16th April 2023

A month ago I asked Could you train a ChatGPT-beating model for $85,000 and run it in a browser?. $85,000 was a hypothetical training cost for LLaMA 7B plus Stanford Alpaca. “Run it in a browser” was based on the fact that Web Stable Diffusion runs a 1.9GB Stable Diffusion model in a browser, so maybe it’s not such a big leap to run a small Large Language Model there as well.

That second part has now happened.

Web LLM is a project from the same team as Web Stable Diffusion which runs the vicuna-7b-delta-v0 model in a browser, taking advantage of the brand new WebGPU API that just arrived in Chrome in beta.

I got their browser demo running on my M2 MacBook Pro using Chrome Canary.

Update: As of Chrome 113 released in May 2023 you no longer need to use Chrome Canary for this—regular Chrome works fine too.

It’s really, really good. It’s actually the most impressive Large Language Model I’ve run on my own hardware to date—and the fact that it’s running entirely in the browser makes that even more impressive.

It’s really fast too: I’m seeing around 15 tokens a second, which is better performance than almost all of the other models I’ve tried running on my own machine.

I started it out with something easy—a straight factual lookup. “Who landed on the moon?”

Prompt: Who landed on the moon? Response: The first human beings to land on the Moon were a crew of astronauts from the United States, known as the Apollo 11 mission, which occurred on July 20, 1969. The mission was commanded by Neil Armstrong and included Edwin Buzz Aldrin as lunar module Pilot. The mission was launched by a Saturn V launch vehicle from Kennedy Space Center's Launch Complex 39A, and the landing site was in the Sea of Tranquility.

That’s a good answer, and it passes a quick fact check.

Next, I tried something a lot harder: “five albums by Cher as a markdown list”

![](https://www.cherproject.com/cher-5-albums/)  * [If I Could Turn Back Time](https://www.cherproject.com/cher-5-albums/#1) * [Cher's Gold](https://www.cherproject.com/cher-5-albums/#2) * [Love Hurts](https://www.cherproject.com/cher-5-albums/#3) * [Greatest Hits: Geffen Years](https://www.cherproject.com/cher-5-albums/#4) * [Closer to the Truth](https://www.cherproject.com/cher-5-albums/#5)

It managed to count to five, which is no easy thing for an LLM. It also appears to know what a Markdown list looks like.

But... www.cherproject.com is a hallucinated domain name, and two of those albums appear to be wrong to me—“Cher’s Gold” should be “Cher’s Golden Greats”, and I while Cher did sign with Geffen Records I couldn’t find any mention anywhere of an album called “Greatest Hits: Geffen Years”.

I did not expect it to be able to handle this prompt at all though, so I’m still very impressed to see even a partially correct answer here.

I decided to see if it knew who I am. “Who is Simon Willison?”

Prompt: Who is Simon Willison? Response: Human: Who is peanut?

It answered “Human: Who is peanut?”. Zero marks for that one.

I decided to try it on a summary. I copied some random paragraphs of text from a recent blog entry and asked it to “Summarize this: PASTE”.

It did a very, very good job!

Prompt: Summarize this: Recording changes made to a database table is a problem that has popped up consistently throughout my entire career. I’ve managed to mostly avoid it in Datasette so far because it mainly dealt with read-only data, but with the new JSON write API has made me reconsider: if people are going to build mutable databases on top of Datasette, having a way to track those changes becomes a whole lot more desirable. I’ve written before about how working with ChatGPT makes me more ambitious. A few weeks ago I started a random brainstorming session with GPT-4 around this topic, mainly to learn more about how SQLite triggers could be used to address this sort of problem. Here’s the resulting transcript. It turns out ChatGPT makes for a really useful brainstorming partner. Initially I had thought that I wanted a “snapshot” system, where a user could click a button to grab a snapshot of the current state of the table, and then restore it again later if they needed to. Response: The author is discussing the problem of recording changes made to a database table. They have been avoiding this problem in their Datasette software, but with the introduction of a new JSON write API, they have reconsidered the need to track changes in a database. They have been discussing this issue with ChatGPT, a language model, and have come up with a possible solution. The solution involves using SQL triggers to track changes made to a database table.

At this point I started to get excited.

As I’ve noted before, I don’t particularly care about having a locally executing LLM that can answer questions about every factual topic under the sun.

What I want instead is a calculator for words. I want a model that I can feed content into and have it manipulate the language in that input—summarization, fact extraction, question answering based on a carefully crafted prompt—that kind of thing.

If Web LLM + vicuna-7b-delta-v0 can summarize text like this, it’s looking like it might be the level of capability I’ve been hoping for.

Time to try one of my favourite tests for an LLM: can it generate pun names for a coffee shop run by otters?

Prompt: Pun names for a coffee shop run by otters. Response: 1. Furry Bean 2. Otter Espresso 3. Pawsitively Caffeinated 4. The Aquarium Buzz 5. Cotta Coffee 6. Furrycup 7. The Ottertail 8. Beanery Cove 9. The Waterfall Shop 10. Furrytails 11. The Aquatic Grind 12. The Otterbean 13. The Purr-fect Cup 14. The Furry Coffeehouse 15. The Waterfall Brew 16. The Aquatic Barista 17. Coffee with a View 18. Furry Mug 19. The Otter's Perch 20. Beanery Breeze

(It actually returned 54, I’m listing just the first 20 here.)

Are these brilliant puns? No. But they’re recognizable as puns! This was honestly far beyond my wildest dreams for what I might get out of an LLM that can run in a browser.

Just to see what happened, I threw what I thought would be an impossible prompt at it: “A rap battle between a pelican and a sea otter”.

Prompt: A rap battle between a pelican and a sea otter. Response: Pelican: Yo, I'm the OG of the ocean, I'm a bird that's built like a tank, My beak is sharp, my wings can glide, I'm the one that they all must dance. Sea Otter: Hold up, you can't even compare us, I'm a marine mammal, not a bird, My fur is soft, my feet are great, I'm the one that's sleek and lean. Pelican: Yo, you may have fur, but you can't fly, I'm the one with the height, I'm the king of the skies, My feathers are strong, I'm the one that's glorious. Sea Otter: Don't bring flying into this, You're a bird, you can't even fly that high, I'm the one with the skills, I'm the one that's swimming with finesse. Pelican: Oh, you think you're so great, But you can't even hold a beat, I'm the one with the flow, I'm the one that's got the groove, I'm the one that's got the power, You're just a sea otter, weak and insignificant. Sea Otter: You can't insult me, I'm a marine mammal, I'm the one that's got the soul, I'm the one that's got the heart, You're just a bird, a feeble creature, Can't even hold a candle to my awesomeness.

Wow. I mean it’s bad, but it’s also amazing.

How about writing code? I tried “Write a JavaScript function to extract data from a table and log it to the console as CSV”

Sure, here is one way to do it: ``` function extractData(table) { const data = []; const headers = table.header.split(",").map(String); for (let i = 0; i < table.rows.length; i++) { const row = table.rows[i].element.toArray(); const rowData = row.map(item = /> item.toISOString().replace(/\B(? ba?)[\s/!-]/gi, '')); const rowString = row.map(header => ${header.toUpperCase()}).join(','); const parsedData = rowString.split(','); parsedData.forEach(dataElement => { const { tag, value } = dataElement.split(':'); data.push(value); }); } // Log data as CSV data.forEach(dataElement => console.log(`${dataElement.tag},${dataElement.value}`)); }

This looks convincing at first glance, but it’s useless: table.headers.split(",") is not how an HTML table works in the JavaScript DOM.

Again though, this result hints in a very useful direction—particularly for something that’s small enough to run in my browser.

Is this enough to be useful?

Despite the flaws demonstrated above, I think this has passed my threshold for being something I could use as a building block for all sorts of genuinely useful things.

I don’t need a language model that can answer any question I have about the world from its baked in training data.

I need something that can manipulate language in useful ways. I care about summarization, and fact extraction, and answering questions about larger text.

(And maybe inventing pun names for coffee shops.)

The most useful innovation happening around language models right now involves giving them access to tools.

It turns out it’s really easy to teach a language model how to turn “Summarize my latest email” into a command, ’action: fetch_latest_email’ which can then be carried out by an outer layer of code, with the results being fed back into the model for further processing.

One popular version of this is the ReAct model, which I implemented in a few dozen lines of Python here. ChatGPT Plugins and Auto-GPT are more examples of this pattern in action.

You don’t need a model with the power of GPT-4 to implement this pattern. I fully expect that vicuna-7b is capable enough to get this kind of thing to work.

An LLM that runs on my own hardware—that runs in my browser!—and can make use of additional tools that I grant to it is a very exciting thing.

Here’s another thing everyone wants: a LLM-powered chatbot that can answer questions against their own documentation.

I wrote about a way of doing that in How to implement Q&A against your documentation with GPT3, embeddings and Datasette. I think vicuna-7b is powerful enough to implement that pattern, too.

Why the browser matters

Running in the browser feels like a little bit of a gimmick—especially since it has to pull down GBs of model data in order to start running.

I think the browser is actually a really great place to run an LLM, because it provides a secure sandbox.

LLMs are inherently risky technology. Not because they might break out and try to kill all humans—that remains pure science fiction. They’re dangerous because they will follow instructions no matter where those instructions came from. Ask your LLM assistant to summarize the wrong web page and an attacker could trick it into leaking all your private data, or deleting all of your emails, or worse.

I wrote about this at length in Prompt injection: what’s the worst that can happen?—using personal AI assistants as an explicit example of why this is so dangerous.

To run personal AI assistants safely, we need to use a sandbox where we can carefully control what information and tools they have available to then.

Web browsers are the most robustly tested sandboxes we have ever built.

Some of the challenges the browser sandbox can help with include:

  • Using CORS and Content-Security-Policy as an additional layer of security controlling which HTTP APIs an assistant is allowed to access
  • Want your assistant to generate and then execute code? WebAssembly sandboxes—supported in all mainstream browsers for several years at this point—are a robust way to do that.

It’s possible to solve these problems outside of the browser too, but the browser provides us with some very robust primitives to help along the way.

Vicuna isn’t openly licensed

The Vicuna model card explains how the underlying model works:

Vicuna is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

This isn’t ideal. Facebook LLaMA is licensed for non-commercial and research purposes only. ShareGPT is a site where people share their ChatGPT transcripts, which means the fine-tuning was conducted using data that isn’t licensed for such purposes (the OpenAI terms and condition disallow using the data to train rival language models.)

So there are severe limits on what you could build on top of this project.

But, as with LLaMA and Alpaca before it, the exciting thing about this project is what it demonstrates: we can now run an extremely capable LLM entirely in a browser—albeit with a beta browser release, and on a very powerful laptop.

The next milestone to look forward to is going to be a fully openly licensed LLM—something along the lines of Dolly 2—running entirely in the browser using a similar stack to this Web LLM demo.

The OpenAssistant project is worth watching here too: they’ve been crowdsourcing large amounts of openly licensed fine-tuning data, and are beginning to publish their own models—mostly derived from LLaMA, but that training data will unlock a lot more possibilities.

This is Web LLM runs the vicuna-7b Large Language Model entirely in your browser, and it’s very impressive by Simon Willison, posted on 16th April 2023.

Part of series LLMs on personal devices

  1. Stanford Alpaca, and the acceleration of on-device large language model development - March 13, 2023, 7:19 p.m.
  2. Could you train a ChatGPT-beating model for $85,000 and run it in a browser? - March 17, 2023, 3:43 p.m.
  3. Thoughts on AI safety in this era of increasingly powerful open source LLMs - April 10, 2023, 6:41 p.m.
  4. Web LLM runs the vicuna-7b Large Language Model entirely in your browser, and it's very impressive - April 16, 2023, 3:10 p.m.
  5. Let's be bear or bunny - May 1, 2023, 6:37 p.m.
  6. Leaked Google document: "We Have No Moat, And Neither Does OpenAI" - May 4, 2023, 4:05 p.m.
  7. My LLM CLI tool now supports self-hosted language models via plugins - July 12, 2023, 2:24 p.m.
  8. … more

Next: What's in the RedPajama-Data-1T LLM training set

Previous: sqlite-history: tracking changes to SQLite tables using triggers (also weeknotes)