Posts tagged llama, webassembly in 2023


3 results

WebLLM supports Llama 2 70B now. The WebLLM project from MLC uses WebGPU to run large language models entirely in the browser. They recently added support for Llama 2, including Llama 2 70B, the largest and most powerful model in that family.

To my astonishment, this worked! I used an M2 Mac with 64GB of RAM and Chrome Canary and it downloaded many GBs of data... but it worked, and spat out tokens at a slow but respectable rate of 3.25 tokens/second.

2:41 pm / 30th August 2023 / ai, webassembly, generative-ai, llama, llms, mlc, webgpu

Web LLM runs the vicuna-7b Large Language Model entirely in your browser, and it’s very impressive

A month ago I asked “Could you train a ChatGPT-beating model for $85,000 and run it in a browser?” The $85,000 figure was a hypothetical training cost for LLaMA 7B plus Stanford Alpaca. “Run it in a browser” was based on the fact that Web Stable Diffusion runs a 1.9GB Stable Diffusion model in a browser, so maybe it’s not such a big leap to run a small Large Language Model there as well.

[... 2,276 words]

3:10 pm / 16th April 2023 / chrome, ai, webassembly, generative-ai, llama, local-llms, llms, mlc, vicuna

Could you train a ChatGPT-beating model for $85,000 and run it in a browser?

I think it’s now possible to train a large language model with similar functionality to GPT-3 for $85,000. And I think we might soon be able to run the resulting model entirely in the browser, and give it capabilities that leapfrog it ahead of ChatGPT.

[... 1,751 words]

3:43 pm / 17th March 2023 / ai, webassembly, generative-ai, chatgpt, llama, local-llms, llms, bloom, mlc, transformers-js, llama-cpp

Simon Willison’s Weblog