30th August 2023 - Link Blog
WebLLM supports Llama 2 70B now. The WebLLM project from MLC uses WebGPU to run large language models entirely in the browser. They recently added support for Llama 2, including Llama 2 70B, the largest and most powerful model in that family.
To my astonishment, this worked! I used a M2 Mac with 64GB of RAM and Chrome Canary and it downloaded many GBs of data... but it worked, and spat out tokens at a slow but respectable rate of 3.25 tokens/second.
Recent articles
- Porting the Moebius 0.2B image inpainting model to run in the browser with Claude Code - 22nd June 2026
- sqlite-utils 4.0rc1 adds migrations and nested transactions - 21st June 2026
- Datasette Apps: Host custom HTML applications inside Datasette - 18th June 2026