Simon Willison’s Weblog

Int-4 LLaMa is not enough - Int-3 and beyond. The Nolano team are experimenting with shrinking the LLaMA models even further than the 4-bit quantization popularized by llama.cpp.

Posted 13th March 2023 at 11:55 pm
