Simon Willison’s Weblog

Int-4 LLaMa is not enough - Int-3 and beyond. The Nolano team are experimenting with shrinking the LLaMA models even further than the 4-bit quantization popularized by llama.cpp.

Posted 13th March 2023 at 11:55 pm

Tags: ai, generative-ai, llama, local-llms, llms
