18th April 2024 - Link Blog
Andrej Karpathy's Llama 3 review. The most interesting coverage I’ve seen so far of Meta’s Llama 3 models (8b and 70b so far, 400b promised later).
Andrej notes that Llama 3 trained on 15 trillion tokens—up from 2 trillion for Llama 2—and they used that many even for the smaller 8b model, 75x more than the chinchilla scaling laws would suggest.
The tokenizer has also changed—they now use 128,000 tokens, up from 32,000. This results in a 15% drop in the tokens needed to represent a string of text.
The one disappointment is the context length—just 8,192, 2x that of Llama 2 and 4x LLaMA 1 but still pretty small by today’s standards.
If early indications hold, the 400b model could be the first genuinely GPT-4 class openly licensed model. We’ll have to wait and see.
Recent articles
- Thoughts on OpenAI acquiring Astral and uv/ruff/ty - 19th March 2026
- GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52 - 17th March 2026
- My fireside chat about agentic engineering at the Pragmatic Summit - 14th March 2026