Simon Willison’s Weblog


13 items tagged “andrejkarpathy”


Andrej Karpathy’s Llama 3 review. The most interesting coverage I’ve seen so far of Meta’s Llama 3 models (8b and 70b so far, 400b promised later).

Andrej notes that Llama 3 trained on 15 trillion tokens—up from 2 trillion for Llama 2—and they used that many even for the smaller 8b model, 75x more than the chinchilla scaling laws would suggest.

The tokenizer has also changed—they now use 128,000 tokens, up from 32,000. This results in a 15% drop in the tokens needed to represent a string of text.

The one disappointment is the context length—just 8,192, 2x that of Llama 2 and 4x LLaMA 1 but still pretty small by today’s standards.

If early indications hold, the 400b model could be the first genuinely GPT-4 class openly licensed model. We’ll have to wait and see. # 18th April 2024, 8:50 pm

[on GitHub Copilot] It’s like insisting to walk when you can take a bike. It gets the hard things wrong but all the easy things right, very helpful and much faster. You have to learn what it can and can’t do.

Andrej Karpathy # 11th April 2024, 1:27 am

llm.c (via) Andrej Karpathy implements LLM training—initially for GPT-2, other architectures to follow—in just over 1,000 lines of C on top of CUDA. Includes a tutorial about implementing LayerNorm by porting an implementation from Python. # 9th April 2024, 3:24 pm

When I first published the micrograd repo, it got some traction on GitHub but then somewhat stagnated and it didn’t seem that people cared much. [...] When I made the video that built it and walked through it, it suddenly almost 100X’d the overall interest and engagement with that exact same piece of code.

[...] you might be leaving somewhere 10-100X of the potential of that exact same piece of work on the table just because you haven’t made it sufficiently accessible.

Andrej Karpathy # 21st February 2024, 9:26 pm

Let’s build the GPT Tokenizer. When Andrej Karpathy left OpenAI last week a lot of people expressed hope that he would be increasing his output of educational YouTube videos.

Here’s an in-depth 2 hour dive into how tokenizers work and how to build one from scratch, published this morning.

The section towards the end, “revisiting and explaining the quirks of LLM tokenization”, helps explain a number of different LLM weaknesses—inability to reverse strings, confusion over arithmetic and even a note on why YAML can work better than JSON when providing data to LLMs (the same data can be represented in less tokens). # 20th February 2024, 6:02 pm


I always struggle a bit with I’m asked about the “hallucination problem” in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.

We direct their dreams with prompts. The prompts start the dream, and based on the LLM’s hazy recollection of its training documents, most of the time the result goes someplace useful.

It’s only when the dreams go into deemed factually incorrect territory that we label it a “hallucination”. It looks like a bug, but it’s just the LLM doing what it always does.

Andrej Karpathy # 9th December 2023, 6:08 am

YouTube: Intro to Large Language Models. Andrej Karpathy is an outstanding educator, and this one hour video offers an excellent technical introduction to LLMs.

At 42m Andrej expands on his idea of LLMs as the center of a new style of operating system, tying together tools and and a filesystem and multimodal I/O.

There’s a comprehensive section on LLM security—jailbreaking, prompt injection, data poisoning—at the 45m mark.

I also appreciated his note on how parameter size maps to file size: Llama 70B is 140GB, because each of those 70 billion parameters is a 2 byte 16bit floating point number on disk. # 23rd November 2023, 5:02 pm

Looking at LLMs as chatbots is the same as looking at early computers as calculators. We’re seeing an emergence of a whole new computing paradigm, and it is very early.

Andrej Karpathy # 28th September 2023, 8:50 pm

llama.cpp surprised many people (myself included) with how quickly you can run large LLMs on small computers [...] TLDR at batch_size=1 (i.e. just generating a single stream of prediction on your computer), the inference is super duper memory-bound. The on-chip compute units are twiddling their thumbs while sucking model weights through a straw from DRAM. [...] A100: 1935 GB/s memory bandwidth, 1248 TOPS. MacBook M2: 100 GB/s, 7 TFLOPS. The compute is ~200X but the memory bandwidth only ~20X. So the little M2 chip that could will only be about ~20X slower than a mighty A100.

Andrej Karpathy # 16th August 2023, 4:13 am

The most dramatic optimization to nanoGPT so far (~25% speedup) is to simply increase vocab size from 50257 to 50304 (nearest multiple of 64). This calculates added useless dimensions but goes down a different kernel path with much higher occupancy. Careful with your Powers of 2.

Andrej Karpathy # 4th February 2023, 12:08 am

nanoGPT. “The simplest, fastest repository for training/finetuning medium-sized GPTs”—by Andrej Karpathy, in about 600 lines of Python. # 2nd January 2023, 11:27 pm


karpathy/minGPT (via) A “minimal PyTorch re-implementation” of the OpenAI GPT training and inference model, by Andrej Karpathy. It’s only a few hundred lines of code and includes extensive comments, plus notebook demos. # 6th September 2022, 2:52 pm

To make the analogy explicit, in Software 1.0, human-engineered source code (e.g. some .cpp files) is compiled into a binary that does useful work. In Software 2.0 most often the source code comprises 1) the dataset that defines the desirable behavior and 2) the neural net architecture that gives the rough skeleton of the code, but with many details (the weights) to be filled in. The process of training the neural network compiles the dataset into the binary — the final neural network. In most practical applications today, the neural net architectures and the training systems are increasingly standardized into a commodity, so most of the active “software development” takes the form of curating, growing, massaging and cleaning labeled datasets.

Andrej Karpathy # 24th August 2022, 9:28 pm