Reflections on OpenAI (via) Calvin French-Owen spent just over a year working at OpenAI, during which time the organization grew from 1,000 to 3,000 people and Calvin found himself in "the top 30% by tenure".
His reflections on leaving are fascinating - absolutely crammed with detail about OpenAI's internal culture that I haven't seen described anywhere else before.
I think of OpenAI as an organization that started like Los Alamos. It was a group of scientists and tinkerers investigating the cutting edge of science. That group happened to accidentally spawn the most viral consumer app in history. And then grew to have ambitions to sell to governments and enterprises.
There's a lot in here, and it's worth spending time with the whole thing. A few points that stood out to me below.
Firstly, OpenAI are a Python shop who lean a whole lot on Pydantic and FastAPI:
OpenAI uses a giant monorepo which is ~mostly Python (though there is a growing set of Rust services and a handful of Golang services sprinkled in for things like network proxies). This creates a lot of strange-looking code because there are so many ways you can write Python. You will encounter both libraries designed for scale from 10y Google veterans as well as throwaway Jupyter notebooks newly-minted PhDs. Pretty much everything operates around FastAPI to create APIs and Pydantic for validation. But there aren't style guides enforced writ-large.
ChatGPT's success has influenced everything that they build, even at a technical level:
Chat runs really deep. Since ChatGPT took off, a lot of the codebase is structured around the idea of chat messages and conversations. These primitives are so baked at this point, you should probably ignore them at your own peril.
Here's a rare peek at how improvements to large models get discovered and incorporated into training runs:
How large models are trained (at a high-level). There's a spectrum from "experimentation" to "engineering". Most ideas start out as small-scale experiments. If the results look promising, they then get incorporated into a bigger run. Experimentation is as much about tweaking the core algorithms as it is tweaking the data mix and carefully studying the results. On the large end, doing a big run almost looks like giant distributed systems engineering. There will be weird edge cases and things you didn't expect.