Simon Willison’s Weblog

6 items tagged “o1”

OpenAI’s o1 family of models.

2024

o1 prompting is alien to me. Its thinking, gloriously effective at times, is also dreamlike and unamenable to advice.

Just say what you want and pray. Any notes on “how” will be followed with the diligence of a brilliant intern on ketamine.

Riley Goodside

# 16th September 2024, 5:28 pm / ai, openai, prompt-engineering, generative-ai, riley-goodside, llms, o1

[… OpenAI’s o1] could work its way to a correct (and well-written) solution if provided a lot of hints and prodding, but did not generate the key conceptual ideas on its own, and did make some non-trivial mistakes. The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student. However, this was an improvement over previous models, whose capability was closer to an actually incompetent graduate student.

Terence Tao

# 15th September 2024, 12:04 am / mathematics, ai, openai, generative-ai, llms, o1

Believe it or not, the name Strawberry does not come from the “How many r’s are in strawberry” meme. We just chose a random word. As far as we know it was a complete coincidence.

Noam Brown, OpenAI

# 13th September 2024, 11:35 am / ai, openai, generative-ai, llms, o1

o1-mini is the most surprising research result I've seen in the past year

Obviously I cannot spill the secret, but a small model getting >60% on AIME math competition is so good that it's hard to believe

Jason Wei (OpenAI)

# 12th September 2024, 11:45 pm / ai, openai, generative-ai, llms, o1

LLM 0.16. New release of LLM adding support for the o1-preview and o1-mini OpenAI models that were released today.

# 12th September 2024, 11:20 pm / projects, ai, openai, generative-ai, llms, llm, o1

Notes on OpenAI’s new o1 chain-of-thought models

OpenAI released two major new preview models today: o1-preview and o1-mini (the mini one is also a preview, despite the name), previously rumored as having the codename “strawberry”. There’s a lot to understand about these models. They’re not as simple as the next step up from GPT-4o, instead introducing some major trade-offs in terms of cost and performance in exchange for improved “reasoning” capabilities.

[... 1,568 words]