Archive for Sunday, 17th August 2025

Most of what we're building out at this point is the inference [...] We're profitable on inference. If we didn't pay for training, we'd be a very profitable company.

— Sam Altman, during a "wide-ranging dinner with a small group of reporters in San Francisco"

# 12:53 am / ai, openai, sam-altman

TIL Running a gpt-oss eval suite against LM Studio on a Mac — OpenAI's [gpt-oss models](https://github.com/openai/gpt-oss/) come with an eval suite, which is described in their [Verifying gpt-oss implementations](https://cookbook.openai.com/articles/gpt-oss/verifying-implementations) cookbook. I figured out how to run it on my Mac against their `gpt-oss-20b` model hosted locally using LM Studio, using [uv](https://github.com/astral-sh/uv).

17th Aug 2025, 3:38 am

TIL: Running a gpt-oss eval suite against LM Studio on a Mac. The other day I learned that OpenAI published a set of evals as part of their gpt-oss model release, described in their cookbook on Verifying gpt-oss implementations.

I decided to try and run that eval suite on my own MacBook Pro, against gpt-oss-20b running inside of LM Studio.

TLDR: once I had the model running inside LM Studio with a longer-than-default context limit, the following incantation ran the eval suite in around 3.5 hours:

mkdir /tmp/aime25_openai
OPENAI_API_KEY=x \
  uv run --python 3.13 --with 'gpt-oss[eval]' \
  python -m gpt_oss.evals \
  --base-url http://localhost:1234/v1 \
  --eval aime25 \
  --sampler chat_completions \
  --model openai/gpt-oss-20b \
  --reasoning-effort low \
  --n-threads 2
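
Before committing to a multi-hour run it can be worth confirming that the local server is answering at that base URL and reports the expected model ID. Here is a minimal sketch of such a check, assuming LM Studio's OpenAI-compatible server responds to `GET /v1/models` with the usual `{"data": [{"id": ...}]}` shape. The helper names `models_url`, `model_ids` and `list_models` are hypothetical and not part of gpt-oss or LM Studio.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the models-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"


def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style models-list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Fetch the model IDs the server reports (assumes GET <base_url>/models)."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as response:
        return model_ids(json.load(response))
```

With the server running, `"openai/gpt-oss-20b" in list_models("http://localhost:1234/v1")` should be true if the model is available under that ID. If it is not, the `--model` value in the command above probably needs adjusting.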

My new TIL breaks that command down in detail and walks through the underlying eval, AIME 2025, which asks 30 questions (8 times each), each defined using the following format:

{"question": "Find the sum of all integer bases $b>9$ for which $17_{b}$ is a divisor of $97_{b}$.", "answer": "70"}
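
That example record is a single JSON object with string `question` and `answer` fields, and this particular question is small enough to check by brute force. The sketch below parses the record and confirms the stated answer. It reads $17_{b}$ as `b + 7` and $97_{b}$ as `9*b + 7`, and since `9*b + 7 = 9*(b + 7) - 56`, `b + 7` must divide 56, so searching up to 56 is more than enough. The function name `sum_of_valid_bases` is made up for illustration, and this is not how the gpt-oss eval code grades answers.

```python
import json

# The example record from above, parsed as a single JSON object.
record = json.loads(
    '{"question": "Find the sum of all integer bases $b>9$ for which '
    '$17_{b}$ is a divisor of $97_{b}$.", "answer": "70"}'
)


def sum_of_valid_bases(limit: int = 56) -> int:
    """Sum every base b in (9, limit] where 17 in base b divides 97 in base b."""
    total = 0
    for b in range(10, limit + 1):
        seventeen = b + 7      # digits "17" interpreted in base b
        ninety_seven = 9 * b + 7  # digits "97" interpreted in base b
        if ninety_seven % seventeen == 0:
            total += b
    return total


computed = sum_of_valid_bases()
matches = str(computed) == record["answer"]  # answers are stored as strings
print(computed, matches)  # prints: 70 True
```

The two qualifying bases are 21 and 49, which sum to 70.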

# 3:46 am / python, ai, til, openai, generative-ai, local-llms, llms, evals, uv, lm-studio, gpt-oss

Simon Willison’s Weblog