16 posts tagged “til”
2025
TIL: Running a gpt-oss eval suite against LM Studio on a Mac. The other day I learned that OpenAI published a set of evals as part of their gpt-oss model release, described in their cookbook on Verifying gpt-oss implementations.
I decided to try and run that eval suite on my own MacBook Pro, against gpt-oss-20b
running inside of LM Studio.
TLDR: once I had the model running inside LM Studio with a longer than default context limit, the following incantation ran an eval suite in around 3.5 hours:
mkdir /tmp/aime25_openai
OPENAI_API_KEY=x \
uv run --python 3.13 --with 'gpt-oss[eval]' \
python -m gpt_oss.evals \
--base-url http://localhost:1234/v1 \
--eval aime25 \
--sampler chat_completions \
--model openai/gpt-oss-20b \
--reasoning-effort low \
--n-threads 2
My new TIL breaks that command down in detail and walks through the underlying eval - AIME 2025, which asks 30 questions (8 times each) that are defined using the following format:
{"question": "Find the sum of all integer bases $b>9$ for which $17_{b}$ is a divisor of $97_{b}$.", "answer": "70"}
simonw/codespaces-llm. GitHub Codespaces provides full development environments in your browser, and is free to use with anyone with a GitHub account. Each environment has a full Linux container and a browser-based UI using VS Code.
I found out today that GitHub Codespaces come with a GITHUB_TOKEN
environment variable... and that token works as an API key for accessing LLMs in the GitHub Models collection, which includes dozens of models from OpenAI, Microsoft, Mistral, xAI, DeepSeek, Meta and more.
Anthony Shaw's llm-github-models plugin for my LLM tool allows it to talk directly to GitHub Models. I filed a suggestion that it could pick up that GITHUB_TOKEN
variable automatically and Anthony shipped v0.18.0 with that feature a few hours later.
... which means you can now run the following in any Python-enabled Codespaces container and get a working llm
command:
pip install llm
llm install llm-github-models
llm models default github/gpt-4.1
llm "Fun facts about pelicans"
Setting the default model to github/gpt-4.1
means you get free (albeit rate-limited) access to that OpenAI model.
To save you from needing to even run that sequence of commands I've created a new GitHub repository, simonw/codespaces-llm, which pre-installs and runs those commands for you.
Anyone with a GitHub account can use this URL to launch a new Codespaces instance with a configured llm
terminal command ready to use:
codespaces.new/simonw/codespaces-llm?quickstart=1
While putting this together I wrote up what I've learned about devcontainers so far as a TIL: Configuring GitHub Codespaces using devcontainers.
TIL: Rate limiting by IP using Cloudflare’s rate limiting rules.
My blog started timing out on some requests a few days ago, and it turned out there were misbehaving crawlers that were spidering my /search/
page even though it's restricted by robots.txt
.
I run this site behind Cloudflare and it turns out Cloudflare's WAF (Web Application Firewall) has a rate limiting tool that I could use to restrict requests to /search/*
by a specific IP to a maximum of 5 every 10 seconds.
TIL: Using Playwright MCP with Claude Code. Inspired by Armin ("I personally use only one MCP - I only use Playwright") I decided to figure out how to use the official Playwright MCP server with Claude Code.
It turns out it's easy:
claude mcp add playwright npx '@playwright/mcp@latest'
claude
The claude mcp add
command only affects the current directory by default - it gets persisted in the ~/.claude.json
file.
Now Claude can use Playwright to automate a Chrome browser! Tell it to "Use playwright mcp to open a browser to example.com" and watch it go - it can navigate pages, submit forms, execute custom JavaScript and take screenshots to feed back into the LLM.
The browser window stays visible which means you can interact with it too, including signing into websites so Claude can act on your behalf.
TIL: SQLite triggers. I've been doing some work with SQLite triggers recently while working on sqlite-chronicle, and I decided I needed a single reference to exactly which triggers are executed for which SQLite actions and what data is available within those triggers.
I wrote this triggers.py script to output as much information about triggers as possible, then wired it into a TIL article using Cog. The Cog-powered source code for the TIL article can be seen here.
TIL: Styling an HTML dialog modal to take the full height of the viewport.
I spent some time today trying to figure out how to have a modal <dialog>
element present as a full height side panel that animates in from the side. The full height bit was hard, until Natalie helped me figure out that browsers apply a default max-height: calc(100% - 6px - 2em);
rule which needs to be over-ridden.
Also included: some spelunking through the HTML spec to figure out where that calc()
expression was first introduced. The answer was November 2020.
Using a Tailscale exit node with GitHub Actions. New TIL. I started running a git scraper against doge.gov to track changes made to that website over time. The DOGE site runs behind Cloudflare which was blocking requests from the GitHub Actions IP range, but I figured out how to run a Tailscale exit node on my Apple TV and use that to proxy my shot-scraper requests.
The scraper is running in simonw/scrape-doge-gov. It uses the new shot-scraper har command I added in shot-scraper 1.6 (and improved in shot-scraper 1.7).
TIL: Downloading every video for a TikTok account. TikTok may or may not be banned in the USA within the next 24 hours or so. I figured out a gnarly pattern for downloading every video from a specified account, using browser console JavaScript to scrape the video URLs and yt-dlp to fetch each video. As a bonus, I included a recipe for generating a Whisper transcript of every video with mlx-whisper and a hacky way to show a progress bar for the downloads.
2024
TIL: Using uv to develop Python command-line applications.
I've been increasingly using uv to try out new software (via uvx
) and experiment with new ideas, but I hadn't quite figured out the right way to use it for developing my own projects.
It turns out I was missing a few things - in particular the fact that there's no need to use uv pip
at all when working with a local development environment, you can get by entirely on uv run
(and maybe uv sync --extra test
to install test dependencies) with no direct invocations of uv pip
at all.
I bounced a few questions off Charlie Marsh and filled in the missing gaps - this TIL shows my new uv-powered process for hacking on Python CLI apps built using Click and my simonw/click-app cookecutter template.
Julia Evans: TIL. I've always loved how Julia Evans emphasizes the joy of learning and how you should celebrate every new thing you learn and never be ashamed to admit that you haven't figured something out yet. That attitude was part of my inspiration when I started writing TILs a few years ago.
Julia just started publishing TILs too, and I'm delighted to learn that this was partially inspired by my own efforts!
2023
Write about what you learn. It pushes you to understand topics better. (via) Addy Osmani clearly articulates why writing frequently is such a powerful tool for learning more effectively. This post doesn’t mention TILs but it perfectly encapsulates the value I get from publishing them.
2021
Annotated explanation of David Beazley’s dataklasses (via) David Beazley released a self-described “deliciously evil spin on dataclasses” that uses some deep Python trickery to implement a dataclass style decorator which creates classes that import 15-20 times faster than the original. I put together a heavily annotated version of his code while trying to figure out how all of the different Python tricks in it work.
One year of TILs
Just over a year ago I started tracking TILs, inspired by Josh Branchaud’s collection. I’ve since published 148 TILs across 43 different topics. It’s a great format!
[... 224 words]2020
Weeknotes: Working on my screenplay
I’m taking an Introduction to Screenwriting course with Adam Tobin at Stanford, and my partial screenplay is due this week. I’m pulling together some scenes that tell the story of the Russian 1917 February Revolution and the fall of the Tsar through the lens of the craftsmen working on the Tsar’s last Fabergé egg. So I’ve not been spending much time on anything else.
[... 226 words]Restricting SSH connections to devices within a Tailscale network. TIL how to run SSH on a VPS instance (in this case Amazon Lightsail) such that it can only be SSHd to by devices connected to a private Tailscale VPN.
Using a self-rewriting README powered by GitHub Actions to track TILs
I’ve started tracking TILs—Today I Learneds—inspired by this five-year-and-counting collection by Josh Branchaud on GitHub (found via Hacker News). I’m keeping mine in GitHub too, and using GitHub Actions to automatically generate an index page README in the repository and a SQLite-backed search engine.
[... 1,100 words]