The most dramatic optimization to nanoGPT so far (~25% speedup) is to simply increase vocab size from 50257 to 50304 (nearest multiple of 64). This calculates added useless dimensions but goes down a different kernel path with much higher occupancy. Careful with your Powers of 2.
Recent articles
- New prompt injection papers: Agents Rule of Two and The Attacker Moves Second - 2nd November 2025
- Hacking the WiFi-enabled color screen GitHub Universe conference badge - 28th October 2025
- Video: Building a tool to copy-paste share terminal sessions using Claude Code for web - 23rd October 2025