Prompt Lookup Decoding. A really neat LLM optimization trick by Apoorv Saxena, who observed that sequences of tokens in LLM input are often repeated in the output: snippets quoted in a summarization, for example.
Apoorv’s code performs a simple search of the input for such repeated sequences and uses the tokens that follow a match to populate a set of suggested candidate token IDs during LLM token generation.
The result appears to provide around a 2.4x speed-up in generating outputs!
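To make the idea concrete, here is a minimal sketch of the lookup step in plain Python. It is not Apoorv’s actual code: the function name `find_candidate_tokens`, its parameters `max_ngram_size` and `num_candidates`, and the choice to return the earliest match are all assumptions made for illustration. It takes the last few token IDs of the sequence so far, looks for an earlier occurrence of that same n-gram, and proposes the tokens that followed it as candidates.

```python
def find_candidate_tokens(token_ids, max_ngram_size=3, num_candidates=10):
    """Propose candidate continuation tokens by looking back through the sequence.

    Hypothetical helper for illustration: token_ids is the prompt plus anything
    generated so far, as a plain list of integer token IDs.
    """
    n_tokens = len(token_ids)
    # Try the longest trailing n-gram first, falling back to shorter ones.
    for ngram_size in range(max_ngram_size, 0, -1):
        if n_tokens <= ngram_size:
            continue
        ngram = token_ids[-ngram_size:]
        # Scan earlier positions only, so the trailing n-gram cannot match itself.
        for start in range(n_tokens - ngram_size):
            if token_ids[start:start + ngram_size] == ngram:
                follow = start + ngram_size
                # The tokens after the earlier occurrence become the candidates.
                return token_ids[follow:follow + num_candidates]
    # No earlier occurrence found: propose nothing.
    return []


# The trailing 2-gram [6, 7] also appears at index 1, followed by 8, 9, 1.
print(find_candidate_tokens([5, 6, 7, 8, 9, 1, 2, 6, 7], num_candidates=3))
```

In a real decoder the candidates would only be suggestions: the model still has to verify them, keeping the ones it agrees with and discarding the rest, which is where any speed-up would come from.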