Prompt Lookup Decoding (via) Really neat LLM optimization trick by Apoorv Saxena, who observed that it’s common for sequences of tokens in LLM input to be reflected by the output—snippets included in a summarization, for example.
Apoorv’s code performs a simple search for such prefixes and uses them to populate a set of suggested candidate IDs during LLM token generation.
The result appears to provide around a 2.4x speed-up in generating outputs!
Recent articles
- OpenAI's new open weight (Apache 2) models are really good - 5th August 2025
- ChatGPT agent's user-agent - 4th August 2025
- The ChatGPT sharing dialog demonstrates how difficult it is to design privacy preferences - 3rd August 2025