30th August 2024
We have recently trained our first 100M token context model: LTM-2-mini. 100M tokens equals ~10 million lines of code or ~750 novels.
For each decoded token, LTM-2-mini's sequence-dimension algorithm is roughly 1000x cheaper than the attention mechanism in Llama 3.1 405B for a 100M token context window.
The contrast in memory requirements is even larger -- running Llama 3.1 405B with a 100M token context requires 638 H100s per user just to store a single 100M token KV cache. In contrast, LTM requires a small fraction of a single H100's HBM per user for the same context.
— Magic AI
Recent articles
- Perhaps not Boring Technology after all - 9th March 2026
- Can coding agents relicense open source through a “clean room” implementation of code? - 5th March 2026
- Something is afoot in the land of Qwen - 4th March 2026