We have recently trained our first 100M token context model: LTM-2-mini. 100M tokens equals ~10 million lines of code or ~750 novels.
For each decoded token, LTM-2-mini's sequence-dimension algorithm is roughly 1000x cheaper than the attention mechanism in Llama 3.1 405B for a 100M token context window.
The contrast in memory requirements is even larger -- running Llama 3.1 405B with a 100M token context requires 638 H100s per user just to store a single 100M token KV cache. In contrast, LTM requires a small fraction of a single H100's HBM per user for the same context.
— Magic AI
Recent articles
- Notes from Bing Chat—Our First Encounter With Manipulative AI - 19th November 2024
- Project: Civic Band - scraping and searching PDF meeting minutes from hundreds of municipalities - 16th November 2024
- Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - 12th November 2024