9th June 2026 - Link Blog
Introducing the Third Generation of Apple’s Foundation Models (via) Detailed coverage of the new foundation models available with iOS and macOS 27:
At the heart of this architecture is our third generation of Apple Foundation Models (AFM), a family of five foundation models custom-built in collaboration with Google. These span from on-device models to server-based models running on Private Cloud Compute.
There are two on-device models: a 3 billion parameter dense model (input: text and images; output: text), where all parameters are used for every query, and a 20 billion parameter multimodal model (input: text, images, and audio; output: text and audio) which is a much more interesting shape:
Rather than using a single model for all tasks or managing an ensemble of smaller models, AFM 3 Core Advanced uses a predetermined number of active parameters tailored to each specific use case. This allows weights to be loaded incrementally across requests of varying difficulty, scaling the model size far beyond traditional DRAM limits while minimizing latency. [...]
Instead of forcing the entire model into DRAM, the full model is stored in flash memory (NAND). Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, as standard MoE models require, AFM 3 Core Advanced makes routing decisions per prompt. A lightweight, dense block selects a fixed set of experts during initial processing, periodically reselecting them during generation. To minimize data movement, the model relies on a high percentage of always-active “shared experts” alongside input-dependent “routed experts” swapped into DRAM only when needed.
This is not quite the same thing as typical Mixture-of-Experts models. In most MoE models the "experts" are selected afresh for every token. Apple are instead making those decisions "per prompt" (with periodic reselection during generation), saving on most of that high-bandwidth weight swapping.
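To get a feel for why that matters, here is a toy Python sketch that counts how many expert loads from flash into DRAM each strategy triggers. It is not Apple's implementation: the router is replaced by a seeded random picker, and `NUM_ROUTED_EXPERTS`, `TOP_K`, and `reselect_every` are made-up illustrative values. It also assumes DRAM holds only the currently selected routed experts, and it leaves out the always-resident shared experts entirely.

```python
import random

NUM_ROUTED_EXPERTS = 16  # hypothetical pool of routed experts stored in flash
TOP_K = 2                # hypothetical number of routed experts active at once


def pick_experts(rng):
    """Stand-in for a router: return TOP_K distinct expert ids."""
    return frozenset(rng.sample(range(NUM_ROUTED_EXPERTS), TOP_K))


def count_loads(selections):
    """Count flash-to-DRAM expert loads, assuming DRAM keeps only the current selection."""
    resident = frozenset()
    loads = 0
    for chosen in selections:
        loads += len(chosen - resident)  # only newly needed experts are fetched
        resident = chosen
    return loads


def per_token_routing(num_tokens, seed=0):
    """Typical MoE behaviour: a fresh expert choice for every token."""
    rng = random.Random(seed)
    return [pick_experts(rng) for _ in range(num_tokens)]


def per_prompt_routing(num_tokens, reselect_every, seed=0):
    """Pick experts once up front, then reselect only every `reselect_every` tokens."""
    rng = random.Random(seed)
    chosen = pick_experts(rng)
    selections = []
    for t in range(num_tokens):
        if t > 0 and t % reselect_every == 0:
            chosen = pick_experts(rng)
        selections.append(chosen)
    return selections


if __name__ == "__main__":
    print("per-token loads: ", count_loads(per_token_routing(256)))
    print("per-prompt loads:", count_loads(per_prompt_routing(256, reselect_every=64)))
```

With 256 tokens and a reselection every 64 tokens, the per-prompt variant can load at most 4 × `TOP_K` experts in total, while the per-token variant may fetch new experts on almost every step. The real numbers depend on details the post doesn't give, so treat this purely as an illustration of the shape of the saving.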
The three cloud models are described like this:
- AFM 3 Cloud, our server-side workhorse, optimized for speed, efficiency, and performance.
- ADM 3 Cloud (Image), for image generation and editing, which unlocks advanced photo-editing tools, the all-new Image Playground, and more.
- AFM 3 Cloud Pro, our most capable server-based model, which powers our most demanding use cases, like agentic tool use and complex reasoning.
All but the Cloud Pro model continue to run on Apple silicon. Cloud Pro is the only model running on NVIDIA GPUs in Google Cloud.
Embed screenshot and link to https://x.com/jchammond_/status/2064206029370630529?s=46