<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: moonshot</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/moonshot.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-01-27T15:07:41+00:00</updated><author><name>Simon Willison</name></author><entry><title>Kimi K2.5: Visual Agentic Intelligence</title><link href="https://simonwillison.net/2026/Jan/27/kimi-k25/#atom-tag" rel="alternate"/><published>2026-01-27T15:07:41+00:00</published><updated>2026-01-27T15:07:41+00:00</updated><id>https://simonwillison.net/2026/Jan/27/kimi-k25/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.kimi.com/blog/kimi-k2-5.html"&gt;Kimi K2.5: Visual Agentic Intelligence&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Kimi K2 landed &lt;a href="https://simonwillison.net/2025/Jul/11/kimi-k2/"&gt;in July&lt;/a&gt; as a 1 trillion parameter open weight LLM. It was joined by Kimi K2 Thinking &lt;a href="https://simonwillison.net/2025/Nov/6/kimi-k2-thinking/"&gt;in November&lt;/a&gt; which added reasoning capabilities. Now they've made it multi-modal: the K2 models were text-only, but the new 2.5 can handle image inputs as well:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Kimi K2.5 builds on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens. Built as a native multimodal model, K2.5 delivers state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The "self-directed agent swarm paradigm" claim there means improved long-sequence tool calling and training on how to break down tasks for multiple agents to work on at once:&lt;/p&gt;
&lt;blockquote id="complex-tasks"&gt;&lt;p&gt;For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. Compared with a single-agent setup, this reduces execution time by up to 4.5x. The agent swarm is automatically created and orchestrated by Kimi K2.5 without any predefined subagents or workflow.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I used the &lt;a href="https://openrouter.ai/moonshotai/kimi-k2.5"&gt;OpenRouter Chat UI&lt;/a&gt; to have it "Generate an SVG of a pelican riding a bicycle", and it did &lt;a href="https://gist.github.com/simonw/32a85e337fbc6ee935d10d89726c0476"&gt;quite well&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Cartoon illustration of a white pelican with a large orange beak and yellow throat pouch riding a green bicycle with yellow feet on the pedals, set against a light blue sky with soft bokeh circles and a green grassy hill. The bicycle frame is a little questionable. The pelican is quite good. The feet do not quite align with the pedals, which are floating clear of the frame." src="https://static.simonwillison.net/static/2026/kimi-k2.5-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;As a more interesting test, I decided to exercise the claims around multi-agent planning with this prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into ten tasks suitable for execution by parallel coding agents.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/ee2583b2eb5706400a4737f56d57c456"&gt;the full response&lt;/a&gt;. It produced ten realistic tasks and reasoned through the dependencies between them. For comparison here's the same prompt &lt;a href="https://claude.ai/share/df9258e7-97ba-4362-83da-76d31d96196f"&gt;against Claude Opus 4.5&lt;/a&gt; and &lt;a href="https://chatgpt.com/share/6978d48c-3f20-8006-9c77-81161f899104"&gt;against GPT-5.2 Thinking&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://huggingface.co/moonshotai/Kimi-K2.5"&gt;Hugging Face repository&lt;/a&gt; is 595GB. The model uses Kimi's janky "modified MIT" license, which adds the following clause:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Given the model's size, I expect one way to run it locally would be with MLX and a pair of $10,000 512GB RAM M3 Ultra Mac Studios. That setup has &lt;a href="https://twitter.com/awnihannun/status/1943723599971443134"&gt;been demonstrated to work&lt;/a&gt; with previous trillion parameter K2 models.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46775961"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/moonshot"&gt;moonshot&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kimi"&gt;kimi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/janky-licenses"&gt;janky-licenses&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="llms"/><category term="hugging-face"/><category term="vision-llms"/><category term="llm-tool-use"/><category term="ai-agents"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="ai-in-china"/><category term="moonshot"/><category term="parallel-agents"/><category term="kimi"/><category term="janky-licenses"/></entry><entry><title>Kimi K2 Thinking</title><link href="https://simonwillison.net/2025/Nov/6/kimi-k2-thinking/#atom-tag" rel="alternate"/><published>2025-11-06T23:53:06+00:00</published><updated>2025-11-06T23:53:06+00:00</updated><id>https://simonwillison.net/2025/Nov/6/kimi-k2-thinking/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/moonshotai/Kimi-K2-Thinking"&gt;Kimi K2 Thinking&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Chinese AI lab Moonshot's Kimi K2 established itself as one of the largest open weight models - 1 trillion parameters - &lt;a href="https://simonwillison.net/2025/Jul/11/kimi-k2/"&gt;back in July&lt;/a&gt;. They've now released the Thinking version, also a trillion parameters (MoE, 32B active) and also under their custom modified (so &lt;a href="https://simonwillison.net/2025/Jul/11/kimi-k2/#kimi-license"&gt;not quite open source&lt;/a&gt;) MIT license.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with 256k context window, achieving lossless reductions in inference latency and GPU memory usage.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This one is only 594GB on Hugging Face - Kimi K2 was 1.03TB - which I think is due to the new INT4 quantization. This makes the model both cheaper and faster to host.&lt;/p&gt;
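&lt;p&gt;A back-of-envelope check supports that - assuming (my guess) the original K2 weights shipped at roughly 8 bits per parameter and the Thinking model at 4:&lt;/p&gt;

```python
# Rough storage for a 1 trillion parameter model at different precisions.
# Real repos come out a bit larger: embeddings, higher-precision layers, metadata.
params = 1_000_000_000_000

def weight_gb(bits_per_param: float) -> float:
    """Decimal GB needed for the raw weights alone."""
    return params * bits_per_param / 8 / 1_000_000_000

print(weight_gb(8))  # 1000.0 - close to Kimi K2's 1.03TB
print(weight_gb(4))  # 500.0 - in the ballpark of K2 Thinking's 594GB
```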
&lt;p&gt;So far the only people hosting it are Moonshot themselves. I tried it out via &lt;a href="https://platform.moonshot.ai"&gt;their own API&lt;/a&gt; using the &lt;a href="https://github.com/ghostofpokemon/llm-moonshot"&gt;llm-moonshot&lt;/a&gt; plugin (by NickMystic), and via &lt;a href="https://openrouter.ai/moonshotai/kimi-k2-thinking/providers"&gt;the OpenRouter proxy to it&lt;/a&gt; using my &lt;a href="https://github.com/simonw/llm-openrouter"&gt;llm-openrouter&lt;/a&gt; plugin.&lt;/p&gt;
&lt;p&gt;The buzz around this model so far is very positive. Could this be the first open weight model that's competitive with the latest from OpenAI and Anthropic, especially for long-running agentic tool call sequences?&lt;/p&gt;
&lt;p&gt;Moonshot AI's &lt;a href="https://moonshotai.github.io/Kimi-K2/thinking.html"&gt;self-reported benchmark scores&lt;/a&gt; show K2 Thinking beating the top OpenAI and Anthropic models (GPT-5 and Sonnet 4.5 Thinking) at "Agentic Reasoning" and "Agentic Search" but not quite top for "Coding":&lt;/p&gt;
&lt;p&gt;&lt;img alt="Comparison bar chart showing agentic reasoning, search, and coding benchmark performance scores across three AI systems (K, OpenAI, and AI) on tasks including Humanity's Last Exam (44.9, 41.7, 32.0), BrowseComp (60.2, 54.9, 24.1), Seal-0 (56.3, 51.4, 53.4), SWE-Multilingual (61.1, 55.3, 68.0), SWE-bench Verified (71.3, 74.9, 77.2), and LiveCodeBench V6 (83.1, 87.0, 64.0), with category descriptions including &amp;quot;Expert-level questions across subjects&amp;quot;, &amp;quot;Agentic search &amp;amp; browsing&amp;quot;, &amp;quot;Real-world latest information collection&amp;quot;, &amp;quot;Agentic coding&amp;quot;, and &amp;quot;Competitive programming&amp;quot;." src="https://static.simonwillison.net/static/2025/kimi-k2-thinking-benchmarks.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I ran a couple of pelican tests:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-moonshot
llm keys set moonshot # paste key
llm -m moonshot/kimi-k2-thinking 'Generate an SVG of a pelican riding a bicycle'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img alt="Sonnet 4.5 described this as: Cartoon illustration of a white duck or goose with an orange beak and gray wings riding a bicycle with a red frame and light blue wheels against a light blue background." src="https://static.simonwillison.net/static/2025/k2-thinking.png" /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-openrouter
llm keys set openrouter # paste key
llm -m openrouter/moonshotai/kimi-k2-thinking \
  'Generate an SVG of a pelican riding a bicycle'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img alt="Sonnet 4.5: Minimalist cartoon illustration of a white bird with an orange beak and feet standing on a triangular-framed penny-farthing style bicycle with gray-hubbed wheels and a propeller hat on its head, against a light background with dotted lines and a brown ground line." src="https://static.simonwillison.net/static/2025/k2-thinking-openrouter.png" /&gt;&lt;/p&gt;
&lt;p&gt;Artificial Analysis &lt;a href="https://x.com/ArtificialAnlys/status/1986541785511043536"&gt;said&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Kimi K2 Thinking achieves 93% in 𝜏²-Bench Telecom, an agentic tool use benchmark where the model acts as a customer service agent. This is the highest score we have independently measured. Tool use in long horizon agentic contexts was a strength of Kimi K2 Instruct and it appears this new Thinking variant makes substantial gains&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;CNBC quoted a source who &lt;a href="https://www.cnbc.com/2025/11/06/alibaba-backed-moonshot-releases-new-ai-model-kimi-k2-thinking.html"&gt;provided the training price&lt;/a&gt; for the model:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Kimi K2 Thinking model cost $4.6 million to train, according to a source familiar with the matter. [...] CNBC was unable to independently verify the DeepSeek or Kimi figures.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;MLX developer Awni Hannun &lt;a href="https://x.com/awnihannun/status/1986601104130646266"&gt;got it working&lt;/a&gt; on two 512GB M3 Ultra Mac Studios:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The new 1 Trillion parameter Kimi K2 Thinking model runs well on 2 M3 Ultras in its native format - no loss in quality!&lt;/p&gt;
&lt;p&gt;The model was quantization aware trained (qat) at int4.&lt;/p&gt;
&lt;p&gt;Here it generated ~3500 tokens at 15 toks/sec using pipeline-parallelism in mlx-lm&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://huggingface.co/mlx-community/Kimi-K2-Thinking"&gt;the 658GB mlx-community model&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openrouter"&gt;openrouter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/artificial-analysis"&gt;artificial-analysis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/moonshot"&gt;moonshot&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kimi"&gt;kimi&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="mlx"/><category term="pelican-riding-a-bicycle"/><category term="llm-reasoning"/><category term="llm-release"/><category term="openrouter"/><category term="ai-in-china"/><category term="artificial-analysis"/><category term="moonshot"/><category term="kimi"/></entry><entry><title>Quoting Nathan Lambert</title><link href="https://simonwillison.net/2025/Nov/6/nathan-lambert/#atom-tag" rel="alternate"/><published>2025-11-06T21:44:33+00:00</published><updated>2025-11-06T21:44:33+00:00</updated><id>https://simonwillison.net/2025/Nov/6/nathan-lambert/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.interconnects.ai/p/kimi-k2-thinking-what-it-means"&gt;&lt;p&gt;At the start of the year, most people loosely following AI probably knew of 0 [Chinese] AI labs. Now, and towards wrapping up 2025, I’d say all of DeepSeek, Qwen, and Kimi are becoming household names. They all have seasons of their best releases and different strengths. The important thing is this’ll be a growing list. A growing share of cutting edge mindshare is shifting to China. I expect some of the likes of Z.ai, Meituan, or Ant Ling to potentially join this list next year. For some of these labs releasing top tier benchmark models, they literally started their foundation model effort after DeepSeek. It took many Chinese companies only 6 months to catch up to the open frontier in ballpark of performance, now the question is if they can offer something in a niche of the frontier that has real demand for users.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.interconnects.ai/p/kimi-k2-thinking-what-it-means"&gt;Nathan Lambert&lt;/a&gt;, 5 Thoughts on Kimi K2 Thinking&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/moonshot"&gt;moonshot&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nathan-lambert"&gt;nathan-lambert&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kimi"&gt;kimi&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-in-china"/><category term="moonshot"/><category term="nathan-lambert"/><category term="kimi"/></entry><entry><title>Kimi-K2-Instruct-0905</title><link href="https://simonwillison.net/2025/Sep/6/kimi-k2-instruct-0905/#atom-tag" rel="alternate"/><published>2025-09-06T16:59:25+00:00</published><updated>2025-09-06T16:59:25+00:00</updated><id>https://simonwillison.net/2025/Sep/6/kimi-k2-instruct-0905/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905"&gt;Kimi-K2-Instruct-0905&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;New not-quite-MIT licensed model from Chinese Moonshot AI, a follow-up to the highly regarded Kimi-K2 model they &lt;a href="https://simonwillison.net/2025/Jul/11/kimi-k2/"&gt;released in July&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This one is an incremental improvement - I've seen it referred to online as "Kimi K-2.1". It scores a little higher on a bunch of popular coding benchmarks, reflecting Moonshot's claim that it "demonstrates significant improvements in performance on public benchmarks and real-world coding agent tasks".&lt;/p&gt;
&lt;p&gt;More importantly the context window size has been increased from 128,000 to 256,000 tokens.&lt;/p&gt;
&lt;p&gt;Like its predecessor this is a &lt;em&gt;big&lt;/em&gt; model - 1 trillion parameters in a mixture-of-experts configuration with 384 experts, 32B activated parameters and 8 selected experts per token.&lt;/p&gt;
&lt;p&gt;I used &lt;a href="https://console.groq.com/playground?model=moonshotai/kimi-k2-instruct-0905"&gt;Groq's playground tool&lt;/a&gt; to try "Generate an SVG of a pelican riding a bicycle" and got &lt;a href="https://gist.github.com/simonw/80f9fc8f888edc43e1f2a5170c95de3d"&gt;this result&lt;/a&gt;, at a very healthy 445 tokens/second taking just under 2 seconds total:&lt;/p&gt;
&lt;p&gt;&lt;img alt="This one is pretty good. Bicycle is mostly the right shape, the pelican is perched quite awkwardly on the saddle, it has quite a good attempt at the chain mechanism." src="https://static.simonwillison.net/static/2025/kimi-0905.png" /&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/groq"&gt;groq&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/moonshot"&gt;moonshot&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kimi"&gt;kimi&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="groq"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="ai-in-china"/><category term="moonshot"/><category term="kimi"/></entry><entry><title>Faster inference</title><link href="https://simonwillison.net/2025/Aug/1/faster-inference/#atom-tag" rel="alternate"/><published>2025-08-01T23:28:26+00:00</published><updated>2025-08-01T23:28:26+00:00</updated><id>https://simonwillison.net/2025/Aug/1/faster-inference/#atom-tag</id><summary type="html">
    &lt;p&gt;Two interesting examples of inference speed as a flagship feature of LLM services today.&lt;/p&gt;
&lt;p&gt;First, Cerebras &lt;a href="https://www.cerebras.ai/blog/introducing-cerebras-code"&gt;announced two new monthly plans&lt;/a&gt; for their extremely high speed hosted model service: Cerebras Code Pro ($50/month, 1,000 messages a day) and Cerebras Code Max ($200/month, 5,000/day). The model they are selling here is Qwen's Qwen3-Coder-480B-A35B-Instruct, likely the best available open weights coding model right now and one that was released &lt;a href="https://simonwillison.net/2025/Jul/22/qwen3-coder/"&gt;just ten days ago&lt;/a&gt;. Ten days from model release to third-party subscription service feels like some kind of record.&lt;/p&gt;
&lt;p&gt;Cerebras claim they can serve the model at an astonishing 2,000 tokens per second - four times the speed of Claude Sonnet 4 in &lt;a href="https://x.com/cerebrassystems/status/1951340566077440464"&gt;their demo video&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Also today, Moonshot &lt;a href="https://x.com/kimi_moonshot/status/1951168907131355598"&gt;announced&lt;/a&gt; a new hosted version of their trillion parameter Kimi K2 model called &lt;code&gt;kimi-k2-turbo-preview&lt;/code&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;🆕 Say hello to kimi-k2-turbo-preview
Same model. Same context. NOW 4× FASTER.&lt;/p&gt;
&lt;p&gt;⚡️ From 10 tok/s to 40 tok/s.&lt;/p&gt;
&lt;p&gt;💰 Limited-Time Launch Price (50% off until Sept 1)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;$0.30 / million input tokens (cache hit)&lt;/li&gt;
&lt;li&gt;$1.20 / million input tokens (cache miss)&lt;/li&gt;
&lt;li&gt;$5.00 / million output tokens&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;👉 Explore more: &lt;a href="https://platform.moonshot.ai"&gt;platform.moonshot.ai&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is twice the price of their regular model for 4x the speed (increasing to 4x the price in September). No details yet on how they achieved the speed-up.&lt;/p&gt;
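&lt;p&gt;The arithmetic, spelled out (launch prices from the announcement above; the post-September prices assume "50% off" means exactly double afterwards):&lt;/p&gt;

```python
# kimi-k2-turbo-preview launch pricing, in $ per million tokens.
launch = {"input (cache hit)": 0.30, "input (cache miss)": 1.20, "output": 5.00}

# After the 50%-off launch window ends on September 1st, prices double.
full = {name: price * 2 for name, price in launch.items()}

print(full)  # {'input (cache hit)': 0.6, 'input (cache miss)': 2.4, 'output': 10.0}
```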
&lt;p&gt;I am interested to see how much market demand there is for faster performance like this. I've &lt;a href="https://simonwillison.net/2024/Oct/31/cerebras-coder/"&gt;experimented with Cerebras in the past&lt;/a&gt; and found that the speed really does make iterating on code with live previews feel a whole lot more interactive.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cerebras"&gt;cerebras&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/moonshot"&gt;moonshot&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kimi"&gt;kimi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-performance"&gt;llm-performance&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="qwen"/><category term="cerebras"/><category term="llm-pricing"/><category term="ai-in-china"/><category term="moonshot"/><category term="kimi"/><category term="llm-performance"/></entry><entry><title>The best available open weight LLMs now come from China</title><link href="https://simonwillison.net/2025/Jul/30/chinese-models/#atom-tag" rel="alternate"/><published>2025-07-30T16:18:38+00:00</published><updated>2025-07-30T16:18:38+00:00</updated><id>https://simonwillison.net/2025/Jul/30/chinese-models/#atom-tag</id><summary type="html">
    &lt;p&gt;Something that has become undeniable this month is that the best available open weight models now come from the Chinese AI labs.&lt;/p&gt;
&lt;p&gt;I continue to have a lot of love for Mistral, Gemma and Llama but my feeling is that Qwen, Moonshot and Z.ai have positively &lt;em&gt;smoked them&lt;/em&gt; over the course of July.&lt;/p&gt;
&lt;p&gt;Here's what came out this month, with links to my notes on each one:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Moonshot &lt;a href="https://simonwillison.net/2025/Jul/11/kimi-k2/"&gt;Kimi-K2-Instruct&lt;/a&gt; - 11th July, 1 trillion parameters&lt;/li&gt;
&lt;li&gt;Qwen &lt;a href="https://simonwillison.net/2025/Jul/22/qwen3-235b-a22b-instruct-2507/"&gt;Qwen3-235B-A22B-Instruct-2507&lt;/a&gt; - 21st July, 235 billion&lt;/li&gt;
&lt;li&gt;Qwen &lt;a href="https://simonwillison.net/2025/Jul/22/qwen3-coder/"&gt;Qwen3-Coder-480B-A35B-Instruct&lt;/a&gt; - 22nd July, 480 billion&lt;/li&gt;
&lt;li&gt;Qwen &lt;a href="https://simonwillison.net/2025/Jul/25/qwen3-235b-a22b-thinking-2507/"&gt;Qwen3-235B-A22B-Thinking-2507&lt;/a&gt; - 25th July, 235 billion&lt;/li&gt;
&lt;li&gt;Z.ai &lt;a href="https://simonwillison.net/2025/Jul/28/glm-45/"&gt;GLM-4.5 and GLM-4.5 Air&lt;/a&gt; - 28th July, 355 and 106 billion&lt;/li&gt;
&lt;li&gt;Qwen &lt;a href="https://simonwillison.net/2025/Jul/29/qwen3-30b-a3b-instruct-2507/"&gt;Qwen3-30B-A3B-Instruct-2507&lt;/a&gt; - 29th July, 30 billion&lt;/li&gt;
&lt;li&gt;Qwen &lt;a href="https://simonwillison.net/2025/Jul/30/qwen3-30b-a3b-thinking-2507/"&gt;Qwen3-30B-A3B-Thinking-2507&lt;/a&gt; - 30th July, 30 billion&lt;/li&gt;
&lt;li&gt;Qwen &lt;a href="https://simonwillison.net/2025/Jul/31/qwen3-coder-flash/"&gt;Qwen3-Coder-30B-A3B-Instruct&lt;/a&gt; - 31st July, 30 billion (released after I first posted this note)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;small&gt;Notably absent from this list is DeepSeek, but that's only because their last model release was &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-0528"&gt;DeepSeek-R1-0528&lt;/a&gt; back in May.&lt;/small&gt;&lt;/p&gt;
&lt;p&gt;The only janky license among them is Kimi K2, which uses a non-OSI-compliant modified MIT. Qwen's models are all Apache 2 and Z.ai's are MIT.&lt;/p&gt;
&lt;p&gt;The larger Chinese models all offer their own APIs and are increasingly available from other providers.  I've been able to run versions of the Qwen 30B and GLM-4.5 Air 106B models on my own laptop.&lt;/p&gt;
&lt;p&gt;I can't help but wonder if part of the reason for the delay in release of OpenAI's open weights model comes from a desire to be notably better than this truly impressive lineup of Chinese models.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update August 5th 2025&lt;/strong&gt;: The OpenAI open weight models came out and &lt;a href="https://simonwillison.net/2025/Aug/5/gpt-oss/"&gt;they are very impressive&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-oss"&gt;gpt-oss&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/moonshot"&gt;moonshot&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kimi"&gt;kimi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/janky-licenses"&gt;janky-licenses&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/glm"&gt;glm&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="qwen"/><category term="ai-in-china"/><category term="gpt-oss"/><category term="moonshot"/><category term="kimi"/><category term="janky-licenses"/><category term="glm"/></entry><entry><title>moonshotai/Kimi-K2-Instruct</title><link href="https://simonwillison.net/2025/Jul/11/kimi-k2/#atom-tag" rel="alternate"/><published>2025-07-11T18:33:54+00:00</published><updated>2025-07-11T18:33:54+00:00</updated><id>https://simonwillison.net/2025/Jul/11/kimi-k2/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/moonshotai/Kimi-K2-Instruct"&gt;moonshotai/Kimi-K2-Instruct&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Colossal new open weights model release today from &lt;a href="https://en.wikipedia.org/wiki/Moonshot_AI"&gt;Moonshot AI&lt;/a&gt;, a two year old Chinese AI lab with a name inspired by Pink Floyd’s album The Dark Side of the Moon.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://tools.simonwillison.net/huggingface-storage"&gt;HuggingFace storage calculator&lt;/a&gt; says the repository is 958.52 GB. It's a mixture-of-experts model with "32 billion activated parameters and 1 trillion total parameters", trained using the Muon optimizer as described in Moonshot's joint paper with UCLA &lt;a href="https://arxiv.org/abs/2502.16982"&gt;Muon is Scalable for LLM Training&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I think this may be the largest ever open weights model? DeepSeek v3 is 671B.&lt;/p&gt;
&lt;p&gt;I created &lt;a href="https://platform.moonshot.ai/console/api-keys"&gt;an API key for Moonshot&lt;/a&gt;, added some dollars and ran a prompt against it using my LLM tool. First I added this to the &lt;a href="https://llm.datasette.io/en/stable/other-models.html#openai-compatible-models"&gt;extra-openai-models.yaml file&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;- model_id: kimi-k2
  model_name: kimi-k2-0711-preview
  api_base: https://api.moonshot.ai/v1
  api_key_name: moonshot
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then I set the API key:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm keys set moonshot
# Paste key here
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And ran a prompt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m kimi-k2 "Generate an SVG of a pelican riding a bicycle" \
  -o max_tokens 2000
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(The default max tokens setting was too short.)&lt;/p&gt;
&lt;p&gt;&lt;img alt="Description by Claude Sonnet 4: Simple line drawing of a white rubber duck with orange beak sitting on a red bicycle with spoked wheels" src="https://static.simonwillison.net/static/2025/kimi-k2-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;This is pretty good! The spokes are a nice touch. &lt;a href="https://gist.github.com/simonw/39aba6a1d4895ad7516bffe9485031db"&gt;Full transcript here&lt;/a&gt;.&lt;/p&gt;
&lt;p id="kimi-license"&gt;This one is open weights but not open source: they're using a &lt;a href="https://github.com/moonshotai/Kimi-K2/blob/main/LICENSE"&gt;modified MIT license&lt;/a&gt; with this non-OSI-compliant section tagged on at the end:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Our only modification part is that, if the Software (or any derivative works
thereof) is used for any of your commercial products or services that have
more than 100 million monthly active users, or more than 20 million US dollars
(or equivalent in other currencies) in monthly revenue, you shall prominently
display "Kimi K2" on the user interface of such product or service.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: MLX developer &lt;a href="https://x.com/awnihannun/status/1943723599971443134"&gt;Awni Hannun reports&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The new Kimi K2 1T model (4-bit quant) runs on 2 512GB M3 Ultras with mlx-lm and mx.distributed.&lt;/p&gt;
&lt;p&gt;1 trillion params, at a speed that's actually quite usable&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44533403"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/moonshot"&gt;moonshot&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kimi"&gt;kimi&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="mlx"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="ai-in-china"/><category term="moonshot"/><category term="kimi"/></entry></feed>