China's Kimi K2.6 Is Closing the Gap Faster Than Anyone Expected

Tue, 21 Apr 2026 20:01:10 +0100

Moonshot AI just open-sourced K2.6, and the benchmarks are hard to ignore. It beats or matches GPT-5.4, Opus 4.6, and Gemini 3.1 Pro on Humanity’s Last Exam with tools and on SWE-Bench Pro, two of the more credible tests for reasoning and coding. It can run for 12 hours straight across 4,000 tool calls. One internal agent apparently ran autonomously for five days. And it can spin up 300 sub-agents running in parallel.

Dario Amodei said recently that open-source and China are 6 to 12 months behind frontier labs. That may hold for what’s sitting on private servers, but what’s actually being shipped to the public is looking a lot closer than that framing suggests. K2.6 is impressive, and a little unsettling. The pace at which this gap is closing is faster than I expected.

The part that matters most to me practically is the cost angle. I’m not switching my own workflows to K2.6 tomorrow, but for companies that are seriously leaning into AI-augmented teams, inference costs at scale are not trivial. Running frontier models across dozens of agents around the clock adds up fast. What K2.6 signals is that you will increasingly have real options: use the best model for the critical tasks, and route the less demanding work to something cheaper that can still handle it. That kind of tiered approach is going to define how serious teams architect their AI workflows.
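As a rough illustration of that tiered approach, here is a minimal Python sketch. The tier names, the single `critical` flag, and the relative costs are all hypothetical placeholders, not real model names or prices. A real router would probably classify tasks on richer signals than one boolean.

```python
from dataclasses import dataclass


# Hypothetical tiers: names and relative costs are placeholders, not real prices.
@dataclass(frozen=True)
class ModelTier:
    name: str
    relative_cost: float  # illustrative cost per task, relative to the cheap tier


FRONTIER = ModelTier("frontier-model", relative_cost=10.0)
CHEAP = ModelTier("open-weights-model", relative_cost=1.0)


@dataclass(frozen=True)
class Task:
    description: str
    critical: bool  # hypothetical flag: does this task need the best model?


def route(task: Task) -> ModelTier:
    """Send critical tasks to the frontier tier and everything else to the cheaper tier."""
    return FRONTIER if task.critical else CHEAP


def total_cost(tasks: list[Task]) -> float:
    """Sum the relative cost of running each task on the tier it was routed to."""
    return sum(route(t).relative_cost for t in tasks)


if __name__ == "__main__":
    tasks = [
        Task("review security-sensitive PR", critical=True),
        Task("summarise build logs", critical=False),
        Task("triage new issues", critical=False),
    ]
    for t in tasks:
        print(f"{t.description!r} -> {route(t).name}")
    all_frontier = len(tasks) * FRONTIER.relative_cost
    print(f"tiered: {total_cost(tasks)}, all-frontier: {all_frontier}")
```

With these made-up numbers, sending two of the three tasks to the cheaper tier costs 12 units instead of 30 if everything ran on the frontier model.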

why it matters

Dario Amodei just said open-source and China are 6 to 12 months behind frontier labs, and while that may be true of models the labs still hold internally, public systems are looking a lot closer. Given frustrations over usage rates and the rise of autonomous agents, K2.6 looks like a powerful, cost-effective new option for agentic workflows.


Opensource on ben's blog
