China's Kimi K2.6 Is Closing the Gap Faster Than Anyone Expected

Moonshot AI just open-sourced K2.6, and the benchmarks are hard to ignore. It beats or matches GPT-5.4, Opus 4.6, and Gemini 3.1 Pro on Humanity’s Last Exam with tools and on SWE-Bench Pro, two of the more credible benchmarks for reasoning and coding. It can run for 12 hours straight across 4,000 tool calls; one internal agent apparently ran autonomously for five days. And it can spin up 300 sub-agents in parallel. ...

April 21, 2026 · 2 min · 298 words · bjr