We Gave Claude Opus 4.7 and Kimi K2.6 the…

Apr 22

Kimi K2.6 launched on April 20, 2026, four days after Anthropic released Claude Opus 4.7. We gave both models the same spec for FlowGraph, a persistent workflow orchestration API with DAG validation, atomic worker claims, lease expiry recovery, pause/resume/cancel, and SSE event streaming. Then we reviewed the code and reproduced the edge cases the models’ own tests did not cover.

6 Comments

Mushegh Gevorgyan

Apr 23

Great writeup. Any chance you could share the SPEC.md? Would love to reproduce on a few other models.

richardstevenhack

Apr 22

I use Kimi K2.5 through Nvidia's Developer program, which means FREE API access.

I assume 2.6 will be available there at some point.

Something to keep in mind.

Manuel Gollner

Apr 23

Like Mushegh said, if you could share your test environment along with your SPEC.md, that would be very helpful for setting up tests ourselves and understanding your benchmarks better. Also for contributing to this great article! What really annoys me is the happy hallucination of the "stupid" models. They always claim to be finished, with a "Yeah! Tested everything - looks great!" attitude, but if you look closely they messed up. I can hardly find a project where state-machine correctness does not matter. So I keep on with Opus on high thinking... and pay a lot... A test with Opus 4.6 and the new Kimi, or Opus 4.5, would have been interesting too, to see how these models evolve and how far the Chinese models are behind the top of the leaderboard, in "months" of release. I personally don't trust these benchmarks. What does it help me to see how they scored on Humanity's Last Exam if they fail big time on my easy coding tasks :-D

A friend set up a similar environment for testing - you can check it out at https://GitHub.com/jannismain/ccbench

So thank you very much for this test!

KrisFromFuture

Apr 22

vs GLM 5.1 and MiniMax M2.7 please

Zen Equity

Apr 22

It's nice to see the open-weight models catching up with the top-tier proprietary ones so quickly.

Ken Lyle

Apr 23

So, as a concrete example, I have a non-critical task: review my docs against my app, fill the doc gaps, and build the links in the app so that each screen and important field links to the right element or query in my MCP documentation system. I am thinking Kimi (or Elephant) over Claude, to get a free or nearly free scaffold without paying Claude to compact the conversation 7 times?

Kilo Blog
