Discussion about this post

RobMarschal

My practical experience with M2.7 has been extremely disappointing, particularly when it comes to following migration plans.

M2.7 almost constantly ignores the plan and the phases that are supposed to be worked through. Instead of migrating existing elements, it creates dummy UI components and placeholder elements, and then complains that the work is "too complex."

In other places, it simply generates TODO comments and then proceeds to ignore them entirely.

It does not use the tools provided by Kilo Code and insists on making every change via sed, which is completely inadequate for real-world development work.

In terms of actual development and migration tasks, M2.7 is noticeably worse than its predecessor M2.5. On top of that, it outright refuses to continue a migration mid-task, which is clearly visible in the thinking output.

I ran the same tasks through 5.3-Codex, Claude 4.6, and GLM-5 — the results were in a completely different league.

This is absolutely unacceptable for a model marketed for code and development work.

Michał

Really thoughtful comparison from the perspective of real-world usefulness for agentic coding. It would be even better to have a short provider profile next to each model — for example, data retention, training on prompts, processing region, and a few basic security/compliance details. For many companies, that kind of context matters almost as much as the benchmark result itself.

