3 Comments
Zen Equity

Benchmarks are benchmarks. What we really need right now is an evaluation of GPT 5.4, Opus 4.6 and Sonnet 4.6 doing the same array of real-world tasks so we can see the difference :)

Mathivanan

Couldn't agree more

Mathivanan

Hey Kilo Team, this is just amazing. I always wait for the Kilo benchmarks; they are the best. Please keep them coming, especially for the frontier models.