Mar 6

With Anthropic and OpenAI releasing new models at lightning speed, is it time for a new definition of a SOTA model?

3 Comments

Benchmarks are benchmarks. What we really need right now is an evaluation of GPT 5.4, Opus 4.6 and Sonnet 4.6 doing the same array of real-world tasks so we can see the difference :)

Couldn't agree more

Hey Kilo Team, this is just amazing. I always wait for the Kilo benchmarks; they are the best. Please continue the same, especially for the frontier models.

Kilo Blog

Benchmarking the Benchmarks: New GPT and…