Benchmarking GPT-5.1 vs Gemini 3.0 vs Opus…

Nov 26, 2025

Three AI giants released their best coding models in the same month:

12 Comments

Nov 26, 2025

Love these comparisons. They help us decide which agent to use for what. This will save me tons of $. Thank you, keep 'em coming 😍

Ahmet Sezen

Nov 27, 2025

Thank you for the tests, they help a lot! Do you mind sharing the prompts you used in this article?

Gabriele Tomberli

Nov 26, 2025

Very interesting article, thank you!

It would be nice to add:

- comparison with previous generation models, to understand how much they have improved

- comparison with more cost-effective models like GLM, DeepSeek, etc., to understand whether the quality justifies the cost

If the prompts are publicly shared somewhere, I can try them and share some of the results.

Neal Tibrewala

Nov 26, 2025

Were all of these a single test of each model, or did you run it a few times for each model on each scenario?

Gwenaël Nardin

Nov 26, 2025

Can you share the prompts you used?

Seg

Dec 4, 2025

Wait, what do you mean by "Both GPT-5.1 and Gemini 3.0 hardcoded the JWT secret"? That is seriously bad practice and should be penalized.

Marina Spricigo Azevedo

Dec 4, 2025

Nice, but you should have used GPT-5.1-Codex-Max and explicitly shown us the reasoning effort.

H1D

Dec 4, 2025

Great work! It confirms my gut feeling overall. I use Opus as my default now.

What would be nice to see:

- prompts and code shared

- multiple runs, not just one; models still aren't that deterministic

- the same tests run in other IDEs (Cursor, TRAE...), since the LLM "harness" could play a bigger role than the prompt and the model itself

Suhrab Khan

Dec 3, 2025

Your summary of the coding benchmarks is sharp and clear. You captured the nuances of each model’s strengths and trade-offs in a way that’s immediately useful for developers. Excellent work highlighting the practical takeaways.

I talk about the latest AI trends and insights. If you’re interested in practical strategies for using AI to optimize coding workflows, model selection, and software development efficiency, check out my Substack. I’m sure you’ll find it very relevant and relatable.