Interesting. I wonder what the total review cost would be with an iterative approach: in round 1, have Grok identify bugs and fix them; in round 2, run Opus over the cleaned code. Would a Grok review plus an Opus review be more economical than a one-shot, Opus-only review?
For the purposes of determining total review cost, the fixing expense would be excluded.
Which Grok model is reviewed here?
Grok Build 0.1 was used.
Any chance you might make the app used for the review process available so that other models can be tested / compared against the known results?
Including a comparable OpenAI model such as Codex would have made for a more interesting comparison than just the GPT model.
Very surprised by the GPT-5.5 result. In my experience, GPT has been better than Opus.