Will it roast? We tested Kilo Code Reviewer's Roast Mode on 5 Levels of Terrible Code
Kilo Code’s Code Reviews now has a Roast Mode. Instead of polite suggestions, it reviews your PRs with brutal honesty. We’ve previously tested Code Reviews for accuracy with both free and frontier models. This time, we wanted to see how far the roasting goes.
We built a clean bookstore API, created 5 PRs with escalating code crimes, and ran Roast Mode with Claude Opus 4.6 on each one.
TL;DR: Roast Mode found 39 real issues across 5 PRs with zero false positives. It escalated from “address before merge” to “do NOT merge” as the code got worse, and every finding included a fix recommendation alongside the roast.
The Setup
We built a bookstore API using TypeScript with Hono, Prisma, and SQLite. The base code was clean: proper types, Zod validation, JWT auth middleware, consistent patterns across all routes. From that base, we created 5 PRs. Each one added a real feature with increasingly terrible code.
All 5 PRs were reviewed with Roast Mode enabled, all focus areas on, using Claude Opus 4.6.
Level 1
We filled the wishlist feature with sloppy code: var instead of const, : any on everything, console.log("here") and console.log("it works!!!") left in, and variables named temp, temp2, thing, d, x, and flag.
Roast Mode found 6 issues (4 warnings, 2 suggestions) and recommended “address before merge.”
On the missing input validation:
Every other route in this codebase uses zValidator with Zod schemas like a civilized API. This endpoint just raw-dogs c.req.json() with a var body: any and hopes for the best. It’s the API equivalent of eating gas station sushi.
On the any types:
var user: any in a TypeScript file. In 2026. The AuthUser type is RIGHT THERE in the middleware, already imported by every other route file. Using any here is like having a GPS and choosing to navigate by the stars.
The summary called the PR “a house built on a solid foundation but furnished entirely from a yard sale.”
Roast Mode caught the real technical issues (missing validation, loose equality, || vs ??) but skipped the most obvious roast targets: the console.log debugging, the terrible variable names, and comments like // HACK: this shouldn't work but it does.
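The loose equality and || vs ?? findings are easy to reproduce in isolation. Here is a minimal sketch, assuming a hypothetical wishlist payload with an optional numeric priority. The names WishlistInput, priorityWithOr, and priorityWithNullish are ours for illustration and do not come from the PR.

```typescript
// Hypothetical request shape; the PR's real fields are not shown in the review.
interface WishlistInput {
  bookId: number;
  priority?: number; // 0 is a legitimate value in this sketch
}

// `||` falls back on ANY falsy value, so a valid priority of 0 is replaced.
function priorityWithOr(input: WishlistInput): number {
  return input.priority || 5;
}

// `??` falls back only on null or undefined, so 0 survives.
function priorityWithNullish(input: WishlistInput): number {
  return input.priority ?? 5;
}

// Loose equality coerces operand types before comparing.
function looseMatch(a: unknown, b: unknown): boolean {
  return a == b;
}

// Strict equality compares without coercion.
function strictMatch(a: unknown, b: unknown): boolean {
  return a === b;
}
```

With these definitions, a priority of 0 silently becomes 5 under ||, and the string "1" loosely equals the number 1, which is the kind of surprise the review was warning about.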
Level 2
The recommendation engine had three identical functions (getBooksByAuthor, getBooksByAuthorName, fetchAuthorBooks), dead utility functions sourced from a downvoted Stack Overflow answer, unused imports, and a function copied from another project that references tables that don’t exist.
Roast Mode found 7 issues (4 warnings, 3 suggestions).
On the three identical functions:
You wrote the same function three times and they are byte-for-byte identical. This is the copy-paste equivalent of ordering the same coffee three times because you forgot you already ordered. This isn’t polymorphism, it’s amnesia.
On the dead code:
shuffleArray, cosineSimilarity, textToVector, and calculateEngagementScore. Four functions, zero callers. This is a graveyard of good intentions. You didn’t just copy from Stack Overflow. You copy-pasted the losing answers.
The summary: “Less a recommendation engine and more a code museum.” It also caught a performance issue in the /trending endpoint where all reviews were loaded into memory instead of using database aggregation, and flagged a divide-by-zero bug in the unused cosineSimilarity function.
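To make the divide-by-zero finding concrete, here is a sketch of a guarded cosine similarity. The PR's actual function body is not shown in the review, so this is our own reimplementation under the assumption that both inputs are plain number arrays of equal length.

```typescript
// Hypothetical reimplementation; not the code from the PR.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // Without this guard, an all-zero vector yields 0 / 0, which is NaN in JavaScript.
  if (denominator === 0) return 0;
  return dot / denominator;
}
```

Returning 0 for a zero vector is a design choice made for this sketch. Throwing or returning null would be equally defensible, as long as NaN does not leak into a ranking.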
Level 3
We added 10 files and ~1,000 lines implementing 8 design patterns (Abstract Factory, Builder, Strategy, Observer, Singleton, Chain of Responsibility, Template Method, Facade) for a notification system that console.logs.
Roast Mode found 6 issues (3 warnings, 2 suggestions, 1 nitpick).
Rather than mocking the pattern names, it found real bugs inside the architecture:
Every time someone does new NotificationService(), this constructor grabs the Singleton registry and re-registers the same strategies on it. The registry even has a console.warn for overwrites, so it knows this is wrong and just lets it happen anyway. It’s like a bouncer who checks IDs and then waves everyone through regardless.
It also found a memory leak in the rate limiter (buckets Map grows forever), HTTP 500 returned for all failures including rate limits, and a validation gap where z.record(z.unknown()) lets users override internal payload fields.
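The rate limiter leak comes down to a Map that only ever gains keys. Below is a minimal sketch of one way to bound it, assuming a fixed-window limiter keyed by user or IP. The class, its fields, and the injectable clock are all hypothetical, since the PR's real limiter is not shown.

```typescript
// Hypothetical fixed-window limiter; not the implementation from the PR.
interface Bucket {
  count: number;
  windowStart: number; // millisecond timestamp
}

class RateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    this.prune(now);
    const bucket = this.buckets.get(key);
    if (!bucket) {
      this.buckets.set(key, { count: 1, windowStart: now });
      return true;
    }
    if (bucket.count >= this.limit) return false;
    bucket.count += 1;
    return true;
  }

  // Drop buckets whose window has expired so the Map cannot grow forever.
  private prune(now: number): void {
    this.buckets.forEach((bucket, key) => {
      if (now - bucket.windowStart >= this.windowMs) {
        this.buckets.delete(key);
      }
    });
  }

  get size(): number {
    return this.buckets.size;
  }
}
```

Pruning on every call keeps the sketch short but costs a full scan each time; a periodic sweep would be the usual trade-off at scale. A false return from allow() maps naturally to HTTP 429 rather than the blanket 500 the review flagged.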
The summary:
Level 4
The admin dashboard was one 200-line function handling 12 operations through if/else if chains, “protected” by if (password !== "admin123"), with an eval() that builds database queries from user input.
Roast Mode found 10 issues (3 critical, 5 warnings, 1 suggestion, 1 nitpick) and recommended “Do NOT merge.”
On the hardcoded password:
password !== "admin123". Congratulations, you’ve implemented authentication with the same security as a diary lock from the dollar store. This hardcoded password is so open it should be listed on Zillow. Meanwhile, there’s a perfectly good JWT auth middleware sitting in src/middleware/auth.ts, gathering dust like a gym membership in February.
On the eval():
eval(). You used eval(). With user-supplied input. In a server-side admin endpoint. I need a moment. This is a Remote Code Execution vulnerability so severe it makes SQL injection look like a parking ticket. This isn’t a code smell. This is a code biohazard.
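For contrast, here is a sketch of the usual alternative to eval() when a query is built from user input: validate each piece against an allowlist and return plain data. The field names, operator names, and the Filter shape are assumptions for illustration; the review does not show the PR's real query format.

```typescript
// Hypothetical allowlists; the real endpoint's fields are not shown in the review.
const ALLOWED_FIELDS = ["title", "author", "price"] as const;
type AllowedField = (typeof ALLOWED_FIELDS)[number];

const ALLOWED_OPERATORS = ["equals", "contains", "gt", "lt"] as const;
type AllowedOperator = (typeof ALLOWED_OPERATORS)[number];

interface Filter {
  field: AllowedField;
  operator: AllowedOperator;
  value: string | number;
}

function isAllowedField(value: string): value is AllowedField {
  return (ALLOWED_FIELDS as readonly string[]).indexOf(value) !== -1;
}

function isAllowedOperator(value: string): value is AllowedOperator {
  return (ALLOWED_OPERATORS as readonly string[]).indexOf(value) !== -1;
}

// Turns untrusted strings into a typed filter, or throws. Nothing is executed.
function buildFilter(field: string, operator: string, value: string | number): Filter {
  if (!isAllowedField(field)) {
    throw new Error(`Unsupported field: ${field}`);
  }
  if (!isAllowedOperator(operator)) {
    throw new Error(`Unsupported operator: ${operator}`);
  }
  return { field, operator, value };
}
```

The resulting object can then be translated into whatever the data layer expects. User input only ever selects from known options and is never interpreted as code.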
On the data export:
prisma.user.findMany() with no select clause means you’re exporting every field on the user model, including password hashes. Then you’re dumping it all into a CSV and sending it over the wire. This is a data breach speedrun any%.
It also caught the failed counter that never increments (always reports 0 failures), the maintenance timeout where the comment says “1 hour” but the code uses 86,400,000ms (24 hours), and CSV generation with no escaping.
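The CSV escaping gap has a small, well-known fix. This is a minimal sketch following the common RFC 4180 quoting convention; the helper names escapeCsvField and toCsvRow are ours, not the PR's.

```typescript
// Quote a field if it contains a comma, double quote, or line break,
// and double any embedded double quotes.
function escapeCsvField(value: unknown): string {
  const text = value === null || value === undefined ? "" : String(value);
  if (/[",\r\n]/.test(text)) {
    return `"${text.replace(/"/g, '""')}"`;
  }
  return text;
}

// Join already-escaped fields into one CSV row.
function toCsvRow(fields: unknown[]): string {
  return fields.map(escapeCsvField).join(",");
}
```

This sketch covers structural escaping only. It does not address spreadsheet formula injection, which an export of user-controlled data would also want to consider.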
The summary: “This PR is what happens when you ask ‘what if we put the entire admin panel in one function and secured it with a Post-it note?’”
Level 5
The analytics system was the last PR: eval() on user input, hardcoded passwords written to a plaintext JSON file, child_process.exec() with unsanitized input, Math.random() for session IDs, a Map cache that grows forever (fed by a setInterval every 5 seconds), and == for auth checks.
Roast Mode found 10 issues (3 critical, 6 warnings, 1 suggestion) and again recommended “Do NOT merge.”
On the hardcoded passwords:
Hardcoded passwords. "supersecret123". In source code. Committed to git. This password is so insecure it would get rejected by a 2005 MySpace account. And then you write them to a JSON file on disk as a fallback. This is the security equivalent of hiding your house key under the doormat and then putting a sign on the doormat that says “KEY IS HERE.”
On the command injection:
You’re interpolating reportType and dateRange, both from user input, directly into a shell command passed to exec(). This is a command injection vulnerability so textbook it could be a SANS exam question. The comment says “for performance” which is chef’s kiss.
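A common way out of this class of bug is to validate the inputs and pass them as an argument array to execFile, which does not spawn a shell by default. The sketch below assumes a hypothetical report binary and made-up report types and date format, since the PR's actual command is not shown.

```typescript
import { execFile } from "node:child_process";

// Hypothetical allowlist and date format; not taken from the PR.
const REPORT_TYPES = ["sales", "reviews", "users"] as const;
const DATE_RANGE = /^\d{4}-\d{2}-\d{2}:\d{4}-\d{2}-\d{2}$/;

// Validates untrusted input and returns an argument array, or throws.
function buildReportArgs(reportType: string, dateRange: string): string[] {
  if ((REPORT_TYPES as readonly string[]).indexOf(reportType) === -1) {
    throw new Error(`Unsupported report type: ${reportType}`);
  }
  if (!DATE_RANGE.test(dateRange)) {
    throw new Error(`Invalid date range: ${dateRange}`);
  }
  return ["--type", reportType, "--range", dateRange];
}

// Arguments go to the process as an array, so shell metacharacters such as
// `;` or `$(...)` are never interpreted.
function runReport(
  reportType: string,
  dateRange: string,
  onDone: (error: Error | null, stdout: string) => void,
): void {
  // "generate-report" is a placeholder binary name, an assumption for this sketch.
  execFile("generate-report", buildReportArgs(reportType, dateRange), (error, stdout) => {
    onDone(error, stdout);
  });
}
```

Either measure helps on its own, but together they mean a value like "sales; rm -rf /" is rejected before any process starts.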
On a var shadowing bug:
var data = reviews[i] as any;. You just re-declared data inside a loop that already has a data variable. Thanks to var hoisting, this shadows the outer data object, so when you set data.averageRating, you’re setting it on the last review object, not your analytics result. This is a logic bug wearing a trench coat pretending to be valid code.
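The var bug is worth seeing run. Strictly speaking, a second var declaration in the same function does not create a new variable; it reuses the function-scoped one, so the loop overwrites the outer value. The sketch below reconstructs that shape with hypothetical Review and Analytics types, since the PR's surrounding code is not shown, and pairs it with a block-scoped fix.

```typescript
// Hypothetical types; the PR's real models are not shown in the review.
interface Review {
  rating: number;
  averageRating?: number;
}

interface Analytics {
  averageRating: number;
}

// Buggy shape: both `var data` declarations refer to the same variable.
function summarizeBuggy(reviews: Review[]): any {
  var data: any = { averageRating: 0 };
  var total = 0;
  for (var i = 0; i < reviews.length; i++) {
    var data: any = reviews[i]; // overwrites the analytics object
    total += data.rating;
  }
  data.averageRating = reviews.length > 0 ? total / reviews.length : 0;
  return data; // the last review object when reviews is non-empty
}

// Fixed shape: block-scoped names keep the result and the loop item apart.
function summarizeFixed(reviews: Review[]): Analytics {
  const data: Analytics = { averageRating: 0 };
  let total = 0;
  for (const review of reviews) {
    total += review.rating;
  }
  data.averageRating = reviews.length > 0 ? total / reviews.length : 0;
  return data;
}
```

With const and let, the same mistake would fail loudly: redeclaring data in the same scope is a compile error, and declaring it inside the loop body creates a genuinely separate variable.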
The summary: “It’s not a PR, it’s a CVE speedrun. Please do not merge this into anything connected to the internet, or frankly, to electricity.”
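Of the Level 5 problems, the Math.random() session IDs have the shortest fix. Here is a minimal sketch using Node's built-in crypto module; the function names and the 32-byte length are our own choices for illustration.

```typescript
import { randomBytes } from "node:crypto";

// Predictable: Math.random() is not a cryptographically secure generator.
function insecureSessionId(): string {
  return Math.random().toString(36).slice(2);
}

// 32 random bytes from the OS-backed CSPRNG, hex-encoded to 64 characters.
function secureSessionId(): string {
  return randomBytes(32).toString("hex");
}
```

The insecure version is included only for comparison. Anything that acts as a bearer credential, session IDs included, should come from a cryptographically secure source.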
The Escalation
Levels 1 through 3 got constructive sarcasm. The moment real security vulnerabilities appeared at Level 4, the verdict switched to “Do NOT merge” and the tone shifted from witty to urgent.
The shift shows in the summary lines:
Zero false positives across all 39 findings. Every roast cited a specific line, explained the actual problem, and included a code fix.
Each level also got a “best part” in the summary that acknowledged what was done well. Even Level 5 got credit for correctly structured Prisma queries.
One pattern we didn’t expect: Roast Mode focused on real bugs over easy jokes. The console.log("here") debugging in Level 1, the // HACK: this shouldn't work but it does comments, and the doStuff() function name in Level 5 all went unroasted. The AbstractNotificationFactoryProvider-style naming in Level 3 was barely mentioned. It acted more like a sharp-tongued senior engineer than a comedian, prioritizing what was broken over what was ugly.
Verdict
Roast Mode produced the same issue types and severity ratings as the standard Balanced review style from our previous code review tests. The difference was the delivery. Where Balanced mode says “consider using parameterized queries,” Roast Mode says “This isn’t a code smell. This is a code biohazard.”
The accuracy held up across all 5 levels. Every issue flagged was real, and the tone scaled with the severity of the code.
Testing was performed using Code Reviews, a feature of Kilo Code, the free, open-source AI coding assistant for VS Code and JetBrains used by 1,500,000+ coders.