Nemotron 3 Super is Live in Kilo
NVIDIA's powerful new model is free in Kilo Code for a limited time
NVIDIA has officially launched Nemotron 3 Super (120B total parameters, 12B active), and it's already live in Kilo.
Even better, it's completely free to use across Kilo, from our VS Code extension to KiloClaw, for a limited time.
Nemotron 3 Super is a new, open, hybrid mixture-of-experts model optimized for NVIDIA Blackwell. If you've been hanging around the Kilo community, you already know that the smaller Nemotron models have become massive crowd favorites. They are incredibly snappy, cost-effective, and reliable for everyday scaffolding, code generation, code reviews, and security reviews.
But the lingering question has always been: when NVIDIA drops a new heavyweight model, how will it stack up against the established giants like Opus and GPT, or the latest OSS superstars like GLM-5 and Kimi K2.5?
Today the wait is over. The new Nemotron is a major step up from previous iterations. We were lucky to get early access to Nemotron 3 Super and had the chance to test it across Kilo modes. Here's a breakdown of why you should care about this release, and how it actually performs when pushed to its limits.
The Industry Benchmarks: Serious Engineering Chops
NVIDIA didn't just scale up parameters for the fun of it. They built a model designed to tackle complex, multi-step reasoning. In early industry testing, Nemotron 3 Super is putting up strong benchmark numbers:
SWE-Bench Verified: 60.5
MMLU: 86.01
HumanEval: 79.40
The model also claimed the top spot on Artificial Analysis for efficiency and openness, and scored 36 on the Artificial Analysis Intelligence Index.
These aren’t just vanity metrics. They indicate a model that can genuinely navigate codebases and resolve real-world software issues autonomously. But at Kilo, we care most about how a model handles our specific environment.
The PinchBench Reality Check: How it Handles KiloClaw
We ran Nemotron 3 Super through PinchBench to evaluate its performance on KiloClaw (our hosted OpenClaw offering). We tested it natively to see whether the model can really claw, and the results are incredibly promising.
In fact, it’s already the top open-weight model on PinchBench. This means that it will likely become a go-to model for tons of different OpenClaw use cases.
Here are the top-level results from our 3-run test suite:
The Baseline: The model achieved a highly consistent 3-run average score of 84.7%.
Peak Performance: Its best single run clocked in at 85.6%.
Native is Better: We noticed a distinct performance bump when running the model directly. It scored an 84.7% average natively, compared to 79.4% when run through a proxy that forced reasoning budgets. Letting the model use its default behavior without injected parameters drastically reduces variance.
Where Nemotron 3 Super Shines
If your workflow involves moving files, scaffolding, and data analysis, this model is a powerhouse:
Perfect Automation: It scored a flawless 100% across all 3 runs for Create Project Structure, Search and Replace in Files, Calendar Event Creation, and Stock Price Research.
Data Crunching: It nearly maxed out the CSV & Excel Data Summarization task, scoring 98% in two runs and a perfect 100% in the third, with all statistics matching exactly.
Agentic Navigation: It consistently passes OpenClaw Report Comprehension with a 100% score.
Where could it improve? We'll need aggregate data from Kilo users to say for sure, but the model does seem to struggle with creative tooling. In one test, it had trouble using native AI image generation tools, averaging just 27% and falling back on Python's Pillow library instead.
Just Do It
If you need a relentless, highly capable workhorse to scaffold entire projects, navigate complex filesystems, and accurately summarize multiple docs and spreadsheets, NVIDIA Nemotron 3 Super is an absolute beast.
And it’s currently free in Kilo.
NVIDIA has firmly positioned this 120B-parameter model as a top-tier performer. It's live in your Kilo model picker right now. Go spin it up while it's free and see what it can build for you.