We tested a new, free stealth model (Spectre) on real coding tasks
Background: Spectre is a new stealth model we announced on Monday. The model comes from one of the top 10 AI labs in the world.
We ran Spectre through three coding tasks to see how it handles real development work: building an API from scratch, finding bugs in Go code, and writing documentation.
Let’s dive deeper.
What We Tested
We created three tests that cover common development workflows:
Code generation: Build a bookmarking API with TypeScript and Hono
Bug detection: Find and fix issues in Go code with concurrency bugs
Documentation: Generate JSDoc and README for a complex TypeScript function
All tests ran in Kilo Code using Code Mode with a clean setup for each test. We deliberately used different languages and frameworks to see how Spectre performs across different development scenarios.
Test 1: Bookmarking API
We wanted to test how closely Spectre follows instructions. We gave it specific requirements and checked whether it implemented exactly what we asked for, added unrequested features, or missed requirements. This reflects a common development workflow where you already know exactly what you need to build and you want the model to execute without diverging from the spec.
Prompt:
Build a bookmarking API using TypeScript and Hono with the following requirements:
1. Use better-sqlite3 for persistence
2. Endpoints:
- POST /bookmarks - create a bookmark (url, title, tags[], notes)
- GET /bookmarks - list all bookmarks with optional tag filter
- GET /bookmarks/:id - get single bookmark
- PUT /bookmarks/:id - update bookmark
- DELETE /bookmarks/:id - delete bookmark
3. Input validation using Zod
4. Proper error handling with appropriate status codes
5. Include a health check endpoint at GET /health
The results: Spectre set up the entire project from scratch with package.json, dev and start scripts, TypeScript configuration, and a multi-file architecture, all in one go. After generating the code, it started the server and ran curl commands to verify each endpoint worked correctly.
It’s worth noting that, in our experience running these tests, models often get stuck setting up Node project structures. Version mismatches, wrong package.json configurations, and TypeScript issues usually take a few back-and-forths to fix. Spectre was among a small number of models that one-shotted the entire setup.
The project structure Spectre created:
The validation layer shows clean Zod usage with separate schemas for create and update operations.
The repository layer handles database operations with proper JSON serialization for the tags array.
What Spectre did well:
Created proper separation of concerns (models, repositories, validation)
All five required endpoints implemented with correct HTTP methods
Zod validation with .safeParse() and formatted error responses
Try/catch blocks on all routes with appropriate status codes (400, 404, 500, 201)
Self-tested the endpoints with curl after implementation
Minor issues:
Uses the any type in a few places in the repository layer
The update logic passes empty strings for undefined optional fields instead of preserving existing values
Test 2: Bug Detection in Go
Bug detection is one of those tasks where we’ve seen mixed results between frontier models and smaller models. Some bugs are obvious pattern matches that any model can catch, but others require deeper reasoning about control flow and edge cases. We wanted to see where Spectre lands on this spectrum.
We wrote a Go session management system with intentional bugs including race conditions, nil pointer issues, and missing error handling.
Prompt: “Review this Go code and find all bugs and issues. Fix them.”
Bugs planted: 9
Bugs found: 7
Spectre ran go build and go vet after making changes to verify the fixes compiled correctly. Similar to how it tested the API implementation using curl in Test 1, Spectre has a tendency to verify its own work using language tools like go build and go vet or external tools like curl. It’s an interesting behavior worth noting.
Here’s how Spectre fixed the nil pointer dereference in GetSession:
Before:
After:
What Spectre missed:
The login handler has a critical authentication bypass bug. When the password check fails, the code sends an error response but doesn’t return from the function, so execution continues and creates a session for the user anyway.
This means any login attempt creates a valid session regardless of the password. Spectre fixed the mutex bugs and nil pointer issues but missed this logic error.
Spectre also identified the race condition in the rate limiter but left a comment instead of implementing the fix.
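For reference, one common way to close a race like this is to guard the check and the increment with a single mutex. The sketch below is an assumption about the shape of such a fix, not the rate limiter from the test file: the RateLimiter type, its fields, and the countAllowed helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// RateLimiter is a hypothetical fixed-budget limiter keyed by client ID.
type RateLimiter struct {
	mu     sync.Mutex
	counts map[string]int
	limit  int
}

func NewRateLimiter(limit int) *RateLimiter {
	return &RateLimiter{counts: make(map[string]int), limit: limit}
}

// Allow does the check and the increment under one lock, so two
// goroutines cannot both observe a count below the limit and both
// increment past it, and the map is never written concurrently.
func (rl *RateLimiter) Allow(key string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()

	if rl.counts[key] >= rl.limit {
		return false
	}
	rl.counts[key]++
	return true
}

// countAllowed fires n concurrent requests for one key and returns
// how many of them the limiter let through.
func countAllowed(rl *RateLimiter, key string, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	allowed := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if rl.Allow(key) {
				mu.Lock()
				allowed++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return allowed
}

func main() {
	rl := NewRateLimiter(5)
	fmt.Println(countAllowed(rl, "client-a", 100)) // 5
}
```

Running a version without the lock under go run -race is a quick way to confirm whether a given limiter has this problem.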
Test 3: Documentation
Documentation is another task that’s often offloaded to smaller, faster, and more affordable models. You already have working code and you just need it explained clearly. We wanted to see how Spectre handles this, so we gave it a complex TypeScript token management function (about 100 lines) that handles four different actions: generate, validate, revoke, and refresh tokens.
Prompt: “Write comprehensive documentation for this function including JSDoc comments, a README file explaining usage, and examples for each action type.”
Spectre produced both inline JSDoc and a separate README with 256 lines of documentation.
JSDoc output:
The README includes a token types table, usage examples for each action, and a complete authentication workflow.
Spectre documented all interfaces (TokenConfig, GeneratedToken, ValidationResult), explained the security features (SHA-256 hashing, timing-safe comparison), and included practical examples showing common patterns like token refresh flows and error handling.
Minor issues:
One JSDoc example shows validation.expiresAt, but the validation result object doesn’t have that property (it has an expired boolean instead)
The README includes npm install crypto in the setup instructions, but crypto is a built-in Node.js module that doesn’t need installation
Observations
Spectre is a fast model that is reliable during tool calls. All three tests completed in a single pass with no tool calling failures or retries needed. And since it’s free, we ran all tests without spending anything.
One behavior that stood out was self-verification. Spectre ran verification commands after each task. For the API, it started the server and tested endpoints with curl. For the Go code, it ran go build and go vet to confirm fixes compiled. This caught issues before we had to review the output.
Spectre is currently in stealth mode, and the team behind it is actively collecting feedback during this testing phase. We expect the model to improve based on real-world usage data.
Where Spectre Fits
Based on these tests, Spectre handles implementation work well. The code generation test showed it can scaffold a complete project with proper structure. The bug detection test found 7 of the 9 planted bugs but missed a critical auth bypass. The documentation output was thorough, with practical examples.
The self-verification behavior is useful. Having the model test its own output reduces back-and-forth debugging cycles.
How to Start Using Spectre For Free
Spectre is free in Kilo Code with no rate limits.
Install our Kilo agent (available for VS Code, JetBrains, or as a CLI).
Select “Spectre” from the model dropdown
Start coding!

The article mentions that the model is free; however, that is not what I see. In the model dropdown in VS Code, it is not shown as a free model.
PS: I read several comments on this model and they were pretty negative, nowhere near the positivity mentioned by KiloCode. I haven't seen KiloCode respond to this.
this post is timely, because I was just in the process of signing in to substack to add a comment _"Like the other commenters, my experience with Spectre is middling to not very good. @Darko, can you share how you worked with it to get good results?"_ to https://blog.kilo.ai/p/spectre-stealth-model. So, you answered my prompt before I hit submit! heh.
My takeaway from this new in-depth blog post is: with Spectre we **must** use Architect with a frontier model first for good results, whereas the other stealth models in the last couple of months have been a bit more forgiving with less forethought.