What We Learned from 3 Million Downloads of Kilo Code

Lessons from building a complete agentic engineering platform and processing over 40 trillion tokens

May 29, 2026

Three million downloads mostly tells you one thing: the quiet assumptions in your product are now load-bearing.

Kilo Code started as a VS Code extension with a small team, a fork of Roo Code, and a simple bet: open source coding agents would improve faster if developers could inspect them, run them, complain about them, and contribute back. A year later, Kilo has passed 3 million Kilo Coders and processed more than 40 trillion tokens. It has also grown past the editor sidebar into a platform that spans VS Code, JetBrains, the CLI, Cloud Agents, Slack, code review, deploy workflows, and Teams.

Those numbers do not map cleanly to unique people. Marketplace counters overlap. Installs are not retention. Tokens are not quality. But the volume is large enough to show which ideas survive contact with real repos, real budgets, and real deadlines.

Kilo’s bet was never one giant agent replacing the engineer. The useful version is smaller and more operational: reviewable workflows, explicit permissions, portable context, and engineers who stay responsible for the merge.

Speed found the problems we needed to fix

The first Kilo post was called “speedrunning open source coding AI”. That was accurate. The team formed in a week, forked Roo Code (which itself came from Cline), and started shipping small fixes immediately: better onboarding, free credits, model defaults, DeepSeek support, and fewer setup steps.

The early thesis was practical. If agents were going to make programming feel closer to shaping clay, the tool needed to be easy to try and easy to change. Open source mattered because developers do not trust black boxes with their repos for very long.

Launch week made the cost of that bet obvious. Hacker News and Product Hunt brought attention. Discord filled up. Reviews came in. So did abuse. Free credits attracted tens of thousands of throwaway accounts, and with them came infrastructure pressure, billing pressure, and support work. The team spent weeks in merge conflicts while trying to keep up with upstream Roo changes, patch user problems, and harden the system.

First lesson: distribution generates incidents, not just signups.

If you invite developers into the loop, they will find bugs faster than your team can triage them. They will also find vague pricing, weak abuse controls, missing docs, rough permissions, and every place where the agent does something surprising. That feedback helps only when the product can absorb it.

Be honest about the lineage

Kilo did not start from a blank repo. It came through Cline and Roo, and later rebuilt large parts of the stack around OpenCode. That history is one reason the project moved quickly.

Open source agent development works because ideas travel. Kilo borrowed, shipped, merged, broke things, fixed things, and contributed back. Some features started upstream. Some landed in Kilo first. Some came from community PRs. Some were hard-earned rewrites after scale exposed the weak spots.

This is why “open source” cannot just be a badge on a landing page. Developers ask concrete questions:

Can I inspect what runs against my code?
Can I bring my own model key?
Can my team control which models are allowed?
Can I see usage before a bill surprises me?
Can I keep sensitive work local or in a governed environment?
Can I leave if the product stops fitting how my team works?

Those questions shaped the roadmap as much as any feature request. Kilo stayed model-agnostic, supported hundreds of models through hosted providers and BYOK, launched Teams with centralized billing and controls, released Kilo CLI, made more backend code source-available, and kept pushing the same core agent across editor, terminal, cloud, and review workflows.

The trust comes from the verification path. Users can inspect the mechanism instead of taking the marketing copy on faith.

The product became a workflow, not a pane

For the first few months, the work looked familiar: better model support, prompt caching, MCP, docs grounding, portable settings, autocomplete, workflows, custom commit messages, and Orchestrator Mode.

Each feature solved a local problem. Together they pointed at something bigger. Developers were asking for a way to delegate work without losing the thread.

That is why the product kept expanding:

Orchestrator Mode made multi-step work explicit.
Memory Bank and codebase indexing reduced repeated context setup.
Teams added billing, analytics, model controls, and role-based access.
Kilo CLI brought the same agentic loop into the terminal.
Cloud Agents let tasks run in isolated environments with branches and PRs.
Agent Manager and worktrees made parallel work inspectable instead of chaotic.
Local and cloud code review turned agent output back into structured feedback.

The April 2026 VS Code rebuild made this visible. Kilo moved to a shared core across VS Code, CLI, and Cloud Agents, with parallel tool calls, subagents, worktrees, inline diff review, sessions, and multi-model comparison. It also surfaced memory spikes, rate-limit edge cases, and users who needed rollback paths.

That release taught the mature version of the launch-week lesson: speed needs release discipline. Pre-release channels, public issue loops, weekly stability updates, permission hardening, and clear revert paths are release infrastructure. Fast teams need them if they want to keep trust.

What 40 trillion tokens exposed

Tokens do not ship software. Diffs do.

Still, processing 40T+ tokens is a useful pressure test. At that volume, small workflow problems become expensive. A missing context file becomes repeated tool calls. A vague prompt becomes a larger diff than a human can review. A model default becomes a cost policy. A confusing permission setting becomes an enterprise blocker. A slow review loop becomes the reason a team stops using agents for serious work.

This is why agentic engineering needs different product surfaces than autocomplete.

Autocomplete helps with the next line. Agentic engineering has to support the full loop:

plan -> scope -> run -> verify -> review -> merge

Each step needs a place to happen. Planning needs modes and file-backed handoffs. Scoping needs explicit permissions and task boundaries. Running needs model choice, tool calls, and isolation. Verification needs tests, CI, code review, and sometimes another agent with fresh context. Review needs a diff a human can understand.

When one part is missing, the agent may still produce code. The team just will not trust it enough to merge.
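The loop above can be read as a chain of gates, where each step leaves an artifact a human can inspect and a missing or failing step stops the work before merge. The sketch below is a minimal illustration under that assumption; the stage names come from the text, but `run_loop`, `merged`, and the step callables are hypothetical and are not Kilo's internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# The loop from the text, in order. Stage names are illustrative.
STAGES = ["plan", "scope", "run", "verify", "review", "merge"]

# Hypothetical step type: returns (ok, artifact), where the artifact is
# something a human can inspect (a plan file, a diff, a test log...).
Step = Callable[[], Tuple[bool, str]]


@dataclass
class StageResult:
    stage: str
    ok: bool
    artifact: str


def run_loop(steps: Dict[str, Step]) -> List[StageResult]:
    """Run stages in order, stopping at the first missing or failing stage."""
    results: List[StageResult] = []
    for stage in STAGES:
        step = steps.get(stage)
        if step is None:
            # A missing stage blocks the loop: the agent may have produced
            # code, but nothing downstream can vouch for it.
            results.append(StageResult(stage, False, "no step defined"))
            break
        ok, artifact = step()
        results.append(StageResult(stage, ok, artifact))
        if not ok:
            break
    return results


def merged(results: List[StageResult]) -> bool:
    """Work counts as merged only if every stage ran and passed."""
    return len(results) == len(STAGES) and all(r.ok for r in results)


if __name__ == "__main__":
    steps: Dict[str, Step] = {
        "plan": lambda: (True, "plan.md"),
        "scope": lambda: (True, "allowed paths: src/billing/"),
        "run": lambda: (True, "diff on agent/billing-api"),
        # No "verify" step defined, so the loop stops there.
        "review": lambda: (True, "approved"),
        "merge": lambda: (True, "merged"),
    }
    results = run_loop(steps)
    print(results[-1].stage, merged(results))  # verify False
```

The point of the structure is that "the agent produced code" and "the work merged" are different outcomes: the code in the `run` stage exists either way, but nothing merges without a verify step.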

Review became the bottleneck

The screenshot version of agentic engineering is 100 agent tabs running at once. The workday version is much less dramatic: two or three foreground agents for work you actively steer, plus background agents for scoped tasks that can return a PR, a test result, or a clear failure.

For example, a team might split one feature into reviewable units:

git worktree add ../billing-api -b agent/billing-api
git worktree add ../billing-tests -b agent/billing-tests
git worktree add ../billing-docs -b agent/billing-docs

Then give each agent a narrow job:

Agent 1: Add the billing usage endpoint. Do not change auth middleware.
Agent 2: Add tests for existing billing usage behavior. Do not edit production code.
Agent 3: Update docs for the new endpoint after Agent 1 lands. Link to the generated OpenAPI schema.

That workflow helps because each output is easy to inspect. One diff touches the endpoint. One diff touches tests. One diff touches docs. If the tests fail, the failure is scoped. If the docs agent guesses, the mistake is visible.
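Instructions like "Do not edit production code" are easier to trust when something checks them. One possible approach is to compare each branch's changed files against the boundary its agent was given. The sketch below assumes a hypothetical repo layout (`src/`, `src/auth/middleware*`, `tests/`, `docs/`) and a base branch named `main`; `Scope`, `out_of_scope`, and `SCOPES` are illustrative names, not a Kilo feature.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List, Tuple


@dataclass
class Scope:
    """Hypothetical per-agent boundary: glob patterns a diff may or may not touch."""
    allowed: Tuple[str, ...]
    forbidden: Tuple[str, ...] = ()


def out_of_scope(changed_files: List[str], scope: Scope) -> List[str]:
    """Return the changed files that break the agent's task boundary."""
    violations = []
    for path in changed_files:
        # Note: fnmatch's "*" also matches "/", so "src/*" covers subdirectories.
        if any(fnmatch(path, pat) for pat in scope.forbidden):
            violations.append(path)
        elif not any(fnmatch(path, pat) for pat in scope.allowed):
            violations.append(path)
    return violations


# Hypothetical boundaries for the three worktree branches in the example.
SCOPES = {
    "agent/billing-api": Scope(allowed=("src/*",), forbidden=("src/auth/middleware*",)),
    "agent/billing-tests": Scope(allowed=("tests/*",)),
    "agent/billing-docs": Scope(allowed=("docs/*",)),
}

if __name__ == "__main__":
    # In practice this list could come from:
    #   git diff --name-only main...agent/billing-tests
    changed = ["tests/test_billing_usage.py", "src/billing/usage.py"]
    print(out_of_scope(changed, SCOPES["agent/billing-tests"]))  # ['src/billing/usage.py']
```

A non-empty result does not have to block anything automatically; it tells the reviewer exactly which file the agent touched outside its job.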

This is the core lesson from our own engineers using Kilo every day: task size should be bounded by reviewability. If a human cannot review the output in one sitting, the task was probably too large.
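"Reviewable in one sitting" is a judgment call, but a team can approximate it with a budget on diff size. The sketch below parses the tab-separated output of `git diff --numstat` (added lines, deleted lines, path; binary files report `-`) and flags diffs over a limit. The limits of 400 changed lines and 10 files are illustrative assumptions, not Kilo defaults, and `reviewable` is a hypothetical helper.

```python
from typing import List, Tuple

# Hypothetical budget for "reviewable in one sitting". These numbers are an
# illustrative assumption; tune them per team and per codebase.
MAX_CHANGED_LINES = 400
MAX_FILES = 10


def parse_numstat(numstat: str) -> List[Tuple[int, int, str]]:
    """Parse `git diff --numstat` output: 'added<TAB>deleted<TAB>path' per line.

    Binary files report '-' for both counts; they are counted as zero lines here.
    """
    rows = []
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        rows.append((
            0 if added == "-" else int(added),
            0 if deleted == "-" else int(deleted),
            path,
        ))
    return rows


def reviewable(numstat: str) -> Tuple[bool, str]:
    """Return (ok, reason) for whether a diff fits the review budget."""
    rows = parse_numstat(numstat)
    changed = sum(a + d for a, d, _ in rows)
    if len(rows) > MAX_FILES:
        return False, f"{len(rows)} files > {MAX_FILES}: split the task"
    if changed > MAX_CHANGED_LINES:
        return False, f"{changed} lines > {MAX_CHANGED_LINES}: split the task"
    return True, f"{changed} lines across {len(rows)} files"


if __name__ == "__main__":
    sample = "120\t8\tsrc/billing/usage.py\n45\t0\ttests/test_billing_usage.py\n"
    print(reviewable(sample))  # (True, '173 lines across 2 files')
```

Line counts are a crude proxy: a 50-line change to auth logic can take longer to review than 300 lines of tests. The useful part is the feedback loop, since an over-budget result is a signal to split the task rather than to review harder.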

The job changes from “write every line” to “design the loop.” You decide the task boundary, the model, the permissions, the environment, and the verification step. The agent writes code. You decide whether that code should exist.

Teams need controls before they need more autonomy

Individual developers adopt tools when they save time. Teams adopt tools when they can explain the risk.

That means agentic engineering needs controls that feel boring until you need them:

model allowlists and provider settings
BYOK for teams that already have model contracts
usage analytics before finance asks for them
permission prompts that can block risky tool calls
isolated cloud environments for tasks that should not run locally
source visibility for security review
code review agents that leave line-level comments instead of vague summaries
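Two of those controls, model allowlists and permission prompts, reduce to small policy checks. The sketch below is hypothetical throughout: the `TeamPolicy` class, the model IDs, and the command rules are invented for illustration and do not reflect Kilo's actual configuration schema. It assumes first-match-wins glob rules on shell commands, with unknown commands falling back to asking the human.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import List, Tuple


@dataclass
class TeamPolicy:
    """Hypothetical team policy; not Kilo's real config schema."""
    allowed_models: List[str]
    # Ordered (pattern, decision) rules for shell tool calls; first match wins.
    command_rules: List[Tuple[str, str]] = field(default_factory=list)
    # Anything unmatched triggers a permission prompt instead of running silently.
    default_decision: str = "ask"

    def model_allowed(self, model_id: str) -> bool:
        return any(fnmatch(model_id, pat) for pat in self.allowed_models)

    def decide(self, command: str) -> str:
        """Return 'allow', 'ask' (prompt the human), or 'deny'."""
        for pattern, decision in self.command_rules:
            if fnmatch(command, pattern):
                return decision
        return self.default_decision


# Hypothetical model IDs and rules, for illustration only.
policy = TeamPolicy(
    allowed_models=["provider-a/*", "byok/internal-model"],
    command_rules=[
        ("git status*", "allow"),
        ("npm test*", "allow"),
        ("git push*", "ask"),
        ("rm -rf *", "deny"),
    ],
)

if __name__ == "__main__":
    print(policy.model_allowed("provider-b/large"))   # False
    print(policy.decide("npm test -- billing"))       # allow
    print(policy.decide("rm -rf /"))                  # deny
    print(policy.decide("curl https://example.com"))  # ask
```

Defaulting to "ask" is the boring choice that matters: a new or unexpected tool call reaches a human instead of running. Glob matching on raw command strings is easy to sidestep (extra spaces, chained commands), so a real implementation would need to parse commands more carefully than this sketch does.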

Kilo’s history keeps coming back to this point. The tool has to serve the engineer in flow and the organization around the engineer. If either side loses trust, adoption stalls.

The adoption curve is still early. Three million downloads sounds large until you compare it to the number of developers who still use AI coding tools mostly for autocomplete, one-off questions, or weekend prototypes. The next wave is not about convincing people that models can write code. They know that now. The next wave is about making agent output safe enough, cheap enough, and reviewable enough for normal engineering work.

Where agentic engineering goes next

The next phase is portable, governed, and review-first.

Portable means a session should not die because you moved from VS Code to the terminal, Slack, Cloud Agents, or a code review. The context should travel with the work.

Governed means teams should be able to set model policies, inspect usage, control permissions, and understand what ran against their code. Developers should not have to choose between good tools and company rules.

Review-first means every agent workflow should end in an artifact a human can judge: a diff, a test result, a plan, a PR comment, a deployment preview, a security finding. As agents take on larger tasks, that artifact matters more.

That is the part we are most focused on after 3 million downloads and 40 trillion tokens. Not bigger demos. Better loops.

Kilo started by trying to move fast with the open source community. That part has not changed. What changed is the responsibility that comes with scale. Millions of installs and trillions of tokens mean developers are trusting agents with real repos, real budgets, real deadlines, and real production systems.

The next version of agentic engineering has to respect that. It should make the easy tasks disappear into the background, make the hard tasks easier to plan, and make every result easier to review.

That is how agents become part of engineering instead of a side chat next to it.

Kilo Blog
