The AI Coding Revolution Hasn't Started Yet
What I learned talking to hundreds of engineers at Open Source Summit
I just wrapped up Open Source Summit North America in Minneapolis. Three days of keynotes, hallway conversations, and booth demos. What surprised me most had nothing to do with announcements or launches.
Most engineers haven’t adopted AI coding tools yet — not even close.
The Gap Is Enormous
I talked a lot at the conference about moving from vibe coding to agentic engineering — how AI coding tools are changing the way teams organize, ship, and think about development. The audience was engaged. Lots of nodding heads. But the questions afterward told a different story.
“So wait, these agents can actually run code?”
“How is this different from Copilot autocomplete?”
“We haven’t tried anything beyond tab-completion at our org.”
These aren’t junior developers at startups. These are staff engineers, architects, team leads at companies running critical infrastructure. People who build the open source ecosystem. And most of them are still just dabbling. Maybe they’ve heard of Claude Code or GitHub Copilot, but that’s about it.
We’re Living in a Bubble (not the one you think)
I spend my days at Kilo surrounded by people who run multiple AI agents in parallel, who delegate backlog items to autonomous coding sessions, who measure their human-agent ratio. I wrote a few months ago about engineers at Kilo saying things like “Oh yeah, I have an agent looking at that currently” in standup — and nobody blinking.
That’s still unusual across the industry.
The gap between what’s possible with AI coding tools today and what most professional developers are actually doing is staggering. Think 2007 — iPhones exist, but everyone around you still has a flip phone. The technology exists. The people who’ve adopted it ship faster, take on more ambitious projects, and operate with smaller teams. But most of the industry hasn’t crossed that threshold.
Reasons for the Lag
A few things I heard repeatedly in Minneapolis:
“We can’t get approval to use AI tools on our codebase.” Enterprise security concerns are real, and most AI coding tools don’t address them well. When your code contains trade secrets or regulated data, “just pipe it to Claude” isn’t an acceptable answer.
“We tried Copilot and it wasn’t that impressive.” Tab-completion barely scratches the surface of what’s available now. But if that’s your only experience with AI coding, you’d reasonably conclude the hype is overblown. The jump from autocomplete to autonomous agents is massive — and most people haven’t seen it.
“I don’t know where to start.” The tooling landscape changes weekly. New models, new products, new workflows. For someone who hasn’t been tracking this closely, the onboarding curve feels steep.
“My team doesn’t believe it works.” Skepticism from people who tried early tools and got burned is completely rational. The tools from 18 months ago were unreliable. The tools today are different. But you have to experience it to believe it.
The Adoption Gap Is the Real Story
I keep seeing takes about AI replacing developers or the “death of coding.” That framing misses what’s actually happening. There’s a massive adoption gap, and the teams that close it first gain a real structural advantage.
I wrote about 1-pizza teams back in February — how AI tools are enabling smaller teams to ship what used to require twice the headcount. The Harvard/Wharton study at P&G found individuals with AI performing as well as teams without it. Anthropic’s own engineers report 50% productivity gains and doing work that would have been too expensive in person-hours to justify before.
But those are the early adopters. The majority of the industry — the people I talked to in Minneapolis — haven’t even started capturing that value. They’re not behind because they’re bad at their jobs. They’re behind because the gap between “I’ve heard of these tools” and “I’ve integrated them into my daily workflow” is wider than anyone in the AI bubble appreciates.
What Happens Next
The diffusion curve for AI coding tools is going to be steep once it tips. Unlike previous developer tool shifts (version control, CI/CD, containers), AI coding agents produce immediately visible productivity gains. You don’t need to wait for organizational buy-in or infrastructure changes. One engineer on a team can start using these tools and ship noticeably faster within a week.
The adoption dynamics mirror early cloud — individual engineers can start without waiting for org-wide buy-in. The people who’ve adopted are already operating differently. The rest of the industry is approaching the point where ignoring it stops being a defensible choice.
At Kilo, we’re betting that the tipping point comes from open, model-agnostic tooling that works where developers already work: in their editors, with their existing workflows, connected to whatever models fit the task. Not another walled garden. Not a proprietary model that locks you into one vendor’s ecosystem. The adoption gap won’t close by adding more lock-in. It’ll close by making the on-ramp as easy as possible.
The View From Minneapolis
Walking out of that conference, I felt both excited and impatient. If most of the industry hasn’t adopted yet, the productivity wave ahead of us is enormous.
The impatience comes from what every conversation confirmed: the technology is ready, and what lags behind is awareness, access, and trust. Most developers just haven’t had a reason to believe that yet.
If you’re in the “just dabbling” camp — you’re not late. The revolution is barely underway. But the window for being early is closing fast.



A new Stanford study shows once again that vibe coding has a serious problem...
Coding Agent Interactions From Real Users in the Wild
https://arxiv.org/pdf/2604.20779
Abstract
AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat, the first large-scale dataset of real coding agent sessions collected from open-source developers in the wild.
The dataset currently contains 6,000 sessions, comprising more than 63,000 user prompts and 355,000 agent tool calls. SWE-chat is a living dataset; our collection pipeline automatically and continually discovers and processes sessions from public repositories. Leveraging SWE-chat, we provide an initial empirical characterization of real-world coding agent usage and failure modes. We find that coding patterns are bimodal: in 41% of sessions, agents author virtually all committed code (“vibe coding”), while in 23%, humans write all code themselves. Despite rapidly improving capabilities, coding agents remain inefficient in natural settings. Just 44% of all agent-produced code survives into user commits, and agent-written code introduces more security vulnerabilities than code authored by humans. Furthermore, users push back against agent outputs—through corrections, failure reports, and interruptions—in 44% of all turns. By capturing complete interaction traces with human vs. agent code authorship attribution, SWE-chat provides an empirical foundation for moving beyond curated benchmarks towards an evidence-based understanding of how AI agents perform in real developer workflows.
They also discovered that:
We identify sessions with a low success rating, revealing cases where agents fail to complete the user requests appropriately (Figure 6). In addition to that, we find that less than half of all agent-produced code survives into user commits (Table 3). Vibe coding is particularly inefficient, consuming roughly 3× more tokens and dollars per committed line than collaborative coding (Figures 7 and 29). Vibe-coded code is also substantially less safe. It introduces roughly 9× more security vulnerabilities per committed line than code that humans write themselves and about 5× more than code they co-author with the agent (Table 4). Agents are working autonomously for longer—the 99.9th-percentile turn duration now exceeds 100 minutes—yet they rarely stop to ask users for clarification (Figure 30). Users compensate by interrupting agents in 5% of turns and by pushing back against agent outputs in 39% of turns, often providing corrections and failure reports (Figure 8).
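To make the two headline metrics concrete, here is a minimal Python sketch of how "code survival" and "tokens per committed line" could be computed from session records. Everything in it is an assumption for illustration: the `Session` fields, the pooled (rather than per-session) averaging, and the numbers are made up, and the paper's actual attribution method is likely more involved.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-session record; field names are illustrative only."""
    agent_lines_written: int    # lines the agent produced during the session
    agent_lines_committed: int  # agent-produced lines that reached a user commit
    tokens_used: int            # total tokens the agent consumed


def survival_rate(sessions):
    """Pooled fraction of agent-produced lines that survive into commits."""
    written = sum(s.agent_lines_written for s in sessions)
    committed = sum(s.agent_lines_committed for s in sessions)
    return committed / written if written else 0.0


def tokens_per_committed_line(sessions):
    """Pooled tokens spent per agent-produced line that was committed."""
    committed = sum(s.agent_lines_committed for s in sessions)
    tokens = sum(s.tokens_used for s in sessions)
    return tokens / committed if committed else float("inf")


# Made-up numbers, chosen only to keep the arithmetic easy to follow.
sessions = [
    Session(agent_lines_written=200, agent_lines_committed=80, tokens_used=40_000),
    Session(agent_lines_written=300, agent_lines_committed=140, tokens_used=70_000),
]

print(f"survival rate: {survival_rate(sessions):.0%}")
print(f"tokens per committed line: {tokens_per_committed_line(sessions):.0f}")
```

The point of dividing by committed lines rather than written lines is that discarded output still costs tokens, so a low survival rate drives up the cost of every line that does ship.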