The Age of the Flash Model: Gemini 3.5, StepFun, DeepSeek and the Future of Agentic Engineering

Google is playing both offense and defense as it enters the agentic era

May 20, 2026

Android XR glasses. Universal Cart. A new “Ask YouTube” feature.

At the I/O event today, Google made a staggering number of announcements. But the biggest news was beneath the headlines: Google’s release of a new AI model, Gemini 3.5 Flash, designed for agentic work. We got the new model live in Kilo before I/O 2026 had even ended.

The new model from Google DeepMind is already powering Google Search. But is it affordable and reliable enough for everyday use in your favorite coding tools?

In practice, Gemini 3.5 Flash is around 60% more expensive than comparable models like DeepSeek V4 Flash. But it offers frontier-level performance, reaching an average score of 74.2% on PinchBench in initial runs, similar to Opus 4.6.

Welcome to the age of the Flash model.

In the release, Google focused on benchmarks comparing 3.5 Flash to flagship models from other frontier labs like Anthropic and OpenAI. But the Kilo developer community is already comparing it more closely to budget-friendly alternatives: other flash models released recently that are also designed for always-on agentic engineering.

The release of Gemini 3.5 Flash comes on the heels of powerful, cost-effective Flash model releases from open source labs like StepFun and DeepSeek. This is only the beginning of the agentic era.

Developer’s Delight: A Crowded Flash Ecosystem

Google has fundamentally changed the pitch for its Flash tier. Gemini 3.5 Flash is no longer “cheaper if you just need basic stuff done on a regular basis.” It officially beats Gemini 3.1 Pro on most coding and agentic benchmarks, runs roughly 4x faster than comparable frontier models, and boasts a massive 1M-token context window.

Google declared that the new flash model offers “advanced reasoning at Flash-level latency and scale” and JetBrains has shared that the new model “improves low reasoning coding performance by 10–20% compared to the previous Flash generation.” Our initial tests have confirmed this.

But the competition for the ultimate developer AI has never been fiercer. The battle is officially a race to the bottom in price and a race to the top in context length, and developers have a surplus of incredible options:

Step 3.5 Flash: Quietly dominating the agentic coding space recently and currently free to use in Kilo. StepFun has become the go-to model lab for developers running continuous multi-agent loops due to its reliable tool-calling capabilities. Their open-source Flash release has been popular across KiloClaw, cloud agents and our VS Code extension, which is why it’s been totally dominating the Kilo leaderboard.
DeepSeek V4 Flash & Pro: DeepSeek continues to break the price-performance math. DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts (MoE) model built for sheer speed and high-volume pipelines. For developers needing heavier lifting, DeepSeek V4 Pro (packing 1.6T total parameters) steps in for deep reasoning and complex, multi-step agentic coding tasks without losing the ecosystem’s cost-efficient edge. As I wrote back in April (seems like a decade ago!), DeepSeek has adapted to the new micro-model way of seeing the world with this release, opting for Pro and Flash releases instead of a wider base model release.
Xiaomi MiMo-V2-Flash: Xiaomi is another Chinese lab that has been setting the Kilo leaderboard on fire lately, with their bigger V2.5 releases as well as MiMo-V2-Flash, a specialized 309B MoE model purpose-built for high-throughput inference and loop-based agentic tasks. Using a unique hybrid attention architecture, it is designed specifically for scenarios where a model must continuously write code, execute it, interpret the error, and iterate.

Flash Models are Here to Stay

While tech Twitter argues over these static benchmarks, the real revolution is happening in autonomous workflows. Flash models are the undeniable future of agentic engineering, because of their lower cost, high throughput, and focus on effective tool-calling.

Traditional coding assistance involves single-turn prompts: you ask for a function, the AI writes it. Agentic engineering, however, involves giving an AI an open-ended goal and letting it plan, write code, run tests, debug errors, and iterate in a continuous loop until the job is done (or until it times out…but hopefully until the job is done). Previously, executing these tasks with heavy frontier models was financially impossible for daily use. An agent looping through a massive codebase could burn through hundreds of thousands of tokens in minutes—and that’s not token-maxxing, it’s token-wasting.
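To make that loop concrete, here is a minimal Python sketch of an agent that keeps iterating until the tests pass, the iteration cap is hit, or the token budget runs out. Everything in it is an assumption for illustration: `run_agent_loop`, `StepResult` and `make_fake_attempt` are hypothetical names, and the "model" is a stub that reports a fixed token count per step rather than a real API call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    """Outcome of one plan/write/test cycle (hypothetical shape)."""
    tests_passed: bool
    tokens_used: int


def run_agent_loop(
    attempt: Callable[[int], StepResult],
    token_budget: int,
    max_iterations: int = 10,
) -> dict:
    """Iterate until tests pass, the budget is spent, or iterations run out."""
    tokens_spent = 0
    for iteration in range(1, max_iterations + 1):
        result = attempt(iteration)  # one plan -> code -> test -> debug cycle
        tokens_spent += result.tokens_used
        if result.tests_passed:
            return {"status": "done", "iterations": iteration, "tokens": tokens_spent}
        if tokens_spent >= token_budget:
            # Stop before the loop turns into token-wasting.
            return {"status": "budget_exhausted", "iterations": iteration, "tokens": tokens_spent}
    return {"status": "timed_out", "iterations": max_iterations, "tokens": tokens_spent}


def make_fake_attempt(passes_on: int, tokens_per_step: int) -> Callable[[int], StepResult]:
    """Stub standing in for a real model call; passes on a chosen iteration."""
    def attempt(iteration: int) -> StepResult:
        return StepResult(tests_passed=iteration >= passes_on, tokens_used=tokens_per_step)
    return attempt


if __name__ == "__main__":
    fake = make_fake_attempt(passes_on=3, tokens_per_step=40_000)
    print(run_agent_loop(fake, token_budget=200_000))
    print(run_agent_loop(fake, token_budget=50_000))
```

With the stub above, a 200,000-token budget lets the agent finish on its third pass, while a 50,000-token budget cuts it off after the second. The budget check is the part that matters for cost: the cheaper each token is, the more iterations fit before the agent has to stop.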

With these new flash models from Google, DeepSeek and others drastically reducing the cost per million tokens, developers can finally let agents run wild. This drop in API costs is democratizing autonomous software development. It allows solo developers and smaller startups to spin up armies of specialized AI agents (one for writing tests, one for refactoring, one for security audits) at a speed and price point that makes full-scale agentic engineering a reality.

Playing offense in the agentic engineering wars means optimizing for the small things. In my opinion, per-agent tool permissions are one of the coolest things about our latest VS Code extension. Flash models make it possible to keep those agents both within their guardrails and affordable.
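As a rough picture of what per-agent tool permissions mean, here is a small Python sketch. It is not Kilo's actual configuration format or API: the agent names, tool names, `AGENT_PERMISSIONS`, `is_allowed` and `call_tool` are all hypothetical, chosen only to show the idea of an allowlist per agent.

```python
# Hypothetical allowlist: each agent may only use the tools listed for it.
AGENT_PERMISSIONS: dict[str, frozenset[str]] = {
    "test-writer": frozenset({"read_file", "write_file", "run_tests"}),
    "refactorer": frozenset({"read_file", "write_file"}),
    "security-auditor": frozenset({"read_file"}),
}


def is_allowed(agent: str, tool: str) -> bool:
    """Unknown agents get no tools at all (deny by default)."""
    return tool in AGENT_PERMISSIONS.get(agent, frozenset())


def call_tool(agent: str, tool: str) -> str:
    """Check the allowlist before a tool call is dispatched."""
    if not is_allowed(agent, tool):
        raise PermissionError(f"{agent} is not permitted to use {tool}")
    return f"{agent} ran {tool}"


if __name__ == "__main__":
    print(call_tool("test-writer", "run_tests"))
    print(is_allowed("security-auditor", "write_file"))
```

The deny-by-default lookup is the design choice worth noting: an agent that is missing from the table, or a tool that is missing from its set, is simply refused.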

But What About Defense?

Beyond the Flash tier, which sees Google going on the offensive with development to rival other frontier players, Google also used today’s I/O event to address the elephant in the room: Anthropic’s Mythos.

To rival Anthropic’s enterprise testing, Google officially pushed CodeMender further into the enterprise today. Although Google DeepMind announced CodeMender back in 2025, Google used I/O to release it into wider testing and share findings from the Google Threat Intelligence Group (GTIG).

With Google pitching CodeMender as the ultimate active enterprise shield, the AI arms race has officially shifted from generating boilerplate code to autonomous cyber defense.

Google is playing on all fronts, and Kilo is here to help you play both offense and defense too. Use Gemini 3.5 Flash as your daily driver for agentic engineering, and any model you’d like for Code Reviews and Security Reviews.

Stay strong with the strength of Google. Stay nimble with model freedom ;)

Kilo Blog
