Reflection AI's Misha Laskin on Asimov: building a coding agent with 'depth' using RL to reason about entire developer workflows
Jul 16, 2025 with Misha Laskin
Key Points
- Reflection AI launched Asimov, a codebase comprehension agent designed to solve what co-founder Misha Laskin calls 'amnesia' in current coding agents that lose context between sessions.
- Laskin, a former DeepMind researcher, is dismissive of benchmark-maxing and argues reinforcement learning trained on real-world enterprise workflows, not math olympiads, is the credible path to superintelligence.
- Reflection AI is deploying enterprise-first with data kept in customer infrastructure, targeting large monorepos where persistent codebase understanding generates the most value.
Summary
Reflection AI launched its first product, Asimov, a codebase comprehension and code research agent, on July 16th. The company was founded in early 2024 by Misha Laskin and co-founder Giannis, both former DeepMind researchers, with a team drawn largely from DeepMind, OpenAI, and Anthropic. The founding thesis is that superintelligence must be defined through product, not abstraction, and that coding represents the root-node problem for building it.
The Problem Asimov Is Solving
Laskin's core diagnosis of current coding agents is that they operate with what he calls 'amnesia.' Every session, the agent starts from zero, with no persistent understanding of the codebase or the organizational context around it. The analogy he uses is the Adam Sandler film 50 First Dates: like the film's protagonist, the system forgets everything overnight.
Asimov is built to address this by providing deep, persistent comprehension of large enterprise codebases. The product is positioned not as a code generator but as a 'code research agent,' capable of answering any question an engineer has about their system. Laskin sees this contextual memory layer as a prerequisite before more capable generation agents can be genuinely useful at scale.
Go-to-Market
Reflection AI is going enterprise-first and deliberately avoiding consumer virality. The deployment model keeps all data and code within the customer's own cloud infrastructure, a direct response to enterprise resistance to proprietary code leaving their environment. Laskin points to Google's monorepo at DeepMind as the archetype of where these tools generate the most value. The current GTM focus is early-adopter enterprises where that deployment is operationally straightforward.
RL Conviction and Benchmark Skepticism
Laskin is an explicit reinforcement learning believer and dismisses benchmark-maxing as an 'ego play.' His argument is that RL applied to math olympiad problems gives no reason to expect generalization to real mathematics, just as a strong test-taker is not necessarily a great mathematician. RL trained against real-world evaluations, even if it scores lower on benchmarks, is in his view the more credible path to superintelligence.
He points to AlphaGo, OpenAI Five (Dota 2), and AlphaStar as proof that RL produces superhuman capability when set up correctly, and believes the same is achievable in enterprise software contexts.
The Verification Problem
Laskin, who led reward model training on Gemini at Google, identifies reward verification as the single most fundamental bottleneck across AI. His view is that building RL systems for domains like theoretical physics is not just a compute problem but an evaluations problem. Creating a simulation that accurately reflects what a working physicist actually does would require deep collaboration with a very small pool of experts, making it economically difficult to underwrite. No company, in his assessment, is currently doing this work.
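Laskin's distinction can be made concrete with a toy example. Coding is RL-friendly because it admits a programmatic, verifiable reward: run the candidate code against known tests and score the result. Domains like theoretical physics lack any such automatic check, which is the evaluations gap he describes. The sketch below is purely illustrative; the `square` task and the harness are assumptions for the example, not anything from Reflection AI's stack.

```python
def coding_reward(candidate_src: str, tests: list[tuple[int, int]]) -> float:
    """Verifiable reward for a coding task: execute the candidate
    and return the fraction of test cases it passes.

    Illustrative only; assumes the candidate defines a function
    named `square`. Real systems sandbox untrusted code.
    """
    namespace: dict = {}
    exec(candidate_src, namespace)  # run the model's proposed code
    fn = namespace["square"]
    passed = sum(1 for x, want in tests if fn(x) == want)
    return passed / len(tests)


tests = [(0, 0), (2, 4), (3, 9)]

# A correct candidate earns full reward.
good = "def square(x):\n    return x * x\n"

# A buggy candidate earns partial reward, so the signal is graded.
bad = "def square(x):\n    return x + x\n"
```

By contrast, a question like "is this a good research direction in quantum gravity?" has no analogue of `tests` to check against, which is why Laskin frames such domains as an evaluations problem rather than a compute problem.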
The Move 37 Benchmark
Laskin and his co-founder Giannis, who was present in Seoul during the AlphaGo versus Lee Sedol match, use Move 37, AlphaGo's famously unexpected play in game two of that match, as their internal benchmark for genuine AI creativity. In their view, current LLMs are not producing that level of net-new insight; solving math olympiads and coding challenges does not constitute it. Reflection AI's stated research goal is to build systems that surface that kind of unexpected, beautiful discovery within real enterprise and engineering contexts.