Reflection AI's Misha Laskin on Asimov: building a coding agent with 'depth' using RL to reason about entire developer workflows

Jul 16, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

What's going on? Launch day. Launch day. Hey guys, good to see you. Thanks for having me. Thanks for hopping on. I I saw the post going viral. I checked the calendar. We'd already booked you. It's working. Perfect. It's amazing. Uh thank you so much for coming on.

Why don't you kick us off with an introduction on yourself and the company and then I have a ton of questions about how engineers are uh understanding code. Yeah, sounds great. So, um just a quick primer on the company. Um my co-founder Giannis and I were deep mind researchers.

uh before this um ourselves and our team which is mostly from deep mind and also from places like open eye anthropic pioneered a lot of the large language model and reinforcement learning breakthroughs over the last decade and we set out about a year and a half ago to build the company with the mission of um I mean it was with the mission of building super intelligence but we had a pretty let's say almost practical and opinionated view on it that like not to build it in the abstract that you had to really define it as both the research and the product agenda And so we thought about well what would a super intelligence in an organization look like which we think is basically going to be this like omnisient oracle that knows everything about the company and can act on the user's behalf.

And if you want to build towards that we then kind of work backwards of where do you start today? And so today um what we're launching is a product called Asimov which is the best codebased comprehension system in the world.

So, it's like a code research agent um that is sort of the submission oracle um for engineers about their code um and can answer any question that they that they have. And that's kind of our first step. Okay. You're clearly an asthma fan.

Maybe an easy question, but uh is the solution to AI safety just the three laws of robotics? Like if we just bake those into the system prompt, are we good? Uh you know, I think that uh we're actually kind of doing that already today.

like these uh when you write these rubrics for language models to kind of go and evaluate um you know whether something is behaving well or not.

Um and similar to the asimov you know conclusion uh the conclusion here is that this problem is much harder and it's very easy to hack these kinds of rubrics and build systems that are not aligned. Uh so uh I think the answer is you know a no.

Um but uh you know it's kind of a big problem that doesn't have like a very simple I think silver bullet answer. Yeah. Yeah.

I mean, as certainly as I've tussled with the the AI safety narratives of the last few years, I've gone all back and forth from, okay, this is a very uh this is a very crazy doomsday scenario that's fantastic sci-fi to, okay, there are some very concrete problems that need to be addressed as these systems get rolled out, even just from, you know, a time and attention and people being, you know, overly focused or optimizing on user seconds going wrong.

Uh there's so many things. So, it's yeah, it's a fascinating space. When did you actually start the company? And this is your first launch, correct? We started the company um about early 2024. So, a little over a year ago, right? And how was that?

I'm curious the decision- making process to go strike out on your own obviously with a great team uh but as a researcher like deciding whether to stay in an environment that had you know massive scale and resources versus going into a more resource constrained environment.

Obviously there's a lot of capital available for great companies and teams but it's still quite a bit more resource constrained than something like a deep mind.

I think that every frontier lab that's been built um was built when it was kind of had enough resources and had a good team, but was definitely resource constrained relative to the kind of big incumbent.

Um and so I think what's really interesting to myself and to the team that's uh been assembled here is how do you build out these sort of next AlphaGo or right the next GPT?

I think that was like the most interesting time to be at these you know what are now incumbent labs when the break initial breakthrough projects that defined them was just being formed and we think there's an opportunity for kind of the new kind of alpha go to be built which is going to be embodied not in a game of go or you know a you know simulated setting but is going to be embodied in an extremely powerful and useful product and so we kind of think about these two things of like how do we set the research agenda to drive breakthroughs via product uh you know on this mission to super intelligence.

So I think it's really if a researcher is uh uh you know happy kind of entering like a big ship and being a kind of small part of it um which you know there's definitely reason to do that it's fin you know it obviously pays very well um that's one path but if you want to start kind of a new frontier lab and drive a new series of breakthroughs uh I think the best place to do that in are really small focused settings.

Mhm. Um, you mentioned Alph Go. My interpretation of that story, it was I was obviously very impressed that Alph Go beat Lisa Doll. I thought that was incredible and unexpected. But I was more impressed by move 37.

And I think that the reaction to move 37, this uncharacteristic, un difficult to interpret move that seemed like an error, seemed like a blunder, turned out to be important, turned out to be critical to winning that game. Showed a type of creativity and sort of a solution to the spiky intelligence.

It didn't feel just like playing the best game of human go. It seemed like it was playing something different. And so my question is, is that the correct understanding or history of the Alph Go Lisa doll match and that and that story? Um, but then also are we still waiting for a move 37 moment in LLMs.

So move 37 was actually I was a theoretical physicist before and when I saw that I decided to get into AI and my co-founder Giannis was one of the key contributors to Alph Go and was there um in soul when that happened and I think that you're you're exactly right that this was probably I think still one of the most beautiful artifacts to ever be produced in in AI and that we have not actually gotten to the point where we're seeing move 37 sevens um coming out of these language models.

Um agree. Yeah, we're we're seeing we're seeing like they're solving math olympiads, they're solving, you know, coding quizzes, but we're not seeing that level of net new creativity.

Um and that is kind of one of the guiding kind of, you know, things at this company is how do we get to these systems that start showing move 37s in the real world? like beyond kind of math olympiads, beyond games of go, how is it, you know, like that?

I mean, I don't want engineers or like people in uh in enterprises to see a thing that an AI gives them like be perplexed like Lisa Doll was with Move 37, but I do want to strike that same sense of kind of beauty that this thing is discovering new technology in front of us.

So, that is definitely um what we're what we're moving towards. Do you have a reaction to the latest Gw essay about LLM daydreaming?

this concept that maybe if you run LLMs across all sorts of different ideas, pick random words, try and find connections, you can kind of brute force innovation or innovative thoughts because we've seen that LLM seem like insanely high IQ, math olympiate as you as you mentioned, and yet have yet to write a really novel funny joke or come up with a new connection in in between the different sciences.

And it feels like we're in the spiky intelligence moment. A lot of what's what's produced feels kind of like a very much an average of the internet. It's it's sometimes, you know, midwitty in many ways. Um it feels uninspired and and Goran was was coming up with this idea that maybe there's a different solution.

Uh did that resonate with you? Did you read that or did or do you have any other ideas of how that could possibly play out? Well, that essay definitely resonated.

Um though you know as a kind of true reinforcement learning believer um you know we've seen super intelligence arise multiple times now that was the game of Go and Alph Go um uh Dota 5 and Alpha Star these projects from OpenAI and DeepMind um we're getting close to it and I think at that point if they just sunk more computing they would have gotten super intelligent um you know video game players more broadly um and so I think reinforcement learning when it's set up right it never fails.

Now, the challenge is that if you're setting it up to solve math olympiads uh questions, um there's no reason to believe that why that would generalize to actual mathematics. It's kind of like a you know, like a student that's really good at taking tests doesn't mean that you'll make a great mathematician.

And so I I think that without engaging with the real world and real world evaluations, um it's really hard to build super intelligent systems in the real world. So I think this kind of benchmark maxing is um a bit of an ego play.

And I'm much more interested in models that are trained with reinforcement learning for real world stuff and maybe they're a bit worse on the benchmarks, but users really love them. And I think we'll we'll start seeing super intelligence come out of those systems.

I don't I don't think reinforcement learning will fail us. Do you think we need verifiable rewards in physics discoveries or something like that?

like ho how can we RL against something that like I I've been my my Kugan's eval is basic or Kugan's benchmark is basically tell me a joke and see if I laugh and uh it feels extremely hard to eval you you have to you know pay a bunch of humans to sit in a comedy club or they have to pay and then you have to record the voice of the laughing or something I I I don't I don't know how I would RL against something as squishy as um as a joke or or a fundamental new insight in physics.

It feels like it might be intractable, but what is your take? Uh I think the verification problem is kind of the most fundamental problem across all of artificial intelligence. Uh when I was working on Gemini, I was uh leading reward model training and that was basically figuring out the verification questions.

So that is I think the basically biggest bottleneck and so there's kind of some systems have verifiable rewards um others um you kind of have these rubrics uh but I think fundamentally the limitation is like what are you evaluating so in the in the physics thing you might have verifiable and even rubric rewards for physics problem sets physics olympiads but getting that for you know actual physics work like you working very closely with physicists and seeing like what their day-to-day is.

And you know, there's um I don't think there are any companies that are actually doing that because it's it's a bit of a slo and it's unclear if it's even economically viable. But that's the sort of thing that you would need to do.

You would need to make a simulation that is as close as possible to what a theoretical physicist actually does. Um this the challenging thing is that there aren't that many theoretical physicists that you could even work with to really understand them deeply.

So it is kind of a data constraint problem, but I think it's fundamentally an evaluations problem.

Yeah, that that's kind of a hilarious scenario that uh for some of this basic science research like the it might cost like a billion dollars in compute and you cannot underwrite that if you're not coming out the other end with like a patent. So that's like pretty tricky.

What was the uh moment internally for you guys that you felt the product was ready to to to launch?

I think we've been so we believe that coding is this kind of root node problem to super intelligence more broadly just because that's how language models interact with software that's sort of like their hands and legs and then we were trying to figure out well where are we today and why you know like what's preventing us from building super intelligent systems um and the short of it we kind of realized that coding agents right the code generation stuff is starting to work like we have these semi-autonomous systems But they're basically like semi-autonomous systems with amnesia.

Like they, you know, they forget everything. They have no context. And it's sort of like uh if you watch like that Adam Sandler movie 51st dates, it's kind of like that for coding, right? So like every day your coding agent wakes up and knows nothing and has to learn everything from scratch.

And so the fundamental thing that we felt was missing that needs to be solved is um this ability to comprehend very large organizational code bases and the software and kind of um systems around them and build this memory like this contextual core for for agents and I think this will sort of generalize u beyond coding right there's sort of every single discipline in organization um is kind of contextbound so even if we get really smart like generation agents um that doesn't mean that they'll actually be useful.

So I think it was kind of having that insight, building the initial product around it, seeing how it's utilized um on our team um on a daily basis and how our initial customers are starting to use it and seeing just having a lot of confidence that this is the big unsolved problem um that you know that we kind of identified and are seeing kind of good momentum around.

Can you talk to me about what you're excited about on the GTM side? Are we going to be seeing a frustrating frustratingly viral clue style ad from you? Uh are you going to set up channel sales like what we saw at Windsurf? Are you going to go enterprise?

What's the most interesting to you in terms of actually uh getting the product adopted at scale?

So we think kind of most of like most of the problems where super intelligence will be useful like extremely useful and valuable um is going to be within large organizational settings like when I was working at you know deep mind in Google's largest monor right it's like this massive monor repo um it has to be like that's the kind of scale where these systems are most useful now obviously like enterp most enterprises are not at that scale um and so it's very much you know it it is an enterprise product uh Now, in terms of like a a go to market, you you want to go for the enterprises that are early adopters.

And so, one of the things, you know, that we've um really been doing is that uh as opposed to kind of a traditional SAS that might go uh viral with um more consumers, um enterprises don't want their code and like all their proprietary data leaving their cloud.

So, we've built it in such a way that it's just deployed on their cloud resources and are working with kind of the early adopters where that deployment is fairly straightforward. Um, so that's kind of our go to market today. Yeah, that makes a ton of sense. Any other questions, Rudy?

Not for now, but uh, this was great chatting. Thanks for insights and always welcome to come on the show when you see if there's a current thing on the timeline that you have strong feelings about. Shoot us a note. Jump on. Let us know when you see a move 37 moment in anything. That's what I want.

Your official move 37 text as soon as the first time I see a move 37, you'll be one of the first people to know. Please, we'll we'll put up a breaking

← Back to story