Dwarkesh Patel: why continual learning—not raw intelligence—is the real bottleneck to AGI

Jul 7, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Dwarkesh Patel

ruling explaining how they could have won if they did literally anything different.

But this is going back and forth on whether or not it is fair use to train an LLM on proprietary data, on copyrighted data, and it's looking more and more like the publishing industrial complex has been on a generational run of Ls in the court system. Indeed.

While I think that the judges have generally been getting it right, it's hard to really cheer here, because I care a lot about the authors who work hard to produce their works. And I can understand where the frustration comes from. Yeah.

But I believe that by and large the model companies will be on the right side of history on this issue. I'm pretty optimistic.

I think that when we talked to Matthew Prince from Cloudflare, he had an interesting model, essentially a Spotify-like model where, if you publish on the internet and LLMs are using your writing, your original work, your reporting to answer questions for somebody who's paying $200 a month,

hey, send me a dollar of that, and you aggregate that. That seems doable. It also seems very doable that the big publishing houses could do deals. We've seen the Wall Street Journal and News Corp do a deal with OpenAI.

Now, when I go and ask ChatGPT about something in the Wall Street Journal, it can jump the paywall, but they're getting a cut. And so you could see that happening with Audible. You could see that happening with, you know, Apple Books, Google Books. They have everyone's information.

They could flow a little bit of the rev share back, and that could actually be a reasonable economic model. So I'm not super worried. I'm still cautiously optimistic that that works out. Anyway, those are our headlines. Let's tell you about Ramp. Time is money. Save both.

Easy-to-use corporate cards, bill payments, accounting, and a whole lot more, all in one place. Go to ramp.com to get started. And we have our first guest of the show, Dwarkesh Patel, in the studio. How are you doing, Dwarkesh? What's going on? The soundboard's a little loud. Great to have you back.

We're not getting audio right now. Can we check on that? I don't know if you're on mute on your side, but loved the piece. Listened to it last night. Really appreciate you dropping it in the podcast feed as well. Do we have you? Can you hear me now? Yeah. Fantastic. There we go. AGI is here.

We can do a Zoom call. I'm just getting used to this podcasting thing. Yeah, first time. Anyway, really enjoyed the piece. Wait, we have to call this out: Tyler Cowen was on our show a couple months ago.

Really aggressive, basically calling o3 AGI, and he wasn't able to get his video on at the time. So it was this funny contrast.

It reminded me of you talking about trying to build with a lot of these tools, and in the process of building with them you realize, okay, this is amazing, but it's actually just going to take a little bit longer than maybe we would all like. That's right.

So yeah, but by the way, I think there's something really interesting. Tyler and I disagree on two things, and they're both related in a way. So, you know, when o3 came out, Tyler wrote this blog post on Marginal Revolution where he said, AGI is here, guys. It's really AGI.

But then he also believes that the impact of AI is not going to be that big: once we do get AGI, it's going to result in 0.5% more economic growth a year, the kind of impact we saw from the internet. And so I think these two are actually quite related beliefs. My view is that these LLMs are not that useful, this is not AGI, the AGI will come later, and when the AGI hits we're going to see something like 20% economic growth as a minimum. But because he thinks this is AGI, he concludes it's not going to lead to big growth outcomes. If I thought this was AGI, I'd conclude the same thing.

Yeah. How are you thinking about definitions of AGI?

And I'd love to actually get a little bit of the history before this piece, your journey, because for me, you know, I grew up watching sci-fi and was like, yeah, C-3PO will be around eventually, but it's very abstract and I don't have timelines for that.

And then eventually, you know, you start reading... What's your p(C-3PO)? Yeah. You eventually start seeing GPT-3, GPT-3.5, DaVinci, ChatGPT, and it starts feeling like, okay, we've passed the Turing test. We need to really have this conversation about AI.

And then foom and AGI become the main discourse for a few years. But it felt like with this piece, even though, you know, you and Dylan were going back and forth, it was like, no, this is still incredibly bullish for the general population. Yes.

So, walk me through where you started. Where was the nadir of your timelines? When was your timeline, it's happening next week, next year? And then walk me through how we got here. Yeah. So, I've got this podcast where I interview people about AI.

And I've had on people who have quite aggressive timelines over the last few months.

I've been around people who are like, well, you know, there have been many people who have written pieces about how we're a couple of years out, right? Leopold Aschenbrenner, and recently Scott Alexander and Daniel Kokotajlo had the AI 2027 forecast, where, you know, we've got the bots that can just take over within the next few years.

So that's where my head was at as of a couple of months ago. And then I recently interviewed these two researchers.

I think you actually had one of them on your podcast: Sholto Douglas and Trenton Bricken from Anthropic, about the path forward for RL. Pre-training seems to have been giving us these plateauing returns. We make these models bigger, and GPT-4.5 didn't seem to be all that impressive.

They had to deprecate it. But there does seem to be a path forward: o3 actually is very impressive, and that was more the result of this RL process.

So maybe now, even though pre-training doesn't seem to be as powerful as we might have anticipated, this RL is even more powerful, and so we should accelerate our timelines. And so that's where my head was at as of a couple months ago.

But then in having that conversation and thinking through, okay, what specific capabilities, in terms of actual applications that I have as a small business owner or as a podcast producer, will AI be able to handle? And thinking about why it is not able to do these things right now, and what is the key bottleneck?

I realized there's actually no obvious way in which you can get LLMs to solve these problems for you, and there's no key algorithmic fix, no easy, you know, prompt-injection kind of thing, which would help solve these problems. And the key problem I see is that the models can't do on-the-job training.

So if you think about a human employee, the good thing about them is that you train them for six months or a year, and over time they're getting better and better. They're learning about all the context and intricacies of your workflow, what you like.

They'll fail, but they'll learn from their failures and interrogate them in this very organic, deliberative way. They'll pick up small efficiencies and improvements as they practice a task. This just doesn't happen with an LLM.

Every session, you're getting this amnesiac mind that's very smart, but it's lost all awareness of how you like things done, how your business works, and so forth.

Yeah, and just to put that into context: if you had an incredibly intelligent employee that could not take feedback, you would fire them within about a month, right?

Because no matter how smart you are, you're not necessarily going to predict every single possible edge case in the work that needs to be done. And then when you make a mistake, if you're not able to sort of update yourself, then what are we even doing here, right?

Learning from mistakes is kind of high on the list in terms of how to become great at any specific task or initiative. 100%.

And so then people will say, well, look, maybe the way they can learn from their mistakes, Jordi, is you can just tell it in the context: hey, you [ __ ] up this way last time you were working for me, don't do it again. But I think this is at least an order of magnitude less efficient and less capable than the way humans learn.

So the example I use here is: imagine you were trying to teach a kid to play the saxophone, but the way you had to teach it is, you know, a kid comes into the room and they try to play it cold, right? They've never seen a saxophone before.

They try to play the saxophone, and obviously it's not going to sound great the first time. But what you do then is, after they've failed, you just send them out of the room.

You call in the next kid who's waiting outside the room and you say, look, here are some notes I wrote down from last time about what the other kid [ __ ] up. Why don't you read that and try to play Charlie Parker cold? It just wouldn't work, right?

This tacit knowledge that you build up through practice is not a written instruction manual that you can just write out as a system prompt. Yeah.

And so our current solution is to RL on saxophone playing specifically, and in that scenario you're basically getting that kid to drill that one thing.

But my question is, it feels like when we think about that in the abstract, it's like: oh yeah, work is just doing emails, so let's RL on emails; and then it's doing calendars, so let's RL on that; and we'll just chip away at these, you know, book a flight, then schedule a call, then do an outbound sales thing. But really, jobs are not just five things to RL on. Maybe it's 500 things, or thousands of things. And so even if we can define a verifiable reward and drill each one, there are just so many different random things to do that it's going to take us a long time.

Is that a reasonable philosophy? That, I think, is part of it. But I think the bigger problem is not just the width of the pool, how many different tasks you have to RL on, but the depth, in the sense that a job doesn't involve doing a thousand different five-minute tasks individually.

It's the fact that you're trying to work on something, but then somebody Slack-messages you something more urgent, and you have to decide which one is more important. You have to keep track of this client and what problem they had.

By the way, I'm talking hypothetically about what a job might involve, because I've never actually worked a real job. Me neither. But the point is how all these things fit together. We already have language models that can do five-minute language jobs, right?

And then the question is, why can't we just delegate all language work? For example, I have these LLMs. I try to get them to rewrite auto-generated transcripts for me so they read like a human wrote them. I try to get them to ingest the transcript and suggest clips to tweet out, and things like that.

And I haven't been able to automate these things. I don't know if you guys have, but I still have to do it, or I have to get a human to do it. And why is that? You might think emails are something we've still got to collect future data on.

But this language stuff, we already have the data on, right? So why can't we do it now? And the reason is, they can do like a five-out-of-ten job out of the box. These are short-horizon, language-in, language-out tasks, dead center in their repertoire, but there's no way to get them to improve.

So over time you can't be like, look, that tweet was fire, it went viral, and here's why I think it went viral, and have it kind of learn that, update its understanding, and write better tweets in the future. Same with transcripts: picking up your feedback.

Since there's no way to do that, even if you have all these individual language tasks these models can do, you can't then just say, okay, now you're an employee, because an employee is actually improving over time and building up context in a way these models are not. Yeah. The big question I've kept bringing up and asking a bunch of different people is: where are you getting value from agents? And not a lot of people have great answers. They'll be like, oh, well, we use this or we use that, but you don't see a lot of conversation online of people saying, oh, this AI SDR is just crushing it for me, or this other use case is crushing it.

" And you just don't see that at all. And the reason that that's worrying is that when products are truly great or even have the potential to be great or starting to like really work, people just talk about them a lot, right? Like people talk about cursor a lot, right?

People talk about Claude Code a lot, and there are some individual use cases. Coding agents seem to have the most real traction. Deep research I would also call an agent. I don't know if you would put it in that bucket, but again, it's not this highly agentic thing. Yeah.

Yeah. But I don't think of deep research as an employee in that same sense. Right, because you can't be like, okay, that's great, this thing you put together. Here's how I like to compile my ideas before a podcast. So, you know, you did a great job compiling this Stalin memo.

I was very curious especially about why the Great Terror happened in this way, so keep that in mind when you're doing a future memo, use this style. That's not going to happen. It's got the style that it learned through its RL training for deep research. So then again, it just becomes another tool.

It doesn't become like an employee for you.

And then the other thing, since your post was inspired by your own tinkering: some of the stuff that I'm most excited about, where we've gotten value specifically from codegen internally, is these internal tools that we totally could have built years ago but that are just now really fast to build.

So we built something for our ad partners that automatically finds all the different moments where we talk about them in a given show and then links them out, and it's basically just a simple database dashboard that they have access to. Historically you could have built that, but it would have been really time-intensive, and so it's not anything special.

The value is that you can now build it in a couple of days.

And so, yeah, I've been trying to separate... All of this is happening in a context where you have hundreds of billions of dollars of enterprise value locked up in these different labs, some of which have developed what look like great businesses, right? OpenAI on the consumer side is basically a new consumer app company.

Anthropic with codegen. And then there are still hundreds of billions of dollars of EV out there where it's unclear where the revenue is going to come from. And so when timelines extend and AGI isn't happening next year, or the following year, or whatever, I start to get a little bit worried, because that's a lot of EV to maintain for another half a decade or a decade, whatever it turns into.

I'll get a little more bullish and hypey and take the other side of that claim. Look, I think even if it doesn't happen in the next two to three years, what we're talking about here is such a big deal that AI is definitely not priced in.

Not by the average person, not by the market, not by anything.

Because once you get this thing that actually does function like a genuine white-collar employee, not only do you have potentially billions of extra workers, but you have something potentially more powerful, which is that right now a human mind can't be copied, right?

A human mind can't learn from the experience of other minds. I mean, it can, but it's really slow. You have to basically work with somebody for a decade. Yeah, it's mentorship. Yeah, exactly.

And in fact it's been a big problem, because as our society has built up more knowledge, we've had to keep people in school and training for longer and longer, which reduces their productive years.

But with an AI model, you could have a scenario where, suppose there is a model that's actually capable of continual learning the way humans can learn. It's broadly deployed through the economy, doing all these different jobs.

The difference is that it is now able to amalgamate its learnings across all its deployments.

So if one of them is an accountant and one of them is a coder and so on, then the model is learning from each of these different on-the-job experiences. And even if there's no software progress from that point on, even if the algorithms aren't improving, just that ability to learn on the job from everything in the economy would functionally produce what looks like a superintelligence, right? No human will have mastered the range of skills and knowledge and know-how that this model will have.

I have two questions. One's maybe bearish, one's bullish.

On the question of brute force: do you think it's possible to brute-force continual learning by doing something on the model design side, or maybe on the hardware side, to get to a trillion-token context window and then just stuff it with everything?

Can you explain what the state of the art is here? Because you were mentioning in the piece the Cursor rollups, the summary lines, and then stuff getting lost in there. But if we get to a 100-billion-token context window or something, could it actually just remember every single interaction it's had?

I am not optimistic about that, because since 2018 we've had the transformer, or alterations on the transformer, as the most performant models. And who knows what the labs are doing internally, but we do have open-source research from companies like DeepSeek, which does seem to be at or close to the frontier. And while people have found modifications to the transformer that reduce the constant-factor overhead of attention, little hacks like mixture of experts or latent attention,

nobody has gotten around the inherent quadratic nature of attention. Basically, this means that the cost of each additional token grows with how much context it has to attend over, so total cost increases super-linearly with context length.

So right now we have models that have a million or two million tokens of context, but getting to four million tokens is more than twice as much compute. Got it. Significantly more than that. And then just taking it to something like a billion...

Given the fact that nothing about this has changed over the last, whatever, six years, I'm just not optimistic that somebody will figure out a hack that will change it immediately.
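To make the scaling point concrete, here is a minimal back-of-the-envelope sketch (not from the episode; the hidden dimension is an illustrative assumption, not any specific model's) of why naive self-attention makes long contexts expensive: the n-by-n attention pattern means doubling the context roughly quadruples the attention compute.

```python
# A minimal sketch: rough FLOP count for naive self-attention, to illustrate
# the quadratic-scaling point discussed above. Numbers are illustrative.

def attention_flops(n_tokens: int, d_model: int = 4096) -> int:
    """Approximate FLOPs for one naive self-attention pass.

    The QK^T score matrix costs ~n^2 * d multiply-adds, and applying the
    scores to V costs another ~n^2 * d. Linear-in-n terms are ignored.
    """
    return 2 * n_tokens ** 2 * d_model

for n in (1_000_000, 2_000_000, 4_000_000):
    print(f"{n:>9,} tokens -> ~{attention_flops(n):.2e} attention FLOPs")

# Doubling the context from 2M to 4M tokens roughly quadruples the attention
# cost, which is why "getting to four million tokens is more than twice as
# much compute."
```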

Then on the other side: how do you think about continual learning in domains where time is the constraint? I keep going back to this idea that even if we create the ultimate superintelligence, it probably will have to obey the laws of physics. It won't be able to time travel or teleport, and so there are a lot of restrictions. At a certain point you just need to move the sand into the chip fab, and there's a certain amount of energy and time that it takes to do that.

Another example would be longevity research. For some of that, you just need to sit around and wait for a human to age. And so your RL cycles, if you're trying to learn about how humans age, are very hard to run.

Yeah, you can simulate the human or whatever, but for the real test, you have to wait decades to see the effect of a certain diet on how long people live. And so it feels like there are a lot of scenarios where you can't fully do it in simulation.

And so you wind up with these really long times to actually do a rollout, essentially, and you wind up with something where the time to get a new data point or new training data is, you know, a thousand times longer than what we've been doing previously.

And so we're in this data desert, basically. Yeah. I think this will definitely be true of many domains, especially those involving the physical world. I guess as I've learned slightly more about some of these physical domains, it's been surprising to me how much can be done in simulation.

Within bio, for example, obviously we have AlphaFold, and I guess now AlphaGenome. But one of the key advances in bio over the last couple of decades has been techniques for multiplexed experiments: running millions of experiments in parallel, getting data points from that, and using AI to learn from millions of experiments in different fields about what they might imply for the human body or for human proteins.

So I am optimistic. Another thing to keep in mind is that right now, you know, a corporation might have 100,000 employees, but how much it learns from any single employee is very limited. People just go in, do their jobs, and that's that.

In the future, if you do have this economy of agents, and it's much easier for AIs to supervise each other and to observe every single thing that's happening in the organization, the speed of learning might be exponentially faster than what's possible with humans.

I agree this is not around the corner, but these sort of singularitarian futures, with crazy cyborg organizations that are moving super fast and coming up with new technologies, don't sound crazy to me. Interesting.

What do you think about the recent... last week was dominated by the talent wars and the huge AI researcher offers at Meta. What do you think that reveals about Mark Zuckerberg's AGI timelines? By the way, I loved all the memes, the trade memes.

Honestly, you guys should lean into that, because genuinely, this is not even a meme. Market caps moved by billions of dollars based on these posts you guys are doing. Totally. Yeah, it was crazy seeing like 10 million views on an AI researcher getting traded.

It's niche, but it's not really that niche anymore, right? Yeah, but you certainly couldn't have fully imagined that five years ago. No way. Yeah, it's important stuff. I mean, I still think they're underpaying them.

I think Meta is the first company that is actually coming close to the break-even point of what the best AI researchers are actually worth to the company.

If you're Meta and you're spending $80 billion on compute over the next couple of years, and one great researcher can give you a 1% performance uptick on that, they're so worth the $100 million pay. You're getting a bargain at $100 million.
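As a rough sanity check on that math, here is the arithmetic using only the round numbers mentioned in the conversation (illustrative assumptions, not Meta's actual figures):

```python
# Back-of-the-envelope check of the claim above, using the round numbers
# from the conversation (illustrative, not Meta's actual accounting).
compute_spend = 80e9          # ~$80B on compute over the next couple of years
uplift = 0.01                 # one great researcher improves effective performance by 1%
researcher_package = 100e6    # the ~$100M offer being discussed

value_created = compute_spend * uplift
print(f"value of a 1% uplift: ${value_created / 1e9:.1f}B "
      f"vs. package cost: ${researcher_package / 1e6:.0f}M")
# -> value of a 1% uplift: $0.8B vs. package cost: $100M, roughly an 8x return
```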

So it's actually interesting to me that Meta is the first company that's like, wait, the return on investment here is incredible, let's just do it. And then, okay, are the vibes bad? Maybe. Could they have done the announcements better, to produce less mercenary vibes? Potentially.

So there's some ideal version of what they could have done. But also keep in mind that the likely counterfactual would not have been some amazing, great-vibes announcement.

The likely counterfactual would have been what they were previously doing, which is just sleepwalking towards a loss. And it's much better to just say, [ __ ] it.

Let's just send it with a couple billion dollars in recruitment offers, and at least now they're on the player board, rather than just sleepwalking towards Armageddon. Mhm.

In many ways, it's interesting how viral these hundred-million-dollar numbers went. You know, a hundred million is obviously a big number, but whatever the ranges are, people are so normalized to professional athletes being comped tens of millions of dollars a year. And purely looking at these types of moves from an economic standpoint,

it's like signing a star pitcher to a baseball team. It's surprising it's taken this long. And the thing we were kind of joking about, to put it into context, is that when you see that Tim Cook made $74.6 million in total comp last year, he looks dramatically underpaid, right? He already looked underpaid in the context of Ohtani; I think Ohtani was making somewhere around $70 million a year. So he looked undercompensated in that context.

And then, yeah, I think the other thing that came to mind for me from your piece is, I feel like there's been this kind of toxic idea floating around tpot, which is that you have one year to accumulate capital before you're part of the permanent underclass.

And the takeaway from this, if you're correct and things will naturally take longer, is that if you're in tpot now, or you're at all in AI or any of these adjacent spaces, and you're 30 years old, or 35, or 40, or 20, you're here at the perfect time, right?

And I think it was Marc Andreessen who said that he showed up to Silicon Valley and thought he had missed it, that he'd missed the PC. There are so many stories like this. And so I think people should be tremendously excited on a personal level.

And no more of this doomerism. Yes, you need to move quickly. Yes, you should be working with the best possible people, trying to have the most possible impact.

Be as close to the real action as possible, but no more of this doomerism of, oh sorry, you didn't get a $100 million offer this year, it's over. No. 100%. I mean, it's very funny how often this comes up. The Prince of Persia game developer wrote a diary while he was making it, and in the '90s he's talking about how he's going to become a Hollywood scriptwriter because he thinks he missed programming: I have a CS degree, but I missed programming, so I'm going to go be a Hollywood screenwriter.

I remember two or three years ago, in the podcast's early days, I moved to SF and I was like, oh, GPT-3 has come out and all the wrapper companies have already been made. So I'm not going to make a wrapper.

I mean, whatever, the podcast worked out, it's fine. But even then I was like, oh, I missed AI. So I definitely think, in retrospect... Because, look, another thing to keep in mind is that Cursor only hit product-market fit after Claude 3.5 came out and gave it these coding abilities. There are going to be many other things like Cursor that will only be viable products once you have continual learning on board, or once you have computer use that's working, and these are capabilities that I think are exponentially more valuable economically than the models as they exist right now, and that many companies will need to be formed around to complement. It's not going to happen by default.

Right now OpenAI's revenue is, what, $10 billion a year in ARR? I mean, if it's AGI, it should be trillions in ARR, right? So what other infrastructure will be built around that? The Cursor equivalents for whatever continual learning enables? The biggest companies have definitely not been formed yet, because the capabilities that would make them so valuable are not available yet.

Yeah. In terms of, I guess, the Mag 7 CEOs, the major players, there seems to be this continuum. On one side, you have the McKinsey-ite philosophy of dollars and cents.

Okay, people want tokens, I can inference them, and maybe it makes sense to hire an AI researcher for $100 million if they can improve your model and bring your model in-house so that you don't have to pay OpenAI or Anthropic for those tokens.

On the other side, you have someone a little bit more like Elon, who sees this as an existential threat. It needs to be done the right way. It's very important. It's almost a doom-based philosophy.

Where do you see the other folks in the Mag 7, or in the AI race, sitting? Does the superintelligence team and these big offers move Mark closer to one end or the other?

Because I was able to kind of justify the Llama investment just from: hey, if they don't do this, they're going to be paying billions and billions of dollars to Anthropic or OpenAI just to vend LLMs internally as B2B software, because they're going to need this in every little nook and cranny of Instagram for a long time.

So I could justify it in that realm. I could also justify it in the realm of: this is the most important technology in human history, you've got to have a play, or on compute efficiency, like you laid out.

I interviewed Satya, I interviewed Mark, and the sense I got was that neither of them... I mean, I know Meta's group is called superintelligence, but I didn't get a sense from either of them that they believe in superintelligence in the way I mean superintelligence, which is the thing that's building solar factories in the desert and then launching the probes and so forth.

I mean, even something that's much weaker than that is still functionally superintelligent. In some ways these models are already superintelligent, but their abilities aren't fully unlocked because of the other handicaps they have.

But whenever Mark's talked about it publicly, he's talked about creating better social experiences, making the ad targeting better, and VR stuff, right?

I think it's the same with Satya, but with making Copilot for Office better, which also would be worth hundreds of billions of dollars a year. Yeah. But I think they think about it differently than somebody like Demis or Dario, who are like, no, no, AGI is the real thing. Yeah.

Do you expect the tension between the app layer and the lab layer to just get crazier and crazier? It feels like that will be the story of the next five years: these relationships that are symbiotic at times but adversarial at others.

I mean, in previous technological eras, you know, the '90s and 2000s, Google's Chrome ran on top of Microsoft's Windows, but they could still have an adversarial relationship.

So it would line up with history. But I think the bigger issue is just that, because the full potential of AI requires so much more progress in terms of algorithms,

I just think the app-layer companies that are building on top of the models that exist today are upper-bounded in how much value they can extract, because the models aren't good enough yet to do the things that would make them especially powerful.

So for that reason, it doesn't make sense to me that Cursor would be worth a whole sixth or eighth of Anthropic, if you think Anthropic has some chance of cracking continual learning, right?

So I am more bullish on the foundation-layer side than on the app layer, because I think the app layer will turn over once these capabilities are unlocked, whereas the fundamental research has to be done one way or another. As far as whether that means they will fight about it, we'll see.

Yeah, I mean, it could end up looking like the same dynamic we have now, where we have cloud hyperscalers that are worth trillions of dollars, and then we have valuable businesses that are worth a measly $1 billion or $5 billion, you know, and they're still big businesses and maybe can generate a return, but not power-law outcomes.

Last question from my side, then we'll let you go. What has Sarah Paine taught you about artificial intelligence? You know, at some point I asked her, because her whole big thing is continental versus maritime powers. Continental powers want to invade and capture territory, and maritime powers want to protect free trade.

I was just like, what big tech company is like a continental power, and what big tech company is like a maritime power? She's not watching TBPN, unfortunately, so she's not aware. But actually, this is a question I'll turn around to you.

Who's the continental power among the big seven, and who's the maritime