Mike Knoop: no single AI model dominates today — we need new ideas to reach AGI

Jun 19, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Mike Knoop

Good to see you. How you doing? Good afternoon. Thank you so much for coming. I'm glad there wasn't a major breaking drama story today. Was actually able to show up. Yes. Yes. I don't know if you watched that show at all, but I was just sitting here. John was so locked in.

He wanted to just keep doing the show and I'm messaging him like, "No, we actually have to stop." That was a terrible day to launch anything new, and we launched something. I saw several startups launch stuff. Regrets to everyone who tried to get anything out that day. Rahul, actually, I remember them.

It was Rahul and... Yeah, that's right. Julius had a launch, and what's the voice cloning company? ElevenLabs. I think they launched something. And then I think Lulu said something. She was like, "If you have bad news, today would be a good day to drop it."

And then OpenAI actually flagged, "Hey, we had this massive..." Oh, yeah. This dustup with the government, right, where the government was basically saying, "You have to give us all of it," and they were like, "We don't want to do this." Yeah. That was serious.

I mean, we're still looking at the actual end results of that. But that went really deep into the world.

I feel like much more than maybe even got reported on. Every single chat thread I was a part of was basically like, hmm, should I stop using ChatGPT as much? Because it feels like Anthropic has a similar policy, and it seems like Google might have a similar policy. There was that story a year ago about a man who was using a Google phone with a Google Fi cellular connection and had all of his data stored in Google Drive and Gmail. He took a picture of his child to send to a doctor (it was kind of a nude photo of the kid, taken to inspect the child for a physical medical problem) and it got flagged as child abuse material by an automatic system. The automatic system basically deplatformed him from everything Google, so he lost his email, his phone number, all of his Drive stuff. It was a false positive, but it was really hard for him to get back from there. So I guess my question is: it seems bad when we hear the story in isolation, but maybe the problem is not the individual company, and it's instead the government policy, and this applies to all the different companies. But I don't know. Two things can be true.

One is that it can be a massive overreach by that court to say, basically, you need to eliminate privacy on your platform. Yeah. And you can simultaneously have questions around maybe I should use this product in a different way. Totally.

And the inflammatory nature of it is that people use ChatGPT as a confidant. Totally, yeah. And tell it things that they wouldn't tell anyone.

They wouldn't tell anyone in their life, and they're having those conversations, and I think that's why it struck such a chord, because that's true. I just saw some reporting this week that chat minutes per day are up to like 30 minutes a day now in usage.

And it's closing the gap with Instagram, which is just sort of nuts to think about. Who would have thought a productivity tool would ever be on par with a social media app in terms of daily usage?

But the interesting thing is it's filling a similar void, you know. It's delivering digital companionship, maybe in the way that social media products historically did, without any social element to it at all. It's just one-to-one.

It's interesting to think we went from "what your friends are doing is the most interesting thing" to "what the Kardashians are doing is the most interesting thing" to maybe the most interesting thing is this person that knows everything about you and is always on and always willing to talk. Who knows?

Yeah. I think the consumer habits are being formed around this stuff today. Yeah. Yeah.

I mean, I find myself all the time, instead of scrolling YouTube looking for an interesting video essay to explain how, I don't know, global shipping lanes work or something like that, just going to ChatGPT and saying, "Hey, break this down for me," and then I can ask a follow-up and dive exactly to the layer that I want.

" And so yeah, I'm definitely in that camp of using Chat GBT just as like an exploration and entertainment education tool, an infotainment tool much more than Instagram right now, at least for me. Um, but uh enough about that. Uh, what is new in your world?

How should we frame the current horse race between all the foundation labs? Yeah. Okay. So I'm going to share a link; I don't know if this is something you all can pull up. This is a post that we published a little over a week ago.

So, you know, I think there's been this really big question: what's the frontier right now in AI progress, right?

The massive shift in the last six to nine months has been moving from this regime of scaling up pre-training with more and more labeled data into this test-time compute, test-time adaptation regime. People call these AI reasoning models, right?

We're getting these models to think out loud, generating additional data. Every major lab now, pretty much at this point, I guess except for Meta, has one of these systems that we've been able to test and report results on. And I think there's some really interesting stuff we're starting to see.

I think the most notable thing is that there's not an absolute clear winner across the landscape right now. There's basically a Pareto frontier that's emerged.

One of the most important things listeners should take away is that anybody who gives you a benchmark score on an AI system that is a single number is just marketing to you. The reality is that with these AI reasoning systems you have to report score as a two-dimensional object: you have to consider cost and efficiency alongside the accuracy. All these different lab providers have come out with different AI reasoning systems that score differently; they're trading off cost versus accuracy at different points.
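The cost/accuracy tradeoff he's describing can be made concrete with a small sketch. This is purely illustrative (the model names and numbers below are invented, not real benchmark results): a system sits on the Pareto frontier only if no other system is at least as cheap and at least as accurate, and strictly better on one of the two.

```python
# Illustrative sketch: why a single benchmark number is misleading.
# Each model is a (name, cost_per_task_usd, accuracy) point; the "best"
# model depends on where you sit on the cost/accuracy frontier.
# All numbers are made up for illustration.

def pareto_frontier(models):
    """Return models not dominated by any other (cheaper AND more accurate)."""
    frontier = []
    for name, cost, acc in models:
        dominated = any(
            c <= cost and a >= acc and (c < cost or a > acc)
            for _, c, a in models
        )
        if not dominated:
            frontier.append((name, cost, acc))
    return sorted(frontier, key=lambda m: m[1])  # cheapest first

models = [
    ("fast-cheap-model",  0.01, 0.55),
    ("mid-model",         0.10, 0.70),
    ("slow-strong-model", 2.00, 0.88),
    ("bad-deal-model",    1.50, 0.60),  # dominated: costs more, scores less
]

for name, cost, acc in pareto_frontier(models):
    print(f"{name}: ${cost:.2f}/task, {acc:.0%} accuracy")
```

Three of the four hypothetical models survive; "bad-deal-model" drops out because "mid-model" is both cheaper and more accurate. That is the sense in which there is no single winner, only a frontier.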

So if you want the absolute highest raw horsepower, where cost and time are no object, o3-high is going to be your clear winner today.

But if you're somebody who's saying, hey, I want to plug an AI reasoning system into an existing product I have, or I want faster answers and I'm willing to sacrifice some raw horsepower for quicker response times and lower cost, you might look at something like Grok or Gemini 2.5 Thinking.

There's not a single best answer, which I think is pretty interesting. This frontier is what all the labs are really working to figure out: how can we get accuracy as high as we can, but also keep cost as low as we can? Yeah, I've noticed that more recently with my default usage in ChatGPT. 4o seems super fast, but I'm always thinking, oh, I should maybe put this in o3-pro, but do I want to wait 10 minutes? And I'm making that kind of economic calculus there.

Even though, because I'm on the Pro plan, it's not a direct economic cost; it's a time cost, right? Like 13 minutes. Yeah, exactly. And so I'm kind of doing a 4o thing over here and then switching back and forth.

It's a very odd paradigm that we never really had to deal with in computing before.

I mean, I guess if you were downloading the 4K illegal Blu-ray versus the... which of course we never did, purely hypothetically. But if you were on a torrent site, there was a time tradeoff between that and watching a screener.

I think this is actually one of the reasons these AI reasoning systems, I would assert (and I don't have inside baseball on the data, but from the outside looking in there are some interesting suspicions that would suggest this), at least today in their current form have relatively weaker product-market fit compared to the non-reasoning systems, the pure language-model-based things. Interesting. That's a huge violation of the DeepSeek narrative I felt was really bubbling up. DeepSeek came out with the first just open-access reasoning model. Reasoning had been tucked behind the OpenAI paywall, so the pro users who were familiar with what reasoning models could do were very excited about them, in tech or in the early-adopter crowd. But when the DeepSeek app came out and you could just install it and instantly see the reasoning chain, it felt like everyone said, oh, everyone's going to be addicted to this forever, and this is going to be the new paradigm. But it seems like that might not necessarily be happening. Jordan, do you have something? I wanted to talk about spiky intelligence and how that plays into this.

We had someone come on, I think it might have been Sholto actually, talking about ARC-AGI, saying, hey, all the foundation labs kind of have a truce that we won't reinforcement-learn specifically against ARC.

I don't know how real that is from your perspective, but it feels like increasingly we might see very task-dependent RL runs chipping away at specific things. IMO-level math is something where clearly there's a ton of work to be done, but we don't have as many verifiable rewards for poetry or comedy writing.

So that'll be a little bit messier and later down the road, maybe. But at the same time, there are probably other verifiable rewards that are just smaller pockets of value here and there, for these little microtasks.

And so I'm wondering if we will ever see the marketing language around these models evolve. Grok kind of did this with "we are the anti-woke one," but that was more just in the overall temperature or the vibe of the model.

But I'm wondering if there'll be an idea of, this one's really good at math, this one's really good at research, this one's really good at that, or if they're all kind of going down the same path with what they're trying to solve. I do think you probably are going to see some domain specialization.

My guess over the next 12 to 24 months is that you'll see some domain-specialized benchmark scores emerge, because of how all these labs are starting to do the next evolution of training: they're using RL environments to generate synthetic CoT traces, doing their model training on that data, and they're trying to get it across a lot of different domains.

You know, the original o3 paper, I think, was interesting on the benchmark results. On this new CoT reasoning system they had relatively high scores on math and coding, but the step-function increase in those scores was much higher than the increase in, say, legal reasoning. You would intuitively expect legal reasoning to be one of the best general domains to transfer to: if you trained a reasoning model that was really good at math and coding, and it's a language model, that should directly transfer into the legal domain, because it's symbolic reasoning that's self-consistent. And that wasn't the case. So I suspect that's what we'll see there. There's obviously the big Scale news. The thing I'm seeing now is there are probably, I don't know, a handful of these new startups that I know of that have come up in the last several months, all getting founded to basically go build RL environments to generate synthetic or semi-synthetic data and sell it to the major labs, to the major frontier folks building these next-gen systems.

I think we're going to see more of that. I expect that's what's going to drive a lot of areas. What does the data labeling market look like? Today we were covering Surge AI, which a lot of people weren't familiar with.

I'm sure I'd seen it at some point, but I was certainly not familiar with it until we covered it today. What do you think the data labeling market looks like in five years? Do you think Scale was getting out at the perfect time? You know, I'm curious. I think the timing was pretty good.

I mean, look, the macro change here is from a regime where we're scaling up pre-training, where we want as much high-quality labeled text as we can get our hands on to scale these foundation models, into one where we're trying to train process models, or make the foundation models really good at process thinking and CoT generation.

That is a complete shift in how you want to generate that data.

You want an RL environment where you can create lots and lots of CoT traces, really very long traces over long-running tasks as well, and you can feed all that right back in and take advantage of the scaling laws we already know about with language models: how performance increases as you get more examples of the data.

So my macro bet would be that the trend is heading down on the pre-training-scale labeling stuff and significantly higher on RL environments. Yeah.

So when you say RL environments, what you're talking about is moving from a paradigm of: I go to a data labeling company and they hire a ton of contractors to generate new text or verify or grade the responses from these models, to: I am now hiring top machine learning engineers and AI scientists and having them design an environment where the reinforcement learning can happen autonomously within the system, right?

Or, effectively, these new startups that you mentioned are taking the massive pools of hundreds of thousands of contractors out of the loop for the next runs. Is that correct? Yeah, it's synthetic or semi-synthetic in some cases.
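To make the contractors-out-of-the-loop point concrete, here is a toy sketch of what such an environment might look like, under my own assumption (not anything stated in the episode) that the core pieces are a programmatic task generator plus an automatic verifier, so that correct reasoning traces can be harvested as training data with no human labelers. The `oracle_policy` stand-in is hypothetical; a real pipeline would sample a model and keep only the traces that verify.

```python
# Toy sketch (not any lab's actual pipeline): an RL "environment" as a
# task generator plus an automatic verifier. A policy proposes a reasoning
# trace and an answer; the verifier checks the answer, and only verified
# traces are kept as synthetic training data. No human labelers involved.
import random

def make_task(rng):
    """Generate a verifiable arithmetic task with a known ground truth."""
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    return {"prompt": f"What is {a} * {b}?", "answer": a * b}

def verify(task, proposed_answer):
    """Automatic reward signal: exact-match check against ground truth."""
    return proposed_answer == task["answer"]

def collect_traces(policy, n_tasks=100, seed=0):
    """Harvest (prompt, trace, answer) triples that the verifier accepts."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_tasks):
        task = make_task(rng)
        trace, answer = policy(task["prompt"])  # a model call would go here
        if verify(task, answer):
            kept.append((task["prompt"], trace, answer))
    return kept

# Stand-in "policy" that parses the prompt and solves it exactly, so every
# trace verifies; a real model would be sampled and filtered instead.
def oracle_policy(prompt):
    a, b = (int(x) for x in prompt.rstrip("?").split("is")[1].split("*"))
    return (f"{a} * {b} = {a * b}", a * b)

print(len(collect_traces(oracle_policy, n_tasks=10)))  # prints 10
```

The design choice worth noticing is that the verifier, not a human, supplies the reward, which is why this only works cleanly in domains with checkable answers (math, code) and gets messy for poetry or comedy, as discussed above.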

Example companies here: Mechanize is one that got started recently that's doing this stuff. Morph, I think, is doing this stuff. Habitat is another world-environment one doing similar stuff. Sure. There's just a lot, and it's very emergent.

Many of these got founded in the last couple of months. Yeah. And I think that's a function of the demand and the pull from a lot of the frontier research groups that want this data to do their training stuff.

So do you imagine companies like Surge and other players would try to pivot into this? I would expect that founders of companies like that would recognize this is a growing part of the business and have bets on it if they don't already. Yeah. Yeah.

I was always wondering: the whole story of Scale was kind of a series of booms in training data.

The first one was data labeling for autonomous vehicles, and it seemed like that grew very quickly, and then the training paradigm around Waymo kind of shifted away from "hey, we need more and more labeled training data" to something else: just having the cars on the road and generating real-world data from that.

Then there was the second era of pre-training, generating the data for RLHF, and the big boom there, with OpenAI and Meta both big customers throughout that cycle. And then there was kind of a question of, what's the third act for all of this? And I was wondering: is it possible that there is a third act, but it's just something like humanoid robots? Put a bunch of people in mocap suits and generate a ton of training data for what it means to pick up a soda 25 times in a row.

It'd be a very different training data product, but at the same time, we have mocap suits, and maybe that's relevant, or maybe that's ridiculous to think. I don't know. What do you think about that?

I mean, one definition of intelligence is an information conversion ratio, from the amount of information you have to an action or policy decision. The intuition here is that you can make a perfect decision given a set of data or information that you have.

And oftentimes the right thing to do is go collect new data.

And so once we actually start peaking out on intelligence capabilities, either plateaued because of research or plateaued because we've actually got AI that's close to AGI, the limiting factor then becomes the ability to acquire new information, new data.

In the software-bits world, that's going to be a function of what's on the internet. And then the one beyond that is going to be: how do you literally go make contact with reality, with the universe? Yeah.

And that's your feedback mechanism to get new information into the system so you can increase your overall intelligence. Yeah. What do you think?

We had George Hotz on the show a few days ago, and he was talking about this efficiency problem: if you took all of the conversations I had ever had and transcribed them, it would be a few megabytes of data, and I'm able to generate some level of intelligence based on that.

Yeah, the level of intelligence of a golden retriever. Yet an LLM needs, effectively, terabytes of data. The sample efficiency is very low.
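The gap being described is several orders of magnitude. A quick back-of-envelope, with rough illustrative numbers of my own (not figures from the episode), shows the scale of it:

```python
# Back-of-envelope version of the sample-efficiency point. These are
# rough, illustrative estimates, not measurements: a lifetime of heard
# and spoken language is on the order of hundreds of megabytes of text,
# while frontier LLM training corpora are on the order of tens of TB.
human_language_bytes = 500 * 10**6   # ~500 MB, generous lifetime estimate
llm_corpus_bytes = 30 * 10**12       # ~30 TB of training text

ratio = llm_corpus_bytes / human_language_bytes
print(f"an LLM sees roughly {ratio:,.0f}x more text than a human ever does")
# prints: an LLM sees roughly 60,000x more text than a human ever does
```

Even if the estimates are off by an order of magnitude in either direction, the ratio stays in the tens of thousands, which is the sense in which deep learning is sample-inefficient.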

I mean, this is a true statement about the paradigm of deep learning. Program synthesis, which is the bet we're making at Ndea, is a regime that's much more sample efficient, with models that can generalize out of distribution.

But I think it's a very damning statement that we've got AI today that's trained on this colossus of all of humanity's knowledge and text over the last 5,000 years, it's all on the internet, and what new ideas have they produced?

And maybe I could point to AlphaEvolve, which I think is a very impressive frontier AI system, and it's legitimately finding new knowledge. It's creating new ideas, verifiably.

But they're very small, and they're on the margins of things we kind of already have been doing, right? Matrix multiplications, things like this. They're in the regime of things we kind of know about and can define and spec out for these systems.

Whereas if I took either of you guys and somehow gave you the superhuman capability to have all of humanity's knowledge in your head at the same time, I think you'd probably be able to produce at least one idea, connect two random divergent domains and go, "Oh, hey, this kind of looks like this. Here's a new thought." Right? Yeah.

Totally. That still feels like something that, I mean, is very exciting to build towards. I think that's what we all want, right? That is AGI that's capable of invention and discovery, that will actually increase the rate of scientific frontier innovation. But we don't have that yet. Switching gears a little bit: five years from now, do you think the average American will pay for an LLM subscription? I think the cost is probably going to go down far enough that it just gets built into the subsidy of whatever the product is, and the revenue stream is attached somewhere else.

I haven't thought deeply about it, but that's my off-the-top-of-my-head thinking. Yeah. Yeah.

We were talking offline this morning about this dynamic where the average American will actually churn from HBO Max because, at that moment in time, at $20 a month or whatever the fee is, there's not a show that they really love.

So yes, there's a lot of value there, but they're just like, yeah, I'm out. I mean, I still know so many people that don't even have subs. Like my wife uses ChatGPT all the time, but she's still on the free plan.

That's enough to get a lot of value for what she does. I suspect that, yes, there are going to be power users, and those are going to be the folks really doing amazing, powerful things with this stuff. But I suspect the base rate is going to go down enough that it's going to be more embedded across almost all the product experiences you have, as opposed to being a dedicated thing you're paying a lot of one-off cash for. Now, you might buy products that need it, like robotics, or things where intelligence is built in. I think you're going to see product categories emerge, and people will go buy those. But paying for the subscription itself, I'm not as confident on.

Yeah. The other thing that stood out to me today specifically was that Midjourney came out with their new video model. It's very good. It's $10 a month for effectively unlimited prompts. And you comp that to Google's Veo 3, which is $500 a month, and you're still heavily gated. Yeah.

And it just seems obvious that... Five years may not be the right timeline for that prediction, by the way. It might be longer than that. Sure. Sure. Yeah. There's so much use-case diffusion. One of the things we're seeing is with Zapier AI, which is growing quite strongly right now.

It's on the exponential growth path for AI usage and AI apps.

I've looked at this and I've been wondering: is this a function of the technology getting better, or of use-case diffusion? I've looked at the usage, and the majority of it is still on 4o or cheaper or worse models right now, with people putting AI into the middle of automations.

Interesting.

And so I'm pretty confident that a lot of this agent automation right now is actually not being driven by technology progress toward AGI, but more by the market finally starting to learn: okay, here's what we can use it for and can't use it for, what it's good at and not good at. It's very similar to the adoption curve we saw in the early days of Zapier. Once you learn what the tool can do, you carry the tool forward with you in time, and then you encounter a new situation or circumstance you can apply your tool to, and so we create use cases over time. So we're still very, very early.

Yeah. You know, I suspect that a lot of the usage increase, 30 minutes a day even on ChatGPT, is a function of use-case diffusion more than tech progress. Yeah. Where do you think xAI will eventually need to generate a lot of revenue? Where do you think it'll come from?

I mean, if they make progress towards AGI, it's probably going to be enabling the other services around Elon's ecosystem. Yeah, that would be my guess. Less selling it as a direct product itself and going head-to-head.

You know, on cars, rockets, robotics, there are so many places where I think you would want to use higher degrees of intelligence, where the product shape lets you. You're not bound by just the fastest consumer experience you could deliver.

I suspect that might actually be where most of the value is, at least in the near term. Who knows over the long term.

That was, in many ways, my long-term thesis around the Llama project and the superintelligence team at Meta: there's just so much work to do at Meta broadly that's enabled by AI, and if you can avoid the long-term OpenAI bill, that's probably worth billions and billions of dollars, because of how AI is going to infuse into every single corner of their entire ecosystem. It's all at such massive scale that the cost of using other vendors might be in the billions.

And so just looking at the savings there might make sense. I don't know. I mean, I think the most important takeaway, and I shared this last time I was on with you guys and it's still true today, is that we are idea-constrained to get to AGI. This is what ARC's V1 data shows. This is what V2's data shows.

V2 is completely unsaturated. We're not even talking about efficiency; nothing can do it. And V2 looks very similar to V1. Even with hyper-specific solutions on Kaggle, in the ARC 2025 contest, progress has been slower this year than it was last year. Wow.

The thing I can state and assert most confidently is that we need new ideas. There are some major breakthroughs we have not figured out or found yet.

Does it worry you that that could take years and years and years? That's one of the reasons I funded the prize last year. I wanted to correct the market narrative here.

I spend a lot of time with students and a lot of young researchers, and at the beginning of last year there was a serious vibe of: oh, it's all figured out. I'm not going to go do AGI research. I'm just going to go work at the application layer, the LLM stuff, and make a quick buck before AGI gets here. Interesting.

And, you know, look, if you want to live in a world of AGI, for yourself, for your kids, I think what we should be trying to encourage is to design the strongest global innovation environment possible.

And that's one where there's a lot of diversity of approach, a lot of different ideas being tried, a lot of sharing. You know, kind of what AI looked like in the 2010-to-2020 era. Yeah. Right. Very open approaches: that's how we got from the Transformer to GPT-1 and GPT-2 and so on to today. Yeah.

You know, I'm optimistic. I think the last six months have looked a lot better than the previous two or three years. I think the AI industry is actually maturing quite a bit on this front, on this topic as well: being more tolerant and kind of recognizing, okay, we don't have it all figured out.

There are more ideas we need. That's been encouraging, and I think it's seeping down to the lower levels too. But yeah, my broad view is: any capable human who has new ideas should work on AGI. That's the most important thing you could be doing at this point in time.

That's amazing. Thank you so much for stopping by. This is always fascinating, a great catch-up. Guys, thanks for having me. We'll talk to you soon. Cheers. Bye.