Pre-training isn't dead: GPT-4.5 followed scaling laws and RL is amplifying, not replacing, larger base models

Apr 22, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Jack Whitaker

Context for everyone: Jack is helping us with our distribution strategy. So thanks for all the support, Jack. It's great to have you on the show. Probably the first of many, I guess. But I love the post, and I'd love for you to take us through it. What inspired this?

You kick it off with a couple of the reactions to GPT-4.5. The one that's popping out to me is from Jack Morris. He says, "So GPT-4.5 is 10x bigger than 4o and only marginally better at most things. My read: this could be the beginning of the end for scaling laws. What happened here? Did we run out of data, or do scaling laws not capture model behavior on tasks we really care about?" What inspired you to write this post?

Yeah, definitely. So a lot of me and my friends really liked GPT-4.5, and we kind of had this high-taste-tester mentality where we thought it came out really well, and we were kind of confused why people were underwhelmed by it. But a lot of the reason people were underwhelmed by it was the actual benchmarks. You can look and see it did worse than you'd expect. Yeah.

So me and Trevor were like, well, did it actually do worse than you would naively expect on the benchmarks? Have we actually graphed out a log-linear law at this scale and seen what performance we'd expect on AIME and everything else?

You know, when we actually did this, we saw that it was about in line with what the scaling law predicted, a little bit better on some benchmarks, a little bit worse on others. And we thought this was a really important conclusion. I think what a lot of people don't realize is that these are log-linear laws.

If you double the amount of compute that goes into an AI model, you're not going to get double the score on the math benchmark.

This is something we expect: as compute gets larger and buildouts get faster, we're going to get much better models, but not because doubling compute doubles model strength. So what is the implication of that?
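The log-linear point above can be sketched numerically. The fit coefficients below are made up purely for illustration, not taken from any real benchmark regression:

```python
import math

def predicted_score(compute_flops, a=-40.0, b=8.0):
    """Toy log-linear scaling law: score = a + b * log10(compute).

    a and b are invented illustration coefficients, not fitted values;
    a real fit would regress observed benchmark scores on log-compute.
    """
    return a + b * math.log10(compute_flops)

# Doubling compute adds only b * log10(2) ~ 2.4 points, not double the score.
gain_from_doubling = predicted_score(2e25) - predicted_score(1e25)

# A full 10x (one order of magnitude) adds b points.
gain_from_10x = predicted_score(1e26) - predicted_score(1e25)
```

This is why a 10x-bigger model can look "only marginally better": the score moves with the logarithm of compute, not with compute itself.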

Like, what model should people be defaulting to in the ChatGPT app? I think that's the thing I want to start with. Should people just trust you and say, you know, I don't care what people say online, I've got to pick 4.5 from the dropdown, and even if I can't tell, I'll be getting better results? Or is there more nuance there, with some of the reinforcement learning that's happening on top and some of the reasoning models that might take things to the next level even if the underlying model is weaker and cheaper? Definitely.

Yeah. Well, I think the central claim here is that pre-training will continue to work as we're able to build it out, not that the pre-trained model is the best right now. I think OpenAI's o3 is the best model we've ever seen. It uses RL on every single type of tool use. It uses RL on chain of thought.

It is quite a small model, and it came out really fast. Is o3 built on 4 or 4.5? I'm kind of confused at this point. Yeah, there's no public information on it. Me and Trevor did some estimates based on the token speed that comes out, and it seems like it's quite a small model, smaller than GPT-4.
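A token-speed estimate like theirs can be sketched as a back-of-envelope calculation. Everything here (H100-class memory bandwidth, fp16 weights, batch size 1, a dense model) is an assumption of mine, so this is an order-of-magnitude toy, not their actual method:

```python
def rough_active_params(tokens_per_sec,
                        mem_bandwidth_bytes=3.35e12,  # assumed H100-class HBM bandwidth
                        bytes_per_param=2):           # assumed fp16 weights
    """Back-of-envelope model-size guess from observed decode speed.

    At batch size 1, decoding is roughly memory-bandwidth-bound: producing
    each token requires reading all active weights once, so
        tokens/sec ~ bandwidth / (bytes_per_param * active_params).
    This ignores batching, MoE sparsity, KV-cache traffic, and multi-GPU
    sharding, so treat the result as an order of magnitude only.
    """
    return mem_bandwidth_bytes / (bytes_per_param * tokens_per_sec)

# e.g. if a served model streams ~100 tok/s under these assumptions,
# the implied active parameter count is on the order of tens of billions.
est = rough_active_params(100)
```

Faster observed streaming speed, all else equal, implies fewer active parameters, which is the intuition behind inferring "quite a small model" from token speed.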

Interesting. Bigger than 4o mini. Okay. Interesting. So, is this something that we're expecting them to optimize over time and eventually distill 4.5 into, or maybe just scale up the inference chips to the point where they can run an o3/o4-style model on top of 4.5?

Yeah, as you scale up the inference and as you scale up compute buildouts, you could take something the size of 4.5 and do o3-style training on it. And I think that's going to be a really, really exceptional model.

I think a lot of the point of the piece was that you have a lot of axes on which you can improve AI, and none of them are obviously showing diminishing returns. So there's just tremendous potential to make models better, and there really is no wall. But what about the economics?

At a certain point, you know, you need an order of magnitude more compute (this is log-linear, of course). We're getting into the $500 billion data center. At a certain point, you get to five trillion and you're talking about a meaningful portion of global GDP.

And if the result is, oh, it goes from 128 IQ to 130, the economics don't really pencil out. What's your take: will we just see a pre-training winter for purely economic considerations? Yeah, I think that's what we're seeing right now.

But another thing is that even though it would be really expensive to train a model 10 times bigger than GPT-4.5 now, and probably, like, impossible to serve, in a couple of years we're going to see the type of algorithmic efficiency improvements and the type of chip improvements that make a model at this scale much more feasible.
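That compounding argument can be made concrete with a toy calculation. Both per-year multipliers below are assumptions I've chosen for illustration (roughly the magnitudes cited in public trend analyses), not measured values:

```python
import math

def years_until_feasible(scale_up=10.0,
                         algo_gain_per_year=3.0,  # assumed algorithmic-efficiency trend
                         hw_gain_per_year=1.4):   # assumed hardware price-performance trend
    """Years until a scale_up-times-bigger training run costs what today's does.

    Effective compute per dollar compounds as (algo * hw) ** years, so we
    solve (algo_gain * hw_gain) ** years = scale_up for years.
    """
    return math.log(scale_up) / math.log(algo_gain_per_year * hw_gain_per_year)

# Under these assumed trends, a 10x-larger run reaches today's cost
# in well under two years.
years = years_until_feasible()
```

The point is only that two modest exponentials multiply: even if neither trend alone covers an order of magnitude quickly, together they can make "impossible to serve" models feasible within a few years.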

When you say algorithmic efficiency, are you talking about the type of optimizations that happened with DeepSeek, around memory and FP8, those kinds of optimizations at the actual inference level? Or are you talking about the design of the algorithm as it's trained, or the design of the model?

Yeah, basically on every level. I'm really referencing Dario's excellent piece about DeepSeek and export controls. Yep.

where he says that DeepSeek is a really good model, you know, but we do just see these continuous algorithmic improvements as we go. So, sort of, the effective compute is also scaling even if your physical compute isn't scaling.

Switching gears for a second: what's the vibe on the Stanford campus right now? Is there a risk that everybody drops out to work on AI? I imagine it's a constant conversation. Yeah, I wish it was honestly talked about a lot more.

There's a lot of people um a lot of my friends around campus like Mojit Agrial and Jacob and Tamaki, you know, who are super aware of this stuff, you know, and and are always kind of pushing things forward and thinking about these things all the time.

But in terms of just, like, the populace getting used to AI and getting involved in it, it's just been a really slow and continuous crawl upward. You know, I'm a CS major, I end up talking to a lot of CS majors, and they're not fully internalizing what the model suite is going to look like.

Everyone thinks that the models are going to get better around the same kind of paradigm and level, you know, and no one is thinking: we have o3 now, so what's GPT-5 going to look like in a few months? What's GPT-6 going to look like next year? What are you thinking GPT-5 will look like?

Are we just talking about another order of magnitude on pre-training FLOPs, or is there more to it than that? Yeah.

So my naive guess is that GPT-5 is going to be a model that uses all the clever training techniques that OpenAI and Anthropic have worked out around RL, but scaled up significantly from these pretty small models. Sonnet 3.5 is pretty small. I think the o-series is quite small as well.

One thing that I think is the most interesting about o3 is that it has a lot of these agentic properties that we've been seeking for so long, but it has them in this kind of narrow sense, where it can agentically call tools.

It can agentically browse the internet, but it's not necessarily, like, agentically going and doing a whole project, you know. And I think that trajectory is one to watch as you look at GPT-5, because you're going to see it being more and more agentic just very naturally as context expands and as the model expands. Yeah.

Tyler Cowen said o3 is AGI. We had someone on the show yesterday who said we have 10-minute AGI, but that's not necessarily eight-hour AGI or 24-hour AGI. How do you think about the length of reasoning chains and that new frontier of optimization?

Maybe we've hit the intelligence curve, but we need the agenticness to continue for a long time. Yeah, I basically think that's right, that the 10-minute AGI take is exactly right. I mean, the AGI debate hinges a lot on what definition you use.

Obviously, the Microsoft CEO came on Dwarkesh a few months ago and said, "I'll only believe it's AGI when GDP goes up by however much percent," you know.

I think we're starting to see models that are very general and very intelligent, you know, and they can do a lot of the things that you might have naively expected AGI to be able to do, but they still don't really have this, like, full capacity yet.

I think a lot of this stuff just gets worked out as we improve our current techniques, though, and isn't necessarily some barrier, some key problem that needs to be solved.

Are undergrad CS majors bullish or bearish on, like, wrappers, on trying to go for the application layer, trying to do a startup? Or are the best and brightest saying, "Hey, there's no way we're going to win, let's go work for a lab"?

Yeah, I think a lot of the smartest people I know really want to work for labs, but I think they're almost too bearish on startups and on wrappers, you know. A smart wrapper can take a model and kind of scale up with it, you know, and I think this is what we've seen from places like Cursor: you improve as the model improves.

If you're making a wrapper that's a bet on models not getting better, you know, like where you're doing a lot of prompt engineering and you're really working to create this scaffolding to try and make the model better, instead of just saying, "Okay, how do we get distribution? How do we get the user experience? And then the product gets better as the model does."

Very light wrappers of the last few years, I think a lot of them relied on consumers not being aware of ChatGPT's capabilities. I mean, there are some apps in the store that are literally just, like, "AI."

Yeah, there's still, like, billions of dollars of revenue out there that's basically: they just happened to acquire a customer before OpenAI or another lab did. Yeah. Can you tell us any more about the mood among CS majors around the opportunity in startups broadly?

Yeah, I may write about this pretty soon, but I do think the startup culture at Stanford is really not what it used to be, you know.

There was sort of this, like, house era at Stanford, where it seemed like there was so much energy around startups, and now it really feels more like people are doing their startups as summer projects, and people aren't committing to them in the way that you really want to see.

I think that if some Stanford student came to me and said, "I want advice on what I should be doing this summer," I wouldn't tell them to go start a company. I would tell them to go work for Ramp. I would tell them to go work for Cursor.

I would tell them to go see one of these incredible organizations, look at how they work, and then take this knowledge to go start a company. I think too many people are doing these things as like side projects and not fully committing to them.

Yeah, we talked to somebody uh who is referencing the early Facebook days at Harvard where if you were an undergrad at Harvard or Stanford and you were interested in startups, like going to Facebook was the expression of that.

Now, if you're interested in startups, there's, like, a $2 million seed round just waiting for you, and you will be a founder. Even though you could go join a 10-person or even a 100-person startup and get a lot of that startup experience.

So, do you think venture capitalists are to blame here, or is it something cultural, or should we lay it on the university? Who's to blame? Yeah, there's no one actor who really made this bad.

But the fundamental issue is that startups became too high-prestige too fast, and, like, doing real things and really building things didn't gain that same kind of prestige. This is a lot of what I actually valued most about my time at the Dwarkesh Podcast, which is where I worked last summer.

Dwarkesh felt like a startup. You know, we had all of this energy, all of this attention that a startup has, but at the same time, obviously, we're doing a podcast. You know, we're executing at a very high level, but it's a podcast.

And I think getting to work with someone like Dwarkesh, who has this, like, agency and this knowledge and this drive to make things better, to make his content incredible, make his questions incredible, was something you can learn a lot from, you know?

So it became higher prestige to have, like, "stealth startup" in your bio than to have, like, "I'm working for Nat Friedman this summer," which is seen as pretty lame. Yeah, I think that'll shift. I mean, working for Nat Friedman is pretty cool. I think it will catch up. Yeah.

The benefit is we have a generation of people that are finding out just how hard startups are. They'll land somewhere. Yeah, they'll land somewhere. But they're also realizing that if you're going to start a company, once you know how hard it is, you really, really need to pick your ideas carefully. Yeah.

Totally. Can you give me your read on the last two episodes of Dwarkesh? He had AI 2027 and then the Epoch one. Are you AGI-pilled? Are you feeling the AGI? What's going on over in your world? Yeah, for sure.

I actually think the Epoch people are fantastic, and I also think the AI 2027 people are very smart, and I sympathize with both sides here. You know, I think model capabilities will grow very quickly, and I think it's much less clear how this will translate into the economy growing really fast.

You know, I was in an interview recently and someone asked me, "Jack, you keep telling us AI is going to be good. I think AI is already good. Why hasn't it changed the economy?" And I was like, "I don't really know."

You know, it seems like the naive economic model says that diffusion of technologies takes a really long time. And it seems like, intuitively, that wouldn't be true for software, but practically it seems like it is, you know.

So, I think the AI 2027 people might be mostly right on how fast things are going to grow and how fast the models are going to get smart.

And then the Epoch people might have a really good sense of, like: oh, but really getting this into the service sector, getting this to disperse across the economy in a way that changes things fundamentally, is a much longer process. Yeah. I mean, maybe it's just, like, a human issue.

You could kind of comp it to, I mean, go back to, like, the PayPal days: the internet got fast enough to transfer money very quickly, and then the percentage of money that was transferred digitally grew very slowly, because it's human behavior.

And it was only last year that Stripe processed about 1% of world GDP in transactions, you know, right? And that's Stripe, and they're, like, the power-law winner in the category. Yeah, it's crazy. Anyway, Jordi, you got any other questions? Not this second, but, Jack, it's great to have you back.

Every time you publish, pop back on and give us the breakdown. This was fantastic. Thanks so much. And Jack's coming to LA soon. Oh, fantastic. Looking forward to meeting you in person. We'll see you at the new studio. Great. Thanks for coming on, Jack. Good to see you. Great to be on, guys. Thanks. See you.

Talk to you soon. Um, should we go through some timeline posts and then get out of here? Matt Wang posted, quote, "I find that super subject matter experts can sometimes be very bad at predicting the things they're an expert in. That is because they overweight their own expertise." Interesting. I think we should deep-dive this chat with Domer, the number one trader on Polymarket. Interesting.

It's kind of like an anti... well, it's not anti-wisdom-of-the-crowds, because that's Polymarket, but there's something there that is highly relevant to the AI discussion, where everyone has a different prediction.

They all have a different set of experiences and expertise, and then also conflicts of interest and all sorts of things. But we should dig into that post. Joe Weisenthal shares that markets are surging on this headline: Bessent sees de-escalation, China situation unsustainable. The S&P is now up 2.7%.

I didn't realize markets love when a situation is unsustainable. Yeah, we're de-escalating, but unsustainably. Who knows? Anyway, we got some big news in the media world. Evan Armstrong has walked away from his cushy writing job at Every to launch his own startup today. He's going founder mode. Founder mode.

He says Leverage is his big swing. "Starting a company while parenting a newborn feels a little insane, but I couldn't keep this idea inside me anymore. It had to exist." So, congrats to Evan for taking the leap into the arena, or out of one arena into the other.

Maybe it'll be a public company one day. Maybe you'll be able to buy the stock on Public.com: investing for those that take it seriously. They've got multi-asset investing. He's got to be the first Substack-to-SPAC. Yeah, I'd love to see that. Anyway, we wanted to give a little congratulations to CC Gong.

She shares a life update eight months after she went viral for supporting her YC boyfriend. I don't know if you saw that. Eight months ago, she posted some photo saying, like, "My boyfriend's in YC, I cooked him dinner," and it went very viral.

But he proposed this last weekend by bringing together a hundred of their friends and family to surprise her with a stand-up comedy show that she had to perform in. Golden retriever. It's fantastic.

She says, "I posted the tweet in jest, but it sparked a global gender debate about women being invisible emotional-labor sidekicks to men's visible professional success. In actuality, my fiancé is my secret weapon." So congratulations to CC on the engagement. I hope the wedding planning is fun and enjoyable.

And I hope his company starts ripping. You know, if it's not ripping already, he went through YC, so hopefully it's just up and to the right from here. Marriage is the greatest investment you'll ever make. It is one of the most important. It is. Um, Kevin says, "Never buying regular Zyns again.

Five-pound bucket of horse nicotine from Tractor Supply. Never underperforming or having a foggy mind ever again.

I am the hashtag boss, lipping half a horse dose." And I feel... scroll so that people can see it. The next slide is queued up: the Mustang horse nicotine packets. If you pull up the next slide... do you know about this? Is nicotine consumption in the horse community a big thing? Wait, do you understand the riff on this? So basically, someone got horse electrolytes and was like, this is horse-ade, horse Gatorade. It went mega viral, and so now people are spinning off on that. This is clearly AI-generated. I saw this way too quickly.

I saw it and I was like, John, maybe I should do this. We've got to get Lucy to do some. There were other jokes about this. There was horse creatine. People were spinning off on it. There's a variety of these. Anyway, AGI is here. Yeah, AGI is here. Oh, this is big news.

The star defense-tech founder Matt Grim was spotted soaking up some sun at his Costa Mesa headquarters, stunning in a perfectly tailored three-piece suit. You can't step out of your office without one of our... we have paparazzi everywhere. We're bringing the paparazzi to technology.

And Ben had a good idea. He said, "Need TBPN to start a weekly segment showcasing the best tech, business, and finance fits of the week. What LeagueFits does for the NBA, we need for work fits. Maybe even throw in a yearly awards segment." Oh, we already did that. We already did that.

We already did the best-fitted in tech. Aidan Gomez was the runner-up, I believe, and Alex Karp won for his general overall style, but also his fantastic Patek Philippe Aquanaut with the gold strap. Well, great idea, Ben. Or the orange strap. We already talked about Cluely, but there's a lot of backlash.

I thought it was a great interview. I thought it was interesting. What's interesting is that I don't find his actual product, the idea of having augmented-reality glasses, that dystopian. I think he finessed the internet for a bunch of attention.

He finessed the internet, and I think he's going to leverage it very well. The most dystopian thing is that he's clearly a talented founder, but he can't just say, "I'm building an augmented-reality AI app." He has to go and put it in these provocative terms. That's actually the more disturbing thing.

And he wants to get to, you know, effectively a chip implant for your brain, but he's got to build enterprise SaaS to get there. Yes. And that's a good lesson for everyone. Well, speaking of lessons, Sam Lessin said... actually, let's save this for tomorrow. Yeah, let's bring him on.

Go through it properly. Yeah, it'd be great to invite him on and talk through what to do as a seed investor, and his TLDR. We will go to a post from Bezel. We already did the ad read, but they had a great post. Cartier said: brand everything. And they launched a horse headband that's branded.

I love it. Camel drapery? Branded grass? Not yet, but they're working on it. Maybe we should do a TBPN horse headband or camel drapery for the folks in the audience who have horses and camels. I think that'd be great. Put some ads on there, like the jacket.

We should do a TBPN camel drapery or horse drapery. Yeah, livery for your horse. I love it. Anyway, shout-out to Michael in Hawaii. He says, "Doing my duty and TBPN-pilling my Hawaii, one big screen at a time. Size gong in full effect." Thank you, Michael, for putting us up on the largest TV I've ever seen. This must be a projector of some sort. But I thought this was an awesome post. Thank you so much for sharing us on the big screen. We love to see the show on TVs and in offices. We love when we're just passively on in the background.

We'll get the subtitles going at some point, live transcription, like what you see in the chat. Yeah, live transcription. I'm sure we can do much better than what the standard is on TV. That'd be great. Also, another post from Rahul. He's been on the show. Big fan of him.

He's really wearing the suit everywhere. It's fantastic. It's amazing, especially because he works in finance and artificial intelligence, so I'm glad he's dressing the part. With intense focus, you can build a superior product.

He's sharing a text message that he got from a friend, I believe, or a customer: "Just tried Julius for my first real use case. I've always been annoyed that the Eight Sleep app only gives you the last year of data. I found out that I could get a full export from them and ended up with like 10 megs of JSON. I tried visualizing the data with ChatGPT and Claude, and they just couldn't manage it. I decided to try Julius, and it immediately nailed everything I asked for. Keep up the great work, man." I love that. And promotion for Eight Sleep: use code TBPN. Fantastic. And then download Julius.

He also was on a plane and downloaded all my YouTube analytics and ran Julius analytics on that, put it under the microscope. I've been in the content trenches for years. Yeah. I think the first year I got, like, a couple hundred thousand views, and then it was, like, 17 million three years later. It was great. Great.

Great run. Anyway, this post from Natsuki, I don't know, it seems like it might have been photoshopped, but it was funny. "Hey, ChatGPT, look under there." "Under where?" "Lol, made you say underwear." "Haha, well played." "Say home." "Home." "Latitude: 52.3974." This looks like an ad for Gamdom or whatever. Why?

It's a gambling ad. No, no, no. So this is a hack that betting companies are doing, where they're... But why is this AI poster posting gambling stuff? I feel like that's such a... No, no. So basically, what's going on here? I bet Natsuki just posts a lot of content. Sure. Sure.

The gambling companies, I think, buy a watermark. Yeah, they pay for a watermark on a lot of posts. Weird. Weird. Anyway, last shout-out and then we'll wrap up the show. Mass, you know him from the viral swing-gate up in San Francisco we covered on the show. There was a swing on a tree. It broke.

He said, "You can just do things," and built a new swing, and then that was taken down, and it went back and forth. I called it the most important political issue of our modern day, and I said that unironically. But he just wanted to give a huge shout-out to Danielle Strachman and the 1517 Fund.

They are the GOATs, because he's been looking for compute and he got some. He said, "We were just approved for $100,000 in AWS credits. Thanks, Mass, for working so hard on that. And huge thanks to Danielle for hooking us up."

So he got some credits, and he's going to be training a deca-billion-parameter time-series digital twin of the global economy. The future is predicting the future. Are you ready? So congrats to Mass for working on stuff. I think he's working on stock market prediction, hedge fund prediction.

A very interesting area to apply artificial intelligence to, obviously with a lot of data, and I wanted to give him a shout-out on the show. Anyway, we've had a fantastic show. Thank you for watching, thank you for listening, and we will see you tomorrow. Have a great rest of your Tuesday. Enjoy your Tuesday. Goodbye.