Will Brown on AI scaling limits, RL as the real 2025 trend, and why enterprise AI adoption is slower than advertised

Apr 21, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

material. Yes. Be be perfect. Be perfect. And I mean, you could actually etch different things into it. There's so many cool things you could do with that technology. It's very, very awesome. Anyway, welcome to the show, Will. Uh, how's it going? Good to have you on here.

Uh, been a big fan of your of your posts for a long time. One of the top Many people have said one of the top post poster of the year, I think. Potential poster of the year for sure. He's been burning up the timeline. Uh, Jordy, where where should we start?

I mean, I have a bunch of I have a bunch of stuff that we can go through. I mean, we can just dive Do you want to give a brief intro on yourself and and who you are before we Maybe we should start there. Started talking about the timeline. Sure. Yeah, sounds good. I'm Will Brown. I'm a researcher at Morgan Stanley.

By the way, nothing I say here is Morgan Stanley opinions. This is all me kind of just like sharing my thoughts. Um, I am a researcher. I work a lot on stuff related to LMS there. Uh, my background was in reinforcement learning theory. I did my PhD at Columbia, so I've been in New York City for a while.

Big fan of New York City. uh great things happening here all the time. Um and yeah, I also just like like talking about the stuff on the internet and like participating in the open source community.

There's lots of like cool either projects or code bases or papers or models always coming out and like I a lot of my job is like needing to know all that stuff and being a liazison essentially like from the internet to the company where like people want to understand like okay what's the latest model for XYZ that I should be using?

the right open source toolkit, especially because like at a big regulated company, we need to understand the landscape, especially for things that are like downloadable off the shelf without needing to like onboard a vendor. Like some things you a vendor for, but it's also like a much heavier lift.

And so like we need to like map the landscape of like what's the right tool for the job for everything LM related. And so that's a large part of what I do. And it was also like my excuse for being on Twitter all day. That's great. It's a great excuse. Where do you want to start?

uh where uh aside from X like where where are you getting signal without giving away without giving away all the alpha because I imagine like you know by the time Darkh has like done a podcast on something like you know it's been that the information you know disseminated it's not necessarily alpha there's a surprising amount of alpha still on X just from places where you don't like like so there's places uh I have not found much alpha on LinkedIn um the group chats, the anons, um the uh open source uh like GitHub discussions, lots of really good stuff is buried in like a GitHub issue or like a feature request where someone's like, "Hey, this thing would be cool.

" Yeah. And these ideas are just all over the internet, but you got to like know where to look for them. Yep. Uh I want to talk about I want to talk about humor. You actually posted about this like a month ago or two months ago. Uh, you said it's interesting that 1.

5 billion parameters is all you need to crush math competitions, but you need like 15 trillion to make the model be funny. Maybe humor is the right measure of true intelligence.

And for a long time, my eval has been tell me a funny joke and every time there's a new model, it's tearing up X and I ask it to tell a funny joke and it's always the worst joke I've ever heard. It's kind of an anti- joke.

Um but but but h is that just a lack of of like uh of labeled data essentially uh or do you think there's something more uh more like innately human to the idea of humor?

I mean, I think humor is like really hard and it's also very hard to it is very hard to label, but I think it's also really hard to like, okay, this is like a a big debate topic is like would it like I think people kind of assume, oh, just like train it on funny data. Yeah.

But like the things that make things funny are really subtle and pretty buried. Um, I a friend of mine, co-orker actually did this experiment where he like he tried to like have one model be a judge of like is this funny or not? Yeah.

And what they do is they like lean into things that feel funny and that the results were actually kind of funny, but what it ended up just doing was swearing more. So like if you swear a lot, models think that's funny. Um, and so there's all these like kind of cheat like there's a lot of cheat codes. Sure.

and RL and training these models is like the path of least resistance is to find the cheat code. Um, but like real humor like the people that to be really funny, you have to be really smart. Like think of like your favorite comedian like uh Norm or Lar David or like all these people are like really smart.

You can tell from the way that they compose jokes like there's like an attention like attention is in like transformer attention of like ideas that need to combine in a very sparse like precise way to make a good joke is like you can't just like shove things together.

You kind of got to thread the needle to make a joke land. Um it's not a very coarse mechanism. And so I think like uh GPT 4. 5 is like ginormous model trillions of parameters most likely.

Um and that like there's more room in the model to have these like little sparse connections materialize as you go through layers of the transformer. Um and I just haven't seen anything like that come from a smaller model. How scale are you based on the results from 4. 5? Are you off to the races?

Scale is all you need or lesson pill or what? I'm very RL scaling pled like uh I'm not big transformer pill really like the training wall real um I think we don't like is if there's another 100 trillion tokens of data sitting out there ready to train on go for it. Um, I don't know.

Like one I think is like people should do some math and math about like how long until big GPUs are readily available and it's it's going to be a while before people can really run like even DeepSec R1 with like easy resources.

Like you can you can kind of do it on one node, but like the quality bump over things that are much smaller is just like like the we're hitting diminishing returns on capital investment is a lot of it.

Like they're taking out of the API because like the they can sell other things with the same GPUs and make more money is part of it. Like it's a lot of compute to keep a model up like that. It's slow. Yep. and the things that it's better on are like not that economically valuable.

And so the sweet spot appears to be in this like between like 30 billion active parameters and or parameters total and like couple hundred maybe. But like I don't know that like like when Meta releases this behemoth model, I don't think anyone's really going to run behemoth. Sure. For like their day-to-day stuff. Yeah.

It's just probably not going to be worth it. So I mean given that uh you know Tyler Cowan's calling 03 G AGI like the the the economic results of these like big but not crazy big models seems to be pretty good run it on a node.

Uh should we be talking more about hey it's good enough let's bake it into an ASIC let's just bring down the inference cost to basically zero like we did with Bitcoin hashing and whatnot like is that the conversation we should be having or I mean I think that's been that's kind of been like Groanova are like playing that game um and I I think like those businesses make a lot of sense like they can kind of say okay this version of a transformer we're going to be in this ballpark for a while we can bake that into our plans a little bit.

Um, I think what's the next way 2025 like people have been saying you're the agents, but like what I think that means is agentic RL like we're realizing RL works like the reason 03 is good is because it's trained to use tools. The way you train a model to use the right tool for the job is reinforcement learning.

Uh, and they've said as much like deep research reinforcement learning. Um, people have been throwing like random tools at models for two years and it only now works even though the models are the same. like GPT4 was a bigger model than a lot of the models people are using now.

It had just as much training data, but it was not or at least doesn't seem to have been RL in this specific way. And so that's kind of like my bet is like people are going to really want to train models to be agents. And I think you can get that to work well with a pretty small model.

Does that mean like a flourishing of RL uh big transformers for different tasks or are we still searching for like the god model that can do everything all at once? I mean, so okay, I think there's a a couple leaps we need to have models that can do everything all at once for like a super long amount of time.

Um like my I tweeted something about this like my what I'm happy to call 03 10-minute AGI. Mhm. And I think like framing AGI in terms of like length of time it takes a human to do a task is like more reasonable than like a global framing.

Like sure there's a bar of like drop and replace for a human that we are like definitely not at yet like for general jobs but most things a human can do in 10 minutes you can like get 03 to do that pretty well. Um and so that's a very even like a like make a website in 10 minutes. Okay.

You have a good human designer is like probably can whip something pretty quickly. Yeah. But but not like an entire design system that works together across all the different Yeah. makes a ton of sense. Um so uh so so uh seems like we're transformer maxing, we're RL maxing.

Uh is there a new paradigm that you're excited about? Uh program synthesis? Are we bringing back symbol manipulation at some point? Like what what are we going to pull from the tool chest to make this thing go to the next level? I mean I think it's just tool calls.

Like like I think when people say program synthesis like we're already there like 03 is program synthesis but the programs are like JSON and Python. Y like you can do a lot with that.

I would have I did some tests on like RKGI problems where you give the screenshot to 03 and it can like basically figure it out in 10 minutes of like zooming in and looking at the thing and then writing some code to see if it like reproduces the thing and like it doesn't nail it but I would they haven't released the benchmark but I would guess it'll do reasonably well.

Yep. Um what as much what are you seeing around AI adoption in finance broadly?

I feel like we've been promised like, you know, somebody can just press a button and generate the deck and like generate, you know, a 20page investment memo and and uh are are are these types of uh you know, deep research style tools being used extremely heavily already or is there just even a greater now that they're being used, you know, hey, let's go uh spend our time talking to, you know, three times as many experts so that we actually get proprietary kind of um insight into the into the business.

Great question and I think there's a couple different answers I have.

one is that like on one hand we at least have been pretty I think fast at certain things at adoption like the day GPT4 launched Morgan Stanley had integrations because we had been working on it and these were like we had press releases for these like we are ready to go and so there are certain things where like initiatives can happen where a effort comes together to make a thing ready to use but the like there's a long tale of smaller tasks that you do that you there's not a drop in replacement for if there's a vendor it would take a long time to onboard them and it's not going to really solve the problem right away and you could build something from scratch if you put a few people on for a few months but like engineer your hands are like few and far between um and so I think like deep research is an example of a product that I would say like works really well um for the thing it's built for it's doesn't quite like do other stuff super well like if you wanted to give you a table of like 50 very precise things it's going to make some mistakes there.

Um because the report format has a lot more room for like slop um without it seeming like seeming like noticeably bad. Uh the PowerPoint Microsoft PowerPoint copilot is not great. It's not a thing that I have heard many people say saves them time. Um uh what are you Yeah.

Do you have do you have any takes on enterprise AI adoption broadly? We were talking, I think late last week, about how Johnson and Johnson had tried hundreds of different tools.

Like they went through the effort to just like try a bunch of stuff because there was a top down mandate and now they're just like cutting probably 90% of it. And so all the startups that were like, "Yeah, we have a pilot with J&J. It's going great. " Like they won't churn. Well, 90% of them just churned maybe.

I mean like most of these pilots are very much intended to be churned. like they're not they're very much in a they're not being rolled out broadly. They are coming through in a kind of walled off environment for people who are like going to be the beta testers.

And so like we have a crew of people who like and I'm involved with this as well like there as new stuff comes like online we test it out we give it feedback.

Um, most of these do not convert um because like we're very willing to like try stuff um in terms of taking the call in terms of like looking at a demo but actually making a thing be part of like the companywide workflow is like a pretty heavy lift. Um think of like onboarding a cloud provider.

Like a lot of the reason like maybe this is like one reason cloud providers have the margins that they do is because like moving clouds is really hard. um moving all your stuff from like AWS to Azure is a pain.

Um I think especially in regulated industries like a lot of software onboarding faces the same sort of hurdles um where you can't just like bring it in and use it. You got to like go through a whole process and some companies are have kind of planned for that and some have not.

So like one example is I think like a reason that Windsurf has been successful as a cursor competitor is they lean they've leaned way harder on enterprise than cursor has like they have really designed for enterprise integration whereas cursor really has not.

Um do you uh was the windsurf news surprising to you at all or was that just obvious? OpenAI cares about coding and they should have a sort of dedicated enterprise, you know, coding. I mean, it makes sense. Yeah.

I think like to me it felt like the a sign of there being some more friction with Microsoft because like originally it was like, oh, they already have that as co-pilot. Sure. Um, and so this to me seems like them move taking a step away from Microsoft. Um, but I mean this is just speculation. Yeah.

Um, it does make sense that it's cursor and that is Wind Surf and not Well, I mean they did try to buy wind cursor supposedly. Um, like I'm I'm a cursor user. I think it's great. I haven't seen anything that really sold me on like go move to Windsurf. I'm sure it's fine.

Um, it's just like the bar to like switch for me is like I need a thing to have a a feature that's like killer that the other one doesn't have and I have not like seen that yet, but I'm sure they have plenty of good stuff. What uh now that we're like a couple months out from the Deep Seek moment?

Uh, what is your takeaway? Is it something around kind of the the optimization of of of the inferencing these models or uh h how have you processed uh that news now that we're a few months out?

Yeah, I mean they're like incredible engineering like I think a lot of their open source releases have really like like since R1 they've released a lot of code and details on their inference stack and training stack.

Um, even even Sam Alman was posting like, hey, if you work for a high frequency trading firm, like come work at OpenAI. And I read that as like, oh, they want to optimize their models now, right? I think the some of the big players have realized they didn't have to that much.

Like, of course, uh, Deep Seek is serving all of China on 2,000 GPUs. It's kind of silly that Anthropic can't like has to have the warning about, oh, we have high limits. Please try again later for like their paid users. like that's whereas like the deepseat chat is literally free anywhere.

Um and so I think some other people probably are working on upping their uh inference efficiency game but it's also like hard um because it's very model specific. Every model has its own quirks and you got to optimize around that. Um it's hard to like have things be reliable and fall tolerant.

Um, so also just so you think you can't just port back like FP8, wasn't that one of the uh things or like the the the mixture of experts blocking all those different things?

Like it seemed like there was some stuff that where it was like week two we were getting, you know, analyses and I was like this will probably be open sourced and ported to Llama and all the others like pretty quickly. But has it played out differently? I mean to me the surprising thing was not any individual one thing.

It's that each of these is maybe like a 30% 50% gain, but they have like 10 15 of them that all stacked. And so getting all of these to stack nicely is what's hard. That That's interesting. That's a great take. Yeah, that's interesting. What are you expecting out of uh Alibaba and and Quinn three?

Oh, I I mean I'm really excited. Like they make I think still the best model suites for like doing research. So like I do almost all of my experiments on Glen models just because they have a bunch of them. There's like so many versions like for every different model size there's like seven different versions.

There's like a code one, a math one, a multimodal one, an audio one, a base instruct RL like they really are optimizing for user friendly like open source model ecosystem um in a way that uh Llama was doing last year. This current year we'll see if they can get their act together.

Um, so are they are are they also like RLP pill at this point and and moving? I'm sure they are. Yeah, like they have they have a reasoning model. The reasoning model is not as impressive like but it's it is a pretty small one. It's like a 32 billion parameter model that like does well in math competitions.

Um, I don't know that anyone has really bitten the RL bullet in the way that Deepseek did early and then OpenAI has been doing lately. Sure. where they're really kind of betting like OpenAI seems to be essentially betting on scaling up RL as the path.

Um, and that it's not just longer reasoning, but it's reasoning integrated with system interaction. And I sure like that's kind of the drum I've been trying to beat for a while. Like a lot of the open source work I've been doing is around uh multi-turn tool calling RL.

Um, I think that we need better ecosystems for that. like very like there's starting to be more tools you can go use, but for a while there was just nothing that supported this. Everyone was very RLHF pill. Yeah. For too long. What what do you think is going on with Llama 4?

Did they just miss the memo about RL or uh is there something else at work? I mean, doing this stuff at a huge company where a lot is on the line is hard and it's way easier to do nothing than something. Sure. Um, and Meta also like doesn't have to print money off of their models.

Like they don't uh like I think one analogy that I've been giving to people is like why did Amazon not win NLP? They had like 10,000 people in the Alexa team. Sure. In like 2018. Yeah. And they have just now released like okay language models. Um and they're really betting on Anthropic to drive their revenue there.

Um how are you thinking about uh the trade war in the context of the data center supply chain and is that even an do you pay much attention to that or is it kind of you're busy enough on on model? I do follow it. Yeah. I mean um I tend I listen to Dylan Patel talk a lot of stuff. Um he's great. um semi analysis.

It's hard for me to like give a real take. I don't think I have I think I think it's like important to keep an eye on. There's a lot of pieces of the supply chain um that will get I mean it depends on what happens like Yeah, it's going to get messy for sure probably. Um but I I don't have like a a hard stance.

You should become a VC and then you can just give it anyway. Exactly. Yeah, you should you should flip over. No, I just think it's funny like uh in a year we'll be like, "Bro, we were worried about the wrong Transformers. " Yes. Yes. Yes. For sure. With the energy thing. For sure. Um I I have one last question.

Um uh h how's the culture at Morgan Stanley? I mean, it seems like talking to you, you sound like a Silicon Valley founder, uh but you're at like a company that's like over 100 years old. Uh is are are you part of a new guard or has Morgan Stanley always been this way and you're just the first to kind of post about it?

Uh what's it like working there?

I'm definitely like the first to like tweet a lot about it, but like the team I'm on has been around for maybe like it's like a machine learning research team where we try to we try to be like a version a finance version of like an MSR or a Bell Labs where we like sure off keeping up with the research.

We write papers, we publish, we go to conferences, we tend to we kind of hang around as like expert consultants and advise on a lot of efforts throughout the company. Um, and so the company has definitely been like betting on machine learning for a while.

Um, I think we are like probably ahead a lot of a lot of like the quant firms in terms of the how early we were on realizing deep learning was important. Um, and I think we've done a pretty good job at like keeping up with and following the like L and craze like we were we partnered with OpenAI before CHBT. Um, wow.

Wild and Yeah, that's amazing. This was great. Let's make it a regular thing. Yeah. Yeah, this is fantastic. Yeah, I I definitely want to have you back when there's a new model release or something or new paper publish. Uh I'm excited to follow.

If I can do a quick plug, um in June, uh I will be at the A Engineer uh World's Fair in SF giving a talk as a follow-up to my previous one. Very cool. I think we're going to see you there. Oh, sweet. Awesome. See you there. Yeah. Um and then also I am doing a course of some kind soon.

Official announcement pending, but you really want to like how to get rich quickly. Uh agents and RL stuff. Okay. Um and so if you DM me your email, I'll put you on the list for more info. That sounds awesome. Awesome. Good luck with it. Uh I'm excited. Cool. Yeah, this has been great. Yeah, thanks so much. Talk to you.

Thanks for coming on. See you. We'll talk to you later. Bye. Uh should we go through some timeline and then get out of here? Yeah, you're a timeline addict. Um there's I mean there's so many posts and we don't have enough time.

There's so much in the Monday is always a stacked day because you have a whole weekend of posts to catch up on. Uh Adam Morin says, "You're coding at the bar. I'm drunk at the office. " Respect to that. I love that.

Um Aiden says, "The most pe the people I'm intellectually respect the most have quite lopsided output input ratio. They write, build, create more than they read, study, or absorb. Geniuses are not sponges. They're volcanoes. " I like that framing. That's interesting. Uh, I'm I'm kind of going backwards.

I like this one from Telmudic. The first time I heard of the GoMad, gallon of milk a day diet, I laughed my I laughed. There's no way I'm having my milked intake for a diet. Goad, what are you cutting? The idea that he's doing two gallons of milk a day. Hilarious to me. Timeline and turmoil.

Timeline and turmoil to uh this weekend. Uh Paul Graham taking shots of Palunteer saying you shouldn't work at Palunteer. Uh Gary Tan chimes in and says, "Is now an awkward time to mention I helped come up with the very first Save the Save the Shire t-shirt at Palunteer? " Uh very very funny.

Uh you know, we love we love Palunteer. We love PG's writing and he is a foundational member of uh the the tech elite. Um but he gets he gets spicy with the political takes. He has strong political opinions and he brings them to the timeline and uh well I like that uh Barry and PG can have a little bit of fun.

Yeah, exactly. Love to see it. Uh WebDev Mason always with some great takes. There's uh there's some people uh the Blue Origin story has really grown since we first covered it. Now people are very upset that Katy Perry went said it was an affront to real astronauts. They didn't go all the way to space.

They only went to the Carmen line. I saw people saying uh that, you know, how could you go to space when there's problems on Earth? Yes. Then you could also kind of extend that out to how could you do anything problems on Earth? Exly.

I think that's so funny because like obviously it's like why are you wasting money on space when there's people that are hungry? But we literally talked to a farmer who was like Starlink is increasing farming yields.

It's like no actually like going to space and spending on like the crazy thing actually help people eat which is great. Uh, but webdev Mason says, "Uh, this is so sad. Dead culture stuff.

Eject me into the timeline where she yanks the microphones in and shouts," I flew over our home world and saw us, the pale blue dot, the blessed sprouting seed of the Virgo supercluster, and I must report to every living soul that it is dope.

Uh, so she wants she because the story is that Katy Perry now regrets going and says that there's been too much backlash. She says that she shouldn't have gone, but Mason says, "No, own it. go and celebrate it because it is fantastic. Uh question about watches.

Uh we already talked about bezel, but is it appropriate to wear a Rolex GMT as an analyst? I love this answer. It says it's it depends. Imagine your first allnighter.

There you are burning the midnight oil when you decide to take off your Rolex GMT and place it on your desk to allow your wrist better control over the almighty Excel shortcuts you are about to employ. You take a look around and what do you see?

Every other analyst has placed their PC Philippe Calatraa travel time in front of them for the exact same reason. Now if you sir are capable of bearing the overwhelming feeling of shame that will inevitably conquer you then by all means consider it absolutely appropriate. Hilarious. What a great post. That's hilarious.

Anyway, um we should cover the VCs, uh the what's going on in higher ed at some point, but there was a good post by Connor. He says, "Vs are attacking higher education. Trump squeezes higher ed funding. Universities sell PE holdings to fund their operations. VC funding dries up.

" I don't think this is actually going to happen. Uh what the the news is that Yale is like selling off a portion of their venture capital portfolio, but I doubt that that really has affected funding landscape. And I don't think I don't think the university endowments are a major major source.

I think like pension funds are even bigger now and and uh and like uh sovereign wealth funds are even bigger. Yeah. And also very likely again uh I doubt they're reacting to shortterm pressure.

It's probably, you know, uh, Yale selling $6 billion of secondaries probably is part of a larger strategy to generate liquidity on long duration investments, right? So, here are some posts that I want to follow up on.

We'll highlight them today, but we'll either have these folks on the show or do deep dives on these topics. Uh, person of swag, Adam, uh, says, "Vibe sheeting, is this anything? " Uh, he's built cursor for Microsoft Excel. uh going into the lion's den competing directly with Microsoft Copilot.

But we just heard it that some of the Microsoft Copilot products are falling behind a little bit and so I thought it was fun that he was uh building a like a plugin just into uh Excel and I could imagine this becoming a great company. So excited to talk to him about that and dive in deeper.

Uh also uh there's a new planet Delian mentioned this K218B. I did a whole uh deep research report on K218B. Uh very very fascinating. Uh interesting. I had science. Uh I had Chachi tell me an entire speculative science fiction story about how we might get to K218b.

Spoiler alert, it's going to be like 500 years to get there even at like. 1 C or something like that, which is the speed of light because it's so far away. Uh but a multigenerational math to understand how many episodes we could do in 500. Quite a lot. Interestingly, I had chat GBT.

I was like, "Uh, tell me a sci-fi story. " And like, "Why don't you just make me the character in it? " And it was very weird. Uh, and it was like, "Now you're in cryosleep for a hundred years.

" Uh, it was like, "Now you're an old man, but because of uh life extension technologies, you're 150 years old and like uh like all like your kids are now older than you because of this weird time thing. " It was very fun, very weird. Not appropriate for the show. Anyway, get on. Uh, last one is uh uh Notable Cap.

This is a leak from Arthur Rock. Arthur Rock uh series B uh to uh browser base, right? Paul Klein who's been on the show. So uh I assume one last one last one uh from Near. I like this one. Very end. Reminder of how far AGI goalposts have moved. It's from an old book or something.

It says an AGI could beat you a chess, tell you a story, bake you a cake, describe a sheep, and name three things larger than a lobster. It's also solidly the stuff of science fiction and most experts agree that AGI is many decades away from becoming reality if it will become reality at all.

So wow the last three models I use could describe a sheep. Couldn't bake the cake though. We need the humanoids for that. So but it can tell you how to bake a cake and it really can do all of those things and more. So yeah, AGI is here. It's just get on with your life. Unevenly distributed. Yeah, that makes sense.

I was at the beach over the weekend and I was thinking to myself, I was looking around. I was like, none of these people are AGI pills. And then everybody went back to enjoying the beach. Yeah, that's the nature of it. Anyways, thank you for watching today. Uh we will see you tomorrow. We got a great show for you.

Bunch of news breaking and we'll talk to you then. Bye. Looking forward to it. Cheers.

← Back to story