Shawn Wang on MCP, AI agent horizons, and why building infrastructure companies is a warning sign

Apr 4, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

I know you as the host of the Latent Space pod, but in your bio you have a number of different affiliations. How should the viewers think about you? Uh, someone with basically ADD and too many projects. But yeah, I'm Shawn, or swyx.

I started the Latent Space podcast to cover the AI engineer field, which I helped to popularize, and I also run the AI Engineer conference, which is happening in a couple months.

Primarily, I think the one in New York that I did helped to kickstart a lot of the recent MCP hype that you might have been seeing. Oh yeah, I'd love to get into that. We can get into that.

And I think what you were referencing just as you came in was, you know, the news of the day, right? Like AI 2027. I think it's very important to at least try to go through the thought process of what might happen, but this is essentially fanfiction, right? We don't know exactly what will happen, and we've been getting various forms of "AGI is two years away" for quite a number of years now. I think part of why I started the AI engineering trend is to try to get people towards building more things instead of just speculating on what the big powers that be will do, because you have a lot more agency in your life to do things if you just focus on what we have today.

Uh granted that AGI is very big and like I do take that seriously. Yeah, of course. Uh well, let's go through some of the things that you do think are worth building. I want to get into agents and we've been asking a lot of people, you know, when can this thing book a flight for me, but maybe we should start with MCP.

Can you just give us an overview? I saw all the viral posts. Um, a lot of debate over is this just an API or is this something more important? All the leading labs are putting out papers and implementations of it. Can you give us your kind of uh MCP explain like I'm five for dummies?

And then maybe take me through a few of like the implications for the market and the different AI labs. Yeah, I'm pretty sure you already had like a couple explanations, so I don't know if I'll be, you know, adding a lot here. I'll just give you my my version and then we can riff on that.

So MCP is a protocol for a lot of different integrations into agents, and I think there have been many attempts, a lot of different configurations of: I will include the framework, I'll start from the gateway, I'll start from the integration side, whatever.

But this is the first one that was very seriously put out by a big lab. We were actually the first podcast that the MCP creators did an interview on, and we just released that yesterday. And it's shocking how impactful this is for effectively the side project of two guys.

And this is uh and they work at Anthropic. Is that correct? They work at Anthropic. Uh yeah, over in the London office. Um yeah, you can you can check out lat. spacem. You'll see it. Cool.

And yeah, so I think effectively, maybe for the normies: whenever you see these at symbols, like when you're in a chat and you want to add something, you want to include a tool, or you want to include any kind of agent or subsystem like a Notion or Zapier or what have you. For developers, we use things like Sentry. Microsoft Copilot and GitHub just put one out today, and I think Google announced something. I don't know what they announced, but they announced something.

But basically the entire industry... Always the case with Google. They put out something but I can't find it. I mean, there's just a lot, right? Like today they just announced 2.5 Pro pricing. Okay. So my viral tweet of the day, I always have like one a day,

is that they now completely own the entire Pareto frontier of all labs. Like they had the smartest and the cheapest and the most cost-effective, the Pareto frontier between. Very cool. Which is incredible. Anyway, going back to MCP.

So this is what we call in the industry an M * N problem, meaning I have this one app and I write integrations for this one app, but then when I move to a different app I have to rewrite the integrations again. And obviously that's annoying.

Obviously we want to share integrations. Obviously the open source community can improve things better together instead of individually, and we should stop competing on these things and start competing on other, better things.

And it just took a player like Anthropic to put this forward, and this is the one that got enough momentum.

So, practically, for the normies (sorry, I don't know if it's derogatory to say normies, but you know, people who are not in the mix every single day), practically what it's going to be is that you're going to be much more easily able to add integrations to basically anything in any agent.

Um and that will you know at least help you um get use out of them faster just because uh individual companies are not writing their own integrations anymore. Basically everyone's onboarded to this uh ecosystem.

Yeah, I think normies can be a pejorative used as a blanket term, but I think everyone... you might be a normie in defense technology, a defense founder might be a normie in AI technology, and that's fine, and that's why we have a variety of guests on the show.

With regard to MCP, the most basic question is: how is this different from an API? Zapier already exists as a binder between APIs, and also we're hearing coding agents are getting better and better. The Devins, the Cursors, like why is it so hard? It feels like AI is incredible at writing code.

Why can't I just use AI to say, "Hey, I want to interface with this API. Just go figure it out every time you compile the code. If the API changes, figure it out and reverse engineer the API. There's docs out there. Here's kind of the general idea of what I want. Go figure it out every single time at runtime"?

Yeah. I mean, the second question is easier to answer than the first, mainly because it's probably easy to figure out common implementations, but then you won't tackle the edge cases or the bugs that happen.

And obviously it's also very inefficient to keep coming up with a new implementation and integration every single time. Sure.

Uh so when possible actually it's better to not use AI if you have the option to not use AI like AI is meant to be a plug for things that don't exist but if you do have the integration that's written and battle tested by like you know thousands of people before you why would you not choose that like only if your needs are not being met by by that integration then go ahead and and write your own.

Um and so so uh and the cost of writing your own has come down a lot, right?

Which is also very interesting from the open source ecosystem point of view, where people are now vendoring a lot of the libraries that they would typically import without thinking, just because they were like, "Oh yeah, someone's written this for me and it costs too much."

Anyway, so then let's come back to the first question on how this compares to an API. And that's why developers hate MCP. They're like, this doesn't do anything over OpenAPI.

And we asked the authors that question point blank on the show, and basically it reifies concepts that would be undifferentiated as far as the normal API spec goes. So for example, there's concepts like resources, prompts, tools, sampling, roots, and transports in MCP.

And those are basically special subclasses that are treated differently in the MCP environment, depending on whether they are controlled by the model versus the application versus the user.

So the permissions and the intended roles of each feature are very distinctly parceled out.

So I would say it's kind of like a layer over APIs that has become more AI-native, and Anthropic is at least saying that this is much easier for us to use and you're going to build much more effective agents if you do this. Got it. Um, sorry. Please.
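As a rough illustration of the "who controls what" split described above, here is a minimal sketch in Python. The mapping is an assumption based on how the public MCP documentation describes three of the primitives; sampling, roots, and transports are left out, and the names `ControlledBy` and `who_controls` are hypothetical helpers, not part of any MCP SDK.

```python
from enum import Enum


class ControlledBy(Enum):
    """Which party decides when a primitive gets used."""
    USER = "user"
    APPLICATION = "application"
    MODEL = "model"


# Assumed mapping for illustration only; check the MCP docs before relying on it.
PRIMITIVES = {
    "prompts": ControlledBy.USER,           # e.g. picked explicitly by the person in the chat
    "resources": ControlledBy.APPLICATION,  # e.g. attached as context by the host app
    "tools": ControlledBy.MODEL,            # e.g. invoked when the model decides to call them
}


def who_controls(primitive: str) -> ControlledBy:
    """Look up the controlling party for a primitive name (hypothetical helper)."""
    return PRIMITIVES[primitive]


if __name__ == "__main__":
    for name, owner in PRIMITIVES.items():
        print(f"{name}: controlled by the {owner.value}")
```

A plain API spec would list all three as undifferentiated endpoints; the point of the sketch is only that the protocol attaches a different owner to each kind.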

Do you feel like MCP will, you know, massively accelerate the creation of of agents that uh become a, you know, very active part of our lives on the internet. It was, you know, we were at um Y Combinator demo day and there was a lot of uh AI agent infrastructure companies.

Um there was probably more AI agent infrastructure companies there just in one demo day than AI agents that I've sort of tried to use. Right?

So it's like everybody wants to do picks and shovels but we actually just need you know we need more sort of shots on goal at you know potentially the harder problem which is just reliable agents. Yeah, I fully agree with that actually.

So uh it's actually very depressing like being an AI agent infrastructure company is what you do as a developer if you have no other ideas. Um so that I mean YC should take that as a warning sign that uh their their people are not going after the right path.

MCP makes integrations easier, period, right? What you do with those integrations, what problems they solve, that is still an open question. But yes, for sure the ecosystem is about to get a lot stronger because we're no longer rewriting things.

We're converting M * N combinatorial explosion problems into M + N, where you just write every integration once. Hopefully. I mean, maybe twice. But yeah, I actually like the opportunity to address this, because I think people think of me, because I wrote the article on why MCP won, and everyone's like, you're just a shill for MCP.

I like the opportunity to just say, no, this is very good, but it's a protocol. Did you get excited when REST was invented? No. You got excited by all the applications that REST enabled. And those are going to come down the line.

And I think the normal consumer should just be really happy that these high quality integrations are now going to come a lot more out of the box, rather than you waiting like five months for your favorite app to write the integration for your favorite thing, you know.

So, um, your agents are going to be able to do a lot more things.

But I think the top agents, like the Sierras of the world, still want to own the end-to-end experience, and they will use MCP, but they're not really depending on it, you know. It's not life or death for them. It's still on you to build an agent or a product or solution, whatever, that solves a problem your customers need, and that doesn't go away.
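The M * N versus M + N arithmetic can be sketched as below. This is only a back-of-the-envelope count, assuming every app needs every service and that a shared protocol means each side is written once; the function names and the example numbers are made up for illustration.

```python
def integrations_without_protocol(apps: int, services: int) -> int:
    """Every app writes its own integration for every service: M * N."""
    return apps * services


def integrations_with_protocol(apps: int, services: int) -> int:
    """Each app implements the protocol once and each service exposes it once: M + N."""
    return apps + services


if __name__ == "__main__":
    # Hypothetical ecosystem sizes, not figures from the conversation.
    for apps, services in [(3, 4), (10, 50)]:
        before = integrations_without_protocol(apps, services)
        after = integrations_with_protocol(apps, services)
        print(f"{apps} apps x {services} services: {before} bespoke integrations vs {after} with a shared protocol")
```

The gap widens as either side grows, which is the "combinatorial explosion" being described.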

Yeah. What what's your thesis on, you know, the do you do do you see a Cambrian explosion of consumer agents, B2B agents sort of built on, you know, uh, in part on MCP and and you know, comparing this to something like fintech, right? We had the CEO of of Plaid on yesterday, right?

Like everybody in hindsight should have worked on Plaid. Super valuable. It's a $6 billion company right now. It powers a lot of stuff uh within finance. Um, same with Stripe. Stripe, but you can't name that many other sort of like generational companies in fintech infrastructure.

Sure, there's a bunch, but like no household names. And I can see the same thing playing out in the agent space, where maybe there's almost like a Stripe equivalent in the agent space, or Plaid.

But then, you know, the best thing you could have done, if you didn't build Plaid, was to build on top of Plaid and build these novel product experiences. Um, there are at least five or six startups that are trying. I actually live with one of them, which is called Smithery.

That's like decent, but all these are super early. The main challenge that they're all going to face is that Anthropic is coming up with their own registry. So Anthropic wants to own this. So I don't know if there's a separate plan of MCP that comes out doing this.

So that would be my two cents there. But wouldn't one argument for Smithery, and I'm not familiar with the company, be that other companies, you know, if they could ever see themselves competing with Anthropic in any way,

you know, they wouldn't want to be reliant on Anthropic's registry? So maybe a new third party that is kind of a pure-play infrastructure player should exist.

Totally possible, but the burden of proof is on them. By default, whatever is the big lab's official solution always wins, which is pretty brutal in AI terms, but the world is not meant to be fair, right? I think the one thing I should mention that I neglected to, that I do think is very bullish and will result in a lot more capabilities, is that MCP servers can also be clients. That's kind of a technical thing, but that is effectively what is going to enable servers to then become agents and orchestrate fleets of other MCP agents on your behalf without you knowing about it.

And so um that actually turns these things into much more agentic networks. And we're probably going to see that like I would say like this is too early right now because MCP is like four months old.

Probably like the end of the year, early next year, I would say this will start to come up, because they built that in from the start. Uh, is there a world where MCP just gets steamrolled by the next generation of models?

We kind of saw that with some of the workarounds to context windows, and then Gemini comes out with a million token context window and a lot of people are saying, "Oh, well, all those workarounds are kind of irrelevant at this point."

How do you think about the durability of MCP over the near term? MCP actually improves with more context window utilization. So they're not at odds.

There's a very, very long-standing debate on context versus RAG, but this is slightly different. MCP is just the ecosystem integrations that models really don't have. Yeah. That said, I would say that if there were any challenger to MCP, it would come from Google, because Google has native integrations to Gmail, Calendar, YouTube, what have you. They've already launched it, and it's already first party in there, and now MCP came along and threw some noise into their nest. Well, I mean, Google kind of lost the front-end wars with Angular versus React.

So, you know, there is precedent for another tech company coming up and resetting the standard, right? Uh, sure. And yeah, I would say that I don't think they should spend time here anyway. So, yeah. Yeah. I agree.

Maybe we need a Polymarket on it or something. Let's switch gears and talk about AI 2027. I'd love to just let you rant, maybe for the next 10 minutes straight. Um, but you know, initial... I think these guys have a really good track record.

Um I don't know actually if you've interviewed them already. No not yet. I'm sure I'm sure they'll come on. You know they seem to be doing a podcast tour. Um and uh and I think there's there's some public service in doing the math and drawing lines. Right. That's what that's what they did.

That's what Situational Awareness was. And I think because we live day to day, we don't really see the year-to-year as clearly, just because we don't spend time on it.

And if you just draw lines, and you believe that what has happened in the near past is probably going to happen in the near future, you can probably see at least some kind of trend line. The main caution with all these things is that S-curves do exist.

You know, I'll just rewind your mind back to, let's say, April or May 2020, when every chart on COVID was going like this, and there would be more COVID cases than humans by the end of 2021 or whatever. And that never happens, because either we reacted to it or we hit invisible asymptotes, as Eugene Wei calls it, where you didn't account for this because we weren't there yet. There was this limiting factor that you didn't run into. But this is relatively near-term.

It's basically in the next 18 months. And I think the main geopolitical aspect is kind of interesting. We have no idea what China will do, really. The main meme on Twitter, which I really like, from Teortaxes, is that Xi Jinping just does nothing and then the USA just implodes.

The funniest outcome is the most likely, right? Maybe that's it. Who knows? Who knows? But I do think that there are things where it's really just down to the individual decisions of powerful political figures, which we really cannot tell.

And then there are things that are basically extrapolations of scaling laws, which you can tell, because they're just scaling laws that have already been established, and you'd be betting on the end of them, which is less likely to happen.

Um so I think the the coding um agent uh commentary is really good. Uh, and I think that it starts it starts to talk, you know, talk about hacking and robotics. I think it's all it's all on uh very very unbalanced.

Um, I think where it starts to get into a little bit more gray area is like the political and and bioweapon stuff. Yeah. I I I love the the kind of reference to COVID. Everyone thought it was an exponential. It was actually a sigmoid function. Uh, everything's a sigmoid. It's a series of sigmoids pretty much, right?

And uh we and it certainly feels like we're experiencing a sigmoid curve with uh pre-training and we're seeing pre-training diminishing returns. Uh it feels like there's still some juice in RL. Uh are are there any uh next steps that you're excited about?

We we we've heard that uh program synthesis is kind of a hot area that people are investigating right now. Um what are you excited about? What do you think is potentially overrated or underrated?

I actually think there's a lot of alpha in splitting out what you call program synthesis, what I call code generation. Yeah.

We've effectively gone from single-line autocomplete to function autocomplete, and now with Windsurf and Claude Code and Copilot and Cursor we are generating basically entire apps, and PRs for those apps.

And I think being really clear-minded about where those capabilities are and improving them incrementally, you get things like Cursor, which, you know, zero to 200 million ARR in two years is crazy. Remarkable. Yeah. So I think there's a lot of alpha there.

But I think really what people are looking at now is the general measure of agent trajectories. How long can an agent, on a broad number of tasks, autonomously operate? Right? So METR, I forget the acronym, it's one of these research institutes.

METR put out a study of the agentic work that can be done across a wide range of benchmarks, ran it all the way back to 2019, and basically came up with the idea that the agent's horizon for the 50th percentile of human capability is doubling every 3 to 7 months.

Mhm. Right. And we're now at an hour. So roughly you can leave the smartest model that we have, which they measured as Claude 3.7 Sonnet, to run autonomously for an hour and do what a 50th percentile human can do.

You can obviously bump things up or down in terms of percentile, but the results and the scaling laws don't vary, because the important thing is that they're doubling every 3 to 7 months.

And that means you can roughly scale out: when are we going to have one day of autonomy? When are we going to have a week, a month, a year? And that, I think, you can kind of set your clocks by, and try to think about what products or companies you would build.
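To make the "set your clocks by it" extrapolation concrete, here is a small sketch of the arithmetic. It assumes the figures quoted in the conversation (a one-hour horizon today, doubling every 3 to 7 months), assumes the trend simply continues, and treats "day", "week", "month" and "year" as wall-clock hours; those target definitions and the helper name `months_until` are my assumptions, not the study's.

```python
import math

START_HOURS = 1.0  # current autonomous horizon, as quoted in the conversation

# Assumption: targets measured in wall-clock hours (a month taken as 30 days).
TARGETS = {"day": 24.0, "week": 168.0, "month": 720.0, "year": 8760.0}


def months_until(target_hours: float, doubling_months: float,
                 start_hours: float = START_HOURS) -> float:
    """Months needed to grow from start_hours to target_hours at a fixed doubling time."""
    doublings = math.log2(target_hours / start_hours)
    return doublings * doubling_months


if __name__ == "__main__":
    for name, hours in TARGETS.items():
        fast = months_until(hours, doubling_months=3.0)
        slow = months_until(hours, doubling_months=7.0)
        print(f"{name:>5}: roughly {fast:.1f} to {slow:.1f} months out")
```

The spread between the 3-month and 7-month cases grows with every doubling, and an S-curve would break the extrapolation entirely.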

It would be so funny if it was an S-curve and it maxes out at exactly eight hours like a human like you can only give it one task every single day and it just goes and does that. It's great but uh yeah just I don't want to work overnight. Look, I'm I'm maxed out and it's at that for like two decades or something.

Unlikely, but funny. That'd be an interesting universality, right? Like so, for example, a lot of people building agents have independently discovered sleep. The agent the agents have to sleep because they have to compress memories and and do like the deep REM thing where they actually turn those into long-term memory.

Like, the computer agents have to do that and humans have to do that, right? It's gnarly. I'm not kidding. Like, go look those up. You need a sleep for agents. Maybe it's Eight Sleep. Bryan Johnson holding a candle going like... I mean, I was... Yeah. I mean, you're joking about Eight Sleep for agents.

I was joking about like what like is there will there be like an alcohol for agents or alcohol for LLMs where you just kind of like throw some randomness in the weights and it gets a little bit crazier but sometimes it's brilliant you know and it's like a lot more bold. Yeah. Exactly. Uh, but maybe it's not that.

Maybe it's just, to get true inspiration, you need to throw in some extra chaos, I guess the term would be higher temperature in the LLM responses, and then run 10,000 of them, and you get some more inspiration instead of all coalescing around the same kind of, you know, 50th percentile human. Yeah, that's called mode collapse, where you collapse the modal distribution. But yeah, for those interested, we've moved on basically from temperature to varentropy. And also I think most people would be interested in Anthropic's alignment work on tracing the thoughts of LLMs, which came out recently as well.

There's some really interesting torture you can put on these LLMs, where you ask it to say "banana" but then you don't allow it to say banana, and you can see it struggle to, like... Oh, interesting. ...find alternatives for saying banana. It never... So funny. Wow.

But for example, in one of the R1 replications, you can also do things like, if you just want it to think more, you can just prevent it from ending its thought and just insert "wait," or, you know, "but I thought something," and you can force it to think another way and to branch out again in terms of its diversity.
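For readers who want to see what "higher temperature" does mechanically, here is a minimal sketch using the standard temperature-scaled softmax. The logits are made-up numbers for three hypothetical tokens, and entropy is used here only as a rough proxy for how spread out the sampling distribution is; this does not implement the varentropy-based samplers mentioned above.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats; larger means more spread out."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
    for temperature in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, temperature)
        rounded = [round(p, 3) for p in probs]
        print(f"T={temperature}: probs={rounded}, entropy={entropy(probs):.3f}")
```

At low temperature nearly all the probability mass sits on the top token, which is the "coalescing" being described; raising it trades reliability for diversity.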

So um I think there's a lot of research here that is super interesting. I'm not sure any of it is going to bear fruit because uh the the big labs are probably like two years ahead of us in the open research world there. Sure. Sure. How do you how do you react?

Obviously the dominant story this week in the wider world is the, you know, tariffs and trade wars, all that stuff. Do you feel like it's arguing over pennies as an AI steamroller is about to roll over the entire economy? Do you even give it any attention?

So, you know, I I I heard you guys ask the OpenAI guys about this. Um, and like basically it's a rounding error, right? But obviously the supply chain really matters for AI. I think the US has benefited a ton from the global trade setup that we set up for ourselves effectively after World War II.

So blowing that up may not be the best situation if we don't have a more constructive or well-reasoned end state that we want to be in. And I'm not sure we do. Yeah. The White House hasn't given us a lot of comfort about where we're going. Yeah.

My take on it is, I think AI is probably going to be the main driving force of serious cultural and economic change over the next few years, and no matter what happens, if it's good, both political parties are going to take credit, and if it's bad, both political parties are going to blame the other political party for whatever happens. Same as ever. And really, do you worry about the immediate impact on the data center compute supply chain broadly? I know Elon was posting yesterday about how a lot of the conversation will maybe switch from GPU shortages to transformer shortages.

Is that top of mind for you at all or are you more focused on the transformer like power transformer? Yeah, power transformers not the architecture.

So, for reference, the vast majority of transformers, the physical electrical transformers, are made abroad, and they're very, very important as we start pulling gigawatts and moving it around the grid to get into these large data centers. Yeah, I hear you.

Um well short term is like you know not my department so that's that's an easy out easy copout. Uh yeah, I I think um the uh I I I think like whatever happens over the next two years, like we really don't know. Like a lot of these negotiation like these are the start of a negotiation kind of.

It's just the way that Trump does it anyway. So I like you know I'm not I don't want to get too much into that. All I will say is like a lot of what uh I try to guide people to towards on AI engineering is utilization of existing capability.

Like so much model capability has been unlocked and it's not evenly distributed in our lives. Like why do I have Siri that doesn't understand what I want? It's because Apple hasn't got its [ __ ] together. Like, that's not frontier tech.

Like what Elon is dealing with, what Sam Altman's dealing with, what Dario's dealing with is frontier tech. And that requires giant data centers with all this kind of research.

uh but really what AI engineers deal with is deployment of existing tech um into our into business and personal lives and I think uh there's a lot more to do do there and I think um you know while while the big boys figure out your political situation hopefully we still have enough power to like power you know the deployment of AI to the rest of the world.

On the topic of Apple, do you think they can afford to fumble the rollout of Apple Intelligence just because of their position as, you know, the core consumer hardware provider, or do you have a strong take there?

Um yeah, I mean so the lock in of iOS, iMessage, um you know, the the drive, Apple cloud, whatever they call it, um is very very strong, but there's a time limit on this thing. And for one, like I actually tweeted about this recently.

Uh, when the iPhone... like, I'm much more excited for OpenAI to compete with Apple than with Google. OpenAI currently is running the Google playbook. They're rating themselves by, like, "Oh, ChatGPT is the sixth most trafficked website in the world. We want to go up," right?

But Google is doing super well, and actually Google's pretty good. We want to have a Google in our lives.

But Apple is really fumbling and they know it, and when OpenAI comes up with the OpenAI phone, I think it will be a serious challenge to Apple, because the OpenAI phone, like the... Well, it'll just be smarter. You know it. Yeah. Yeah.

And everyone would try it, and it will be the first serious challenge to the Apple iPhone since, you know, Steve Jobs presented it.

That's a great... That's a great... And people already have such an extreme willingness to try new AI hardware that when, you know, the dominant lab comes out with something, I think every single person we know is at least buying it, right? If it's a thousand bucks, sure, I'll take it. No, no, like, you know, I think it's just that the people who want to get serious about hardware, they've just been the small guys, like the Rabbits and the, let's just call it, the Humanes. But if you get someone with the resources of OpenAI...

I really want to see if they take a real run at it. I mean, they have chats with Jony Ive; they've confirmed that it's in the works. I don't know how serious it is. They could still kill it, but I hope to God that they actually challenge Apple, and Apple will get its [ __ ] together in response.

I completely agree. Well, yeah, that's a fantastic take. I'm looking forward to trying the OpenAI phone. We'll see it. And there'll probably be an xAI phone, too. I mean, it might be a new dawn in consumer hardware driven by artificial intelligence. But thanks for stopping by. This is fantastic.

Well, we'd love to have you back. This is such a fun time. I can't wait for the next one. Yeah. And everyone, go check out the latest podcast he just dropped. Latent Space. Yes. Do it. Cheers. Talk to you soon. Bye. Fantastic. Next, coming in, we got Forerunner Ventures. Kirsten Green, legend investor.

We reached out actually initially after she dropped her 2025 trend report. Yep. Which is always fun to process. So I want to ask her about that and about a bunch of other stuff. Yeah, the 2025 consumer trend report is out. You can get it at forerunnerventures.com.

Um a deep dive into where consumers stand today and how major shifts are shaping new needs and opportunities all across um uh a ton of different um ton of different sectors. It's a it's a quick it's a quick read. It's uh 200 slides. Uh they really do their homework over there. And uh here she is to break it down for us.

Give us the the 30 minute condensed condensed version of the 200 slide deck. Uh yeah, I mean me and Jordy we're basically live three hours a day. Not a lot of time for
