Dylan Patel launches InferenceMax: the first independent AI hardware benchmark running daily across NVIDIA, AMD, and more
Oct 10, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Dylan Patel
that he couldn't even bring himself to smoke a cigarette. You think that's what happened? Maybe. No, I don't know. But thank you, Rune, for doing the fact check. Obviously, you are correct. You know the story better, and we always appreciate the truth zone.
Well, we have our next guest, Dylan Patel from SemiAnalysis, with some massive news. Dylan, how are you doing? That is a cinematic shot. Is this AI or something? Where are we? This is aura farming. I'm literally in America, bro. That's amazing. Out of the back of the truck. You're the bald eagle.
You're the bald eagle. He's not a China hawk. He's a bald eagle. This is proof of work. You're out at the cluster. You're cluster-maxing, you're inference-maxing, you're podcast-maxing. Thank you so much for taking the time. Give us your breakdown quickly on the launch today.
The InferenceMax launch was yesterday. I went on your Zoom call at 10:30. I was lying in my bed listening to you. It was very interesting, but I'd love to hear you break it down first. Wait, first Jack wants a shot. Do you have cowboy boots on, or what? We need the full fit check.
The full fit check. Oh, okay. Looking good. But cowboy boots next time. Okay. Anyway, sorry. Please give us the high level. So, this is a fire station behind me, by the way, just so you know, in Tennessee.
Anyways, yesterday we launched InferenceMax, which is a humongous release for us.
It is a benchmark measuring cost per million tokens and tokens per megawatt across all major AI infrastructure, AMD, Nvidia, all the newest GPUs, and on all the latest models: gpt-oss, Llama, DeepSeek, etc. And the reason why this is so important is that throughout the industry, people are always like, oh, our chips are great at cost this way, our chips are more efficient that way. Well, it turns out that to actually measure inference, you have a variety of different metrics.
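Those two headline metrics reduce to simple ratios of throughput, price, and power. Here is a minimal sketch of the arithmetic; all numbers are illustrative assumptions, not InferenceMax results, and the function names are made up for this example.

```python
# Back-of-the-envelope sketch of the two headline inference metrics.
# All figures below are illustrative assumptions, not benchmark data.

def cost_per_million_tokens(gpu_hourly_cost, tokens_per_second):
    """Dollars to generate one million tokens on one GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost / tokens_per_hour * 1_000_000

def tokens_per_megawatt_hour(tokens_per_second, gpu_watts):
    """Tokens generated per megawatt-hour of power draw."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / (gpu_watts / 1_000_000)

# Hypothetical GPU: $2.00/hour rental, 1,000 tokens/s, 1,000 W draw.
print(cost_per_million_tokens(2.00, 1000))   # dollars per 1M tokens
print(tokens_per_megawatt_hour(1000, 1000))  # tokens per MWh
```

The point of the dual metric is that a chip can win on the dollar ratio while losing on the watt ratio, which is exactly the AMD-versus-Nvidia nuance discussed below.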
You can always just cherry-pick something, right? It's some vendor saying some BS. Yeah, and so there ends up being cherry-picking, and then it's also on some super hyper-optimized software stack that's not real, right? It works for that one specific cherry-picked use case. But guess what?
When I'm running inference at a major company, sometimes I have big requests, sometimes I have small requests, sometimes I'm outputting a ton, sometimes it's like an agentic workflow, sometimes it's this model, sometimes it's that model.
So, what really matters is the real software that people are running, on the latest drivers, the latest open-source PyTorch version, the latest vLLM, the latest SGLang. All these things matter because at the end of the day, software changes every day, performance changes every day, models change all the time, right?
And there are trillions of dollars of infrastructure investments being made over the next few years. How do you actually measure what's the best hardware? What's the most efficient hardware? What does it cost? That's what we're aiming to do with InferenceMax.
And so we're supported by Nvidia, AMD, Microsoft, OpenAI, Oracle, CoreWeave, Dell, Supermicro, HPE, and all sorts of vendors that I can't remember off the top of my head. Well, you're not making any money on this, right?
It's all open source, but there was a ton of capital that came together, a ton of people that did put up money. What's the scale and the scope of the project? Yeah, so SemiAnalysis has multiple engineers that I'm paying full-time.
So I'm losing like, you know, a million dollars a year on this, or a bit more, obviously, because engineers are expensive. But on top of that, the vendors are contributing and the cloud companies are contributing tens of millions of dollars of GPUs. Congratulations.
That's fantastic news. So, you know, the thing is, I'm not necessarily sure how I'm going to make money on it, but I am aura farming, as I am with this background, right?
All that matters is, you know, how do we deploy AI efficiently across the globe? And perhaps by aura farming in this way, people will buy our other stuff, right? That's the hope.
You know, not exactly sure, but this needed to exist, and there was no way for it to exist unless we did it. Yeah. What's your life been like the last few weeks?
How often are billionaires calling you asking you specifically for financial advice, saying, "Hey, I'm thinking of putting, you know, a billion into this one. What do you think? Should I do it?" Do you find yourself having to push back on, say, the second trillion dollars of debt to flow into this? Yeah.
What do you think? You know, the crazy thing over the last few weeks is that companies that you would have never expected to need debt are in the debt markets, right? You mentioned debt.
People like Meta and Oracle, who three years ago you'd have said were the most profitable companies on the planet. Well, they're in the market for debt because they're building.
As far as how often people are in the DMs or calling me, you know, that's what the company does. We provide services around this. So, I'd like to say the company and business is taking off like a rocket.
And so, the whole point is the InferenceMax aura will increase the aura of the other stuff we're doing. But I will say it's just like we've hit terminal velocity, right? It feels like we're building a [ __ ] Matrioshka brain.
You know, I don't know where people are trying to go. That's great. What's the biggest debunk that's come out of the results of InferenceMax? Is there some narrative out there on the timeline or in the AI community that you feel like you're able, with this data, to turn around?
Yeah. I mean, there are tons of people like, "Oh, AMD is best. Oh, Nvidia's the best. Oh, this is better. That's better." It turns out there's got to be a lot more nuance to everyone's statements.
My favorite thing is, yesterday I saw a Twitter war between two accounts with like 5,000 followers each. So, these weren't small accounts, per se, and they were going back and forth posting data from InferenceMax, saying, "No, you're wrong. You're cherry-picking. No, you're wrong. You're cherry-picking."
And the reality is, it's a little bit more complicated, and they're debunking each other. Yeah. But I think what's relevant is that Nvidia is not the only game in town. A lot of people thought that they were.
Between the OpenAI-AMD deal that happened and the results that we've shown, and we've been working with AMD and Nvidia on this for many, many months, it's clear Nvidia is definitely ahead, right? But there are certain use cases where AMD is better. If you're running gpt-oss, and that model's exploding in usage, then hey, guess what, AMD may actually be better hardware on a dollar basis. It's not better on a watts basis, and those are the two things, right? Part of why I'm in Tennessee is, you know, there are a lot of watts here that we could put on AI infra. But it's a challenging thing: sometimes you're capital-constrained, sometimes you're power-constrained, and depending on which, maybe you actually should deploy AMD, maybe you should deploy Nvidia. The default is Nvidia, but in many cases AMD makes sense, and the software works. The open-source software works. It's not buggy. It completely is if you're training and doing other things, but if you're running inference on specific models, it works. What's the biggest risk to the overall buildout?
Is it energy capacity? What's top of mind for you over the next 12 to 24 months? All these deals have been announced, but a lot of people are asking where the energy is going to come from once you start talking about gigawatt-scale clusters. Yeah.
So, it's not even that. You've got all of these dudes in suits like you, in their cushy little offices, signing these big checks of fake money on bank accounts. But the reality is, hey, I can buy the GPUs.
I can get them made and import them from overseas. I can buy the sheet metal. I can buy all these different things. But you know what you can't do? There are not enough [ __ ] in cowboy boots in middle America deploying and building these things, right?
It turns out electricians' wages are skyrocketing, right? It turns out plumbing wages are skyrocketing, because data centers need liquid cooling. So I think that's the biggest risk: where is the skilled labor going to come from in the West? Interesting.
Because the West has not built at this scale before. Yeah. Regav in the chat says you're the only 10x IC. So he's having fun that you're out in Tennessee. What does the long-term vision look like?
Is it relevant to think about adding TPUs, Groq, that kind of thing? What is the shape of the roadmap? Are you sharing that yet, or is that even relevant? Yeah. So, InferenceMax is amazing because it runs every single day on the latest software. But right now, we've only got tens of millions of dollars of GPUs.
You know, we've got to hit the hundred-million number to actually get everything right. So, what that means is more models being supported, and we've got that in the works. We've got adding TPUs and Trainium. This is a real big, difficult engineering effort. Google and Amazon are excited.
We'll see how long it takes us, but it is a difficult thing. They've got to put up the capacity. We've got to get the capacity somehow and add those chips. And if you do that, over 99% of the FLOPs around the world are covered. Maybe we add Huawei, maybe we add Groq or Cerebras.
A lot of engineering is going to be required, and so that's all on the roadmap: adding more hardware, and quality. It turns out there are a lot of innovations that people are doing on model inference beyond just quantization, right? You can do 8-bit or you can do 4-bit.
But let's say everyone's doing 8-bit. You can still do certain innovations that make performance better but quality worse. And so there are these tricks that people are implementing that are completely unknown to people. And so measuring quality as well is really, really important.
And continuing to run it every single day on an automated basis, and continuing to get more people pushing behind it, so we can get TPUs, Trainiums, GPUs on as many models as possible, with quality measured as well. Well, that's fantastic. Jordy? One more question from my side.
It seems like the debate is heating up around depreciation schedules for GPUs. What's your framework on that front?
A lot of Neoclouds want to say five to six years, but maybe that's not realistic. How are you thinking about it? So, every major company in the world, the Googles, the Microsofts, the Amazons, etc., does six years, right? That is the industry standard. But that may also be erroneous, and the reason it may be erroneous is that the reason it got pushed up from three or four years to six years over the last decade was that CPUs, storage, that sort of thing, were not advancing that fast.
Now we've got AI, and it's advancing like a rocket ship, faster than ever. And so there are two questions on useful life, right? One is: does the thing still work in six years? And the other is: is it even useful to run it in six years, right?
And these are two very, very different questions. For "will it still run in six years?", the answer is most likely, but these things are being run super hard. GPUs, TPUs, etc. are way less reliable than CPUs and memory. So, there's a real likelihood it may not work in six years.
Whereas CPU servers will actually run like 10 years, it's fine. But GPUs may not last the full six years, especially the new ones that are super hot, liquid-cooled, etc., right? A lot more complexity, a lot more likelihood it could break down within the six years.
The other side: is it economically useful? Well, if Nvidia is releasing a new GPU that's 2x as fast for 50% more money every year and a half, well then in six years you're at like a 20x improvement in performance, right? And it maybe only costs like three times more.
So, you're like, okay, well, the old GPU, even if it still works, is it even useful, or should I throw it out and feed the new thing with that power, right? And so that's the big question: is it useful to keep using the old GPU, or should you buy the new thing?
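The compounding he's gesturing at can be sketched in a few lines. The 2x-performance and 1.5x-cost cadence figures are illustrative assumptions drawn loosely from the conversation, not Nvidia's actual roadmap numbers.

```python
# Sketch of the useful-life arithmetic: if each new GPU generation is 2x
# as fast for 1.5x the price, every 1.5 years, how do performance and
# perf-per-dollar compound over a six-year depreciation window?
# Illustrative assumptions only.

years = 6
cadence = 1.5                        # years between generations (assumed)
generations = int(years / cadence)   # 4 generations in 6 years

perf = 2.0 ** generations            # raw performance multiple
cost = 1.5 ** generations            # price multiple
perf_per_dollar = perf / cost        # what the buyer actually cares about

print(perf, round(cost, 2), round(perf_per_dollar, 2))
```

Under these assumptions the six-year-old GPU is roughly 16x slower than the current part while the current part costs only about 5x more, which is the shape of the argument for retiring hardware before it physically dies.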
You know, as Jensen says, the more you buy, the more you save, right? And so maybe his argument is correct. Yeah. And does that present a real risk? A lot of people are levering up and raising debt against GPUs, assuming that five-to-six-year useful life.
How big of a risk is that, is what I'm trying to understand. It depends entirely on the company, right? So for example, Oracle is raising debt, and they're building out Stargate and all these things, right? They've got this $300 billion deal with OpenAI.
Their biggest risk is not that, hey, our depreciation schedule is six years and OpenAI's contracts are five years. They still make money even if the GPUs don't last that extra year, right? Because they've got the contract with OpenAI.
The real challenge is, where the hell is OpenAI going to get $300 billion to pay them, right? You know, I'm a believer. I'm a believer. I think you guys are believers, but a lot of people aren't. For other folks, it's like, hey, I'm out here deploying GPUs. I'm just putting them out there, right?
Hey, anyone want to rent them? Please rent them. And maybe you only sign a six-month contract, maybe a three-year contract. That's where it gets more risky, because at the end of the term, I haven't paid off my GPUs. I haven't paid off my debt. Where am I going to sell it? Does the price fall?
Where does the price end up being after a year? And so we saw that with Hopper GPUs, right? The people who signed the long-term deals initially weren't making as much money, because they were selling them at $2 an hour.
Whereas other people were out there like, oh yeah, for six months I'll sell it to you for $3. That was amazing money for that first six months. And then on renewal, it's like, oh [ __ ], it's only $2.50. And then on renewal, it's like, oh wait, now I'm selling it for less than $2.
And who knows, as Nvidia's Blackwell comes out, as Rubin comes out, as AMD's new chips come out, as Google starts selling TPUs, all these things keep driving down the cost performance and how many tokens you can get per dollar and per watt. So then all of a sudden, is a Hopper still worth $2? Is it worth $1.50?
Is it worth a dollar? For the people that are locked into a five-year contract, that's one thing. For the people who are just selling short-term and don't have a long-term contract, it's very possible that the GPU works, but it's not able to produce economic value worth the power actually put into it.
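The short-term-contract risk described above can be sketched as a toy cash-flow model. The rental rates echo the Hopper anecdote in the transcript ($3 falling toward under $2 at each renewal); the per-GPU all-in cost and exact decay path are illustrative assumptions, not real market data.

```python
# Toy model of a Neocloud renting one GPU on six-month terms while the
# market rate decays at each renewal. Does cumulative rental revenue
# cover the GPU's all-in cost within two years? Illustrative numbers.

HOURS_PER_TERM = 6 * 30 * 24          # six-month term, in hours
rates = [3.00, 2.50, 1.90, 1.50]      # $/GPU-hour at each renewal (assumed)
gpu_cost = 40_000                     # assumed all-in cost per GPU, dollars

revenue = 0.0
for term, rate in enumerate(rates, start=1):
    revenue += rate * HOURS_PER_TERM
    print(f"term {term}: rate ${rate:.2f}/hr, cumulative ${revenue:,.0f}")

print("paid off" if revenue >= gpu_cost else "still underwater")
```

Under these assumed numbers the operator is still underwater after two years, which is the core of the argument: the early $3/hour terms look great, but the renewals are where the depreciation risk actually lands.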
So, that's the big risk. That makes sense. Oracle sold off earlier this week based on reporting from The Information. What was your immediate reaction to that piece? There was a lot of pushback on it. Yeah. So they said that Oracle's margins were low. Oracle's margins are not that low.
They're higher than that. At that exact point in time, their reporting is accurate, right? But what they deduced from the numbers they saw was not accurate, which is that Oracle's margins are low for the deals they've signed. That's not accurate.
What's accurate is that Nvidia's GB200 NVL72 has a lot of issues, right? They're mostly being solved and have been solved, but there are a lot of issues. It can be unreliable because of how complicated a thing it is: so much power, liquid cooling, it's got the backplane.
So there are a lot of difficulties with the hardware, because of how complicated and how fast it is, that are being solved or already solved. The other one is, hey, Oracle has to rent these massive data centers before they fill them up with GPUs.
So Oracle's paying all this money for these data centers for Stargate, like in Abilene, Texas, that aren't necessarily generating revenue yet.
And when your revenue goes from flat to rocket-ship up because you've got Stargate, what happens is you've got all this cost right before the revenue comes in, right? You've bought the GPUs. You're trying to figure out how to make them work, because they're a little bit unreliable.
You're replacing things. You're building out the data center, you're renting the data center. All these costs are hitting their books, right? That doesn't necessarily mean the deals themselves are low-margin. I might be getting kicked out. You're all good. This has been a pleasure.
Dylan, next time you're in Los Angeles, we'd love to have you at the TBPN Ultradome, in studio. Everyone's a huge fan here. Congratulations, and thank you so much for stopping by. Yeah, massive launch. Excited to see how it plays out. All right, see you, folks. See you. Thank you so much for having me today.
We've got to get Dylan. Nick, reach out to Dylan. Get his shoe size. We'll get him some cowboy boots. That sounds great. You got something on your mind? There's some big news going on. So one is, about 30 minutes ago, Trump put out a truth.
He said 100% tariffs on China starting November 1st. What? So since then, broadly today, $250 billion has been wiped out from crypto. Okay. Yeah, Bitcoin is down 5%. Oh no. We have to check in on our retail trader. Our retail trader in residence. Another white pill, though. White pill.
Demis just said last month they did 1.3 quadrillion tokens on Gemini. Congratulations. That'll fix the global economy. That'll fix the trade. Congratulations. Absolutely wild. Our retail trader in residence has probably put a hole through the wall by now.
Thankfully the markets are closed for the week, but my portfolio on Public is looking rough. It's a rough day. Hey, it's a rough day, but Monday will probably be rougher. Maybe. We'll see. It might be a white suit day. Everything could get resolved over the weekend.
You never know. We have to figure out what our bear market suits are. Maybe suits that actually look like bears, kind of like fur suits. That would be good. Yeah. You know, in Hollywood they have these synthetic tears. It's like propylene glycol or something.
I think that's what's in vapes. That's not it. But there's some sort of eyedropper that they put in actors' eyes when they have to cry. And so maybe we should just be applying those the entire show so that we're just crying constantly.
Well, in some much better news, of course, if you want to get away from all the chaos, you can book a Wander. You can find your happy place. That's right. Book a Wander with inspiring views, hotel-grade amenities, dreamy beds, top-tier cleaning, 24/7 concierge service. It's a vacation home, but better.
Joey in the chat says Fartcoin is down 73%. I don't know why. No, no, no. That's got to be a joke. That's impossible. I looked it up. I mean, I'm looking at a chart right now. It says that today, so maybe it's rallied a bit today.
Yeah, I mean, I guess I can't believe that something like a household name in tech is actually down that much. That's absolutely crazy. Well, in some more serious news, we have a new partner at TBPN. We're partnered with Gemini, Google AI Studio, the fastest way from prompt to production with Gemini.
AI-powered coding, ease of use. It's built for everyone. You've been a power user. We're friends with Logan. Logan is a dear friend of the show, and works around the clock to make this. He's done a fantastic job over there. Supercharge your creativity and productivity. Chat to start writing,