Lambda CEO Stephen Balaban: $500M revenue, NVIDIA still unchallenged, and the coming age of neural software

May 23, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Stephen Balaban

still has issues, but I challenged myself to push this in one night. I thought this was very cool. I've been joking that we need someone to vibe code a new Google Reader, since that was such a popular product.

It should just be out there, and it's already starting to happen with replacements for Pocket. And it seems like Stephen's in the studio, so let's bring him in and talk about Lambda. How you doing? Welcome to the stream. Hi guys. How you doing? Welcome. Great. How are you? Great to have you on.

It's good to see you. Looking fantastic, by the way. Thank you for wearing a suit. Thank you. Naturally. I guess when in Rome, but, that said, this is how, you know, everyone in Silicon Valley should dress. I agree. I agree. Are you in Silicon Valley right now?

Can you give us a little introduction? Drinking a Monster! Drinking a Monster. Let's go. So on brand. It's amazing. So I'm actually in my hometown of Charlotte, Vermont. It's where I grew up. Yeah. And it's where I live. Great. And just got back, landed this morning from SF.

How would you describe the business these days? Because I know that, well, first we've got to talk about the AI scene in Vermont. Oh, yeah. Picking up, you know. Yeah, I'd say that there's like one or two companies.

There's a lot of, I'd say, independent-minded founders out here. But generally speaking, I'd say it's more of a farming state. Farming. Yeah. My little brother, who's not in tech, went to college in Vermont and just stayed. I've tried to get him to come back forever.

He's committed to the great state of Vermont. Well, let's introduce him to Masayoshi Son and maybe he gets started, you know, building an AI company out there. It could happen. It could happen. It's not off the table. So yeah, brief introduction. How would you introduce the company?

How are you describing the different offerings and kind of challenges right now? Yeah. So, Lambda is an AI cloud. It's essentially a really massive GPU cluster that all kinds of startups and hyperscalers use. And we've invested a ton of work into it.

It's basically around a billion dollars' worth of GPUs that we've deployed. Wow. And we've invested, you know, at this point probably around $100 million into the virtualization software that allows us to essentially take that massive cluster of GPUs.

So you can imagine taking a cluster of, let's say, 16,000 GPUs, and you can dynamically partition it and hand out little slices to people on as little as a 15-minute basis. It's quite different from, let's say, a bare-metal cloud where you need to sign a three-year contract.
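The dynamic partitioning described here can be sketched as a toy allocator, with rentals rounded up to the 15-minute billing granularity. Everything below is hypothetical and illustrative, not Lambda's actual scheduler or API:

```python
# Toy sketch of carving a large GPU cluster into short-lived slices,
# loosely modeled on the 15-minute-granularity rental described above.
# Purely illustrative; not Lambda's real scheduler.

class GpuCluster:
    def __init__(self, total_gpus):
        self.total_gpus = total_gpus
        self.leases = {}        # lease_id -> number of GPUs held
        self.next_id = 0

    @property
    def free_gpus(self):
        return self.total_gpus - sum(self.leases.values())

    def allocate(self, gpus, minutes):
        """Hand out a slice; bill in 15-minute increments."""
        if gpus > self.free_gpus:
            raise RuntimeError("not enough free GPUs")
        billed_minutes = -(-minutes // 15) * 15   # ceiling to nearest 15
        self.next_id += 1
        self.leases[self.next_id] = gpus
        return self.next_id, billed_minutes

    def release(self, lease_id):
        self.leases.pop(lease_id)

cluster = GpuCluster(16_000)
lease, billed = cluster.allocate(gpus=512, minutes=40)  # billed as 45 min
print(cluster.free_gpus)  # 15488
cluster.release(lease)
print(cluster.free_gpus)  # 16000
```

The contrast with a bare-metal cloud is just that `allocate` here hands out an arbitrary sub-slice for minutes instead of the whole machine for years.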

We've basically implemented all this virtualization software that lets you hop on and off of it, while still, you know, maintaining high-speed InfiniBand interconnect. You did the meme: another billion dollars for Jensen. How many billion dollars for Jensen?

Is it actually primarily Nvidia chips, or are there other GPU providers that are competitive at this point? I've been following George Hotz's journey with AMD. Is anything competitive with Nvidia right now from your perspective?

I don't think that there's anything that's competitive with Nvidia right now. Yeah. What you just see in the customer demand is, you know, total and utter Nvidia supremacy. Yeah. And I think that this is going to continue on for some time. The test I always like to say is: you've got to get a customer.

They've got to be able to download any arbitrary model off of Hugging Face, run that model, train the model, fine-tune the model, and then they have to buy from you, and then buy again, right? That's the test, I think, for any chip manufacturer.
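That customer test could be written down as a smoke-test harness. The step functions below are stubs standing in for a vendor's real software stack; all names here are invented for illustration:

```python
# Sketch of the chip-vendor "test" described above: a customer must be
# able to download an arbitrary model, run it, train it, fine-tune it,
# and only then will they buy (and buy again). Steps are stubbed.

REQUIRED_STEPS = ["download", "inference", "train", "finetune"]

def evaluate_chip(stack):
    """stack maps step name -> callable returning True/False."""
    results = {}
    for step in REQUIRED_STEPS:
        try:
            results[step] = bool(stack[step]())
        except Exception:
            results[step] = False
        if not results[step]:
            break  # the customer walks away at the first failure
    results["passes"] = all(results.get(s, False) for s in REQUIRED_STEPS)
    return results

# Hypothetical stacks: one stable, one that falls over on training.
stable = {s: (lambda: True) for s in REQUIRED_STEPS}
flaky = dict(stable, train=lambda: False)

print(evaluate_chip(stable)["passes"])  # True
print(evaluate_chip(flaky)["passes"])   # False
```

The point of the sketch is the early `break`: partial support is worth nothing, because the buyer never reaches the "buy again" step.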

And it's really hard to get a stable software stack. Do you like the term Neocloud? Is that appropriate? You know, I think it's fine. I don't really have any opinion on it.

We kind of look at ourselves, we've always described Lambda as: what would the cloud look like if it was reinvented from the ground up for AI? You know, training and inferencing large language models, training and inferencing large-scale neural networks. Yeah.

And I think that Neocloud's an appropriate description for that concept. What is the balance between training and inference right now, and how has that evolved over the last couple of years? Yeah.

So, you know, for the overall business, we're just a bit over $500 million of top-line revenue. And I'd say that, essentially... Congratulations. Thank you. Thank you.

Essentially, historically, that's been majority training and then minority inference, you know, maybe 80/20 training to inference. Mhm. And now we've started to see that switch over.

I would generally say that when I see any sort of net-new deal in the space for large-scale GPU capacity, it tends to be more inference-driven these days. Sure, it's great. We're actually using the models.

It's not just training bigger models with no... I mean, we saw that at Google I/O. Sundar Pichai pulled up the chart of tokens generated, and it's just completely up and to the right. It's completely up and to the right.

I mean, training is always going to be something where you do it once, and the whole point of training is that you want to amortize that over as many tokens as possible, right? Because that's just your fixed cost that you're wanting to spread across as many generations as possible.
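The amortization argument is easy to make concrete with toy numbers. All figures below are invented for illustration, not Lambda's or any lab's actual costs:

```python
# Toy amortization: training is a one-time fixed cost spread across
# every token the model ever serves. All numbers are made up.

def amortized_cost_per_million_tokens(training_cost_usd,
                                      serving_cost_per_m_tokens,
                                      lifetime_tokens):
    fixed_share = training_cost_usd / (lifetime_tokens / 1_000_000)
    return fixed_share + serving_cost_per_m_tokens

# Hypothetical $100M training run, $0.50 per million tokens to serve:
few_tokens = amortized_cost_per_million_tokens(100e6, 0.50, 1e12)   # 1T tokens served
many_tokens = amortized_cost_per_million_tokens(100e6, 0.50, 1e15)  # 1,000T tokens served

print(round(few_tokens, 2))   # 100.5  (training cost dominates)
print(round(many_tokens, 4))  # 0.6    (training cost nearly vanishes)
```

Serve a thousand times more tokens and the same training run goes from dominating the per-token price to a rounding error, which is the sense in which "the point of training is inference."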

And so the point of training is inference. Yeah, we're there. I always like to sort AI companies by revenue and just go, okay, well, who's at the top? Probably OpenAI today, you know, four billion, five billion.

It's a little bit hard to exactly guess, but then, you know, next one down you look at Anthropic: 800 million last published, mostly developer API, but probably fast-growing. I would imagine they're probably in the billions now, because Sonnet 3.7 was really good for code generation.

Google doesn't seem to be charging... well, from what I can tell, they're charging me $250 a month now, about to pop up to $500. So, they're getting my money. What are they charging for? So, I am now on the Gemini Ultra plan, and I have 2.5 Pro preview, and then I also have access to Veo 3, but it's very limited.

I can only generate two or three video clips per day. It's heavily throttled, but it's a $250-a-month plan right now, and it's going to jump up to $500 a month in a few months.

So, I imagine that across Gemini Pro subscriptions, they're probably going to grow that pretty quickly, just because I'm seeing the numbers go from zero to a billion dollars, 100%, in a couple of weeks, right?

You know, and so when you do that breakdown, what do we see in terms of total top-line revenue? It's all inference, right? I mean, you know, all of that is inference demand, whether it's images or ChatGPT or video generation.

And then, you know, Midjourney's probably in the hundreds of millions of dollars revenue run rate. So there are some substantial businesses being built right now. And yeah, what are some narrative violations, things that you feel like the broader ecosystem is getting wrong right now, given you guys have unique insight into actual usage and activity?

Yeah. Okay. So the first one, obviously, I think if you look back there was that Sequoia think piece that was published, you know, the "Where's the AI revenue?" one. There's a lot wrong with that analysis.

So I don't need to go into that, but I'd say, like, the general pessimism of, oh, it's a bubble, it's a bubble, it's a bubble. I just remember, you know, I started Lambda in 2012. 2013 was the year that Mark Zuckerberg went to NeurIPS and hired Yann LeCun to start Facebook AI Research at the time.

And I just remember everybody in the field at the time looking around: oh, is it a bubble? You know, Google's just bought DeepMind, and Facebook's hiring Yann LeCun, so it must be a bubble, because Mark Zuckerberg just came to NeurIPS.

But I think that, in general, everybody underestimates what exponential growth looks like (it always looks essentially the same wherever you are on it) and how good the code generation has gotten over the last 90 days.

Right? If you were to flip back before January, you know, state-of-the-art for code generation was GPT-4o, and then o1 hadn't even, I think, been fully released yet.

With o1, o4, Claude Sonnet 3.5, 3.7, all this stuff, the code generation is getting so good that, I think I can say here, in a couple of years you're just going to have a function that goes from cash into software. And that is going to completely change the way that businesses operate, because, you know, you're going to spin up 500 different versions of the piece of software that you're searching for.

You're going to be able to do this sort of high-throughput search through software space, and it's going to spit out a bunch of things, and it's going to have maybe a tastemaker model that just rates it based off of, you know, the computer-use-compiled version of that software and says, "Hey, well, these are all the source code bases for all 500 pieces of software.

These are the top five. I'd recommend you launch this one. Go for it." And I think that's going to really change the way that the world operates in technology. So, should you learn to code if you're a teenager? Yeah, absolutely. I think you still need to learn how to code.
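The search-through-software-space loop sketched above might look something like this, with generate_candidate and taste_score as hypothetical stand-ins for a code-generation model and a tastemaker model:

```python
# Sketch of the "cash into software" loop: generate many candidate
# implementations, score each with a tastemaker model, surface the
# top few. Both model calls are stubbed so the example runs offline.

import random

def generate_candidate(spec, seed):
    # Stand-in for a code-generation model call.
    return {"spec": spec, "source": f"// variant {seed}", "seed": seed}

def taste_score(candidate):
    # Stand-in for a tastemaker model rating the compiled/run artifact.
    # Seeded so each candidate gets a stable, reproducible score.
    rng = random.Random(candidate["seed"])
    return rng.random()

def search_software_space(spec, n_variants=500, top_k=5):
    candidates = [generate_candidate(spec, s) for s in range(n_variants)]
    ranked = sorted(candidates, key=taste_score, reverse=True)
    return ranked[:top_k]

top = search_software_space("a calendar app", n_variants=500, top_k=5)
print(len(top))  # 5
```

Swap the stubs for real LLM calls and an evaluator and the shape of the loop stays the same: the expensive part becomes pure inference spend.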

I think it's really hard... in the intermediate period, you're still going to need to learn how to code. Sure. I think that you're just going to see much higher-leverage teams in the world, right?

You might always need to think like a programmer, and so learning to code is part of that process. That makes sense. What about the evolution of chips and semis? Before we dive into that... Sure.

I feel like there's this general thesis, we've talked about this on the show, this idea of what we call Jiro Dreams of Sushi software, right? This really crafted, more art than science. More art than science. Yeah.

Super intentionally built software that is just, you know, super thoughtful. And you can see this in companies that maybe go after a category like CRM, where, okay, Salesforce dominates CRM, but they come in and they build this really beautiful, really thoughtful experience. And typically, if the teams are good enough, they'll end up doing well. I think there's this sense that that class of software is safe from completely AI-generated software.

But in the scenario you laid out, where you generate 500 different variations of a potential tool that you'd use, and it sort of automatically ranks them based on some taste-driven benchmark, is it possible that that class of software is at extreme risk of disruption as well? Yeah, certainly.

I think anything that's just software is at extreme risk. You know, you look at a business like Salesforce, right? It's almost as if, I mean, how important is the software? How important is the brand? How important is the distribution model?

And I think that the bigger the company gets, the more important the latter becomes: the distribution model, the brand of the company, how deeply embedded it is inside of the day-to-day life of the customer that's using it.

And it's sort of like, well, what is the value of Salesforce? Is it the software? No, not at all.

The value of Salesforce is the fact that every single company in the world, the first thing they do when they hire a VP of sales is make sure they have a Salesforce implementation, and then it sticks with the company until they're an S&P 500 component, and then all the company's data is in Salesforce, right?

Does that have anything to do with the software that Salesforce has written? Does the moat have anything to do with the replacement cost of developing the software, the CRUD app? Yeah. No. Right.

So, it's kind of interesting, because stuff like software will be solved, but then stuff like how you build a moat and how you build a business won't be solved. So, maybe we're in the world where the business co-founder dominates.

The era of the ideas guy is upon us. Yeah, that'll be really exciting. Okay. I want to talk more specifically about the path to this future of turning money into software and creating value that way. What is more important?

Just bigger training runs, knocking down higher MMLU numbers, benchmarks, higher IQ points, versus distilling models, faster inference, cheaper inference, baking some of these models down into silicon, what we're seeing with Etched and putting the transformer architecture on a chip?

They seem like two different vectors. Every time a new model comes out, my reaction is always, well, this is good enough, I just want it to be faster. And so Midjourney v6 or whatever, I'm usually just like, yeah, I'd just love this to be instantaneous, as soon as I type the words, generating in milliseconds.

At the same time, the labs seem to be iterating towards bigger and better models, and they have a mentality of "job's not finished." But what's your take on the trade-offs there?

Well, what we're seeing with the advent of reasoning models, as well as the models that basically will do reasoning and then retrain the model to not do any reasoning but bake the reasoning in. Mhm.

Is that the performance improves as a function of how much runtime compute you do. Yeah.

And so if you can make a faster model, and you can reason faster with it, you can make an argument that it might actually perform better in some circumstances. I think we're going to see all dimensions of that space explored by a variety of companies.
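One way to see the faster-model argument: assume, purely for illustration, that answer quality grows with the log of reasoning tokens emitted within a fixed latency budget. Then a weaker but much faster model can come out ahead. All numbers and the quality function below are invented:

```python
# Toy model of the test-time-compute trade-off: within a fixed latency
# budget, a faster model emits more reasoning tokens, and quality is
# assumed (for illustration only) to grow with the log of those tokens.

import math

def quality(base_quality, tokens_per_sec, latency_budget_sec):
    reasoning_tokens = tokens_per_sec * latency_budget_sec
    return base_quality + math.log2(1 + reasoning_tokens)

# Hypothetical: a stronger-but-slow model vs. a weaker-but-fast one,
# both given the same 4-second budget.
big_slow = quality(base_quality=10.0, tokens_per_sec=50, latency_budget_sec=4)
small_fast = quality(base_quality=8.0, tokens_per_sec=900, latency_budget_sec=4)

print(small_fast > big_slow)  # True
```

The crossover obviously depends on the real scaling curve, but this is the shape of the argument that speed itself becomes a capability.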

You know, there will be people out there at the edge building the biggest frontier models. There will be people quantizing and distilling those models down to something that runs locally on your phone. That was actually one of the things I did before this iteration of where Lambda is.

We trained ConvNets that ran locally on the iPhone. Oh wow. And this was like 2013, and you kind of see the same thing. There are use cases for that. It's super useful.

You could have privacy-preserving image recognition on your phone, but it's not going to be the same quality as something that goes back to a data center.

I think, actually, if there's one narrative violation, just going back to this world of software generation you mentioned: a lot of people are kind of stuck on, okay, AI is generating software. And I've got this entire thesis on where the future is going: you won't need any software at all, and the neural network is going to completely replace all the software. So let me walk you through this. The idea is that instead of generating a program, let's say a calculator or an Excel spreadsheet, you just go to ChatGPT and say, "Hi, please behave like a program.

Please behave like this calculator, or behave like this spreadsheet. Generate an ASCII user interface for me, and I want you to essentially just respond, you know, implement the logic of that program in your mind." That is what I call neural software, and it's really squishy.
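A minimal sketch of the neural-software idea, with the model call stubbed out so the example is self-contained (a real version would hit a chat-completion API; the prompt text and function names are invented):

```python
# "Neural software": instead of generating code, prompt a chat model to
# *be* the program. call_model() is a hypothetical stand-in for an LLM
# API, stubbed here with canned replies so the sketch runs offline.

SYSTEM_PROMPT = """You are not a code generator. Behave *as* a
four-function calculator. Render an ASCII user interface, accept one
input per turn, and carry out the program's logic entirely in your
response."""

def call_model(system, history, user_input):
    # Stub: a real implementation would call a chat-completion API
    # with `system` and the running `history`.
    canned = {"2+2": "= 4", "7*6": "= 42"}
    return canned.get(user_input, "?")

def neural_calculator(inputs):
    history = []
    outputs = []
    for line in inputs:
        reply = call_model(SYSTEM_PROMPT, history, line)
        history.append((line, reply))   # the "program state" is just context
        outputs.append(reply)
    return outputs

print(neural_calculator(["2+2", "7*6"]))  # ['= 4', '= 42']
```

Note there is no parser, no AST, no compiled artifact: the "program" is the prompt plus the conversation history, which is exactly the squishiness described next.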

You know, normal software is really brittle, right? If you make a typo, you leave a keyword out, you miss a semicolon, it's not going to compile. With this type of neural software, it's not really possible to have a bug. It's really more that you have a misunderstanding, or you've misprompted it, or something.

And I think that that's where all of this is going.

It's not code generation, but your large language models are going to sort of take over more and more of program space, and you will be largely interacting with these transformer models, or next-token prediction models generally, and they will be the software that you interact with.

What about diffusion models? We saw Google bring diffusion to text models. I was seeing something like 900 tokens per second. I saw a demo where someone generated a full calendar application, all the code for the calendar application, in 3 seconds. It was 3,000 tokens or something like that.

That feels like, you know, an algorithm from image generation that now we're seeing in the text world. Simultaneously, we're seeing images in ChatGPT maybe do something more transformer- or token-based, and so these lines are blurring. Can you give us any insight into what's happening there?

Is that exciting, or is this kind of just in the experimental phase? Well, whether it's exciting or not is going to kind of answer the question you had earlier, which is how successful are baked-in transformer ASICs going to be, right?

Because as the underlying space changes, Yep, then every one of those ASICs becomes a lot less valuable, and you kind of have to go back to the more general tensor processing that you see inside of Tensor Cores and inside of the architecture of things like TPUs.

Yeah. And away from, you know, really specific things that have to do with the KV cache and different transformer-specific architectural things you might want to put into an ASIC.

And so I think it's interesting to see: you've got diffusion models, you've got things like Mamba, where there are alternatives to the transformer that have what's called basically linear complexity in terms of the amount of memory that you need as the context length grows.

Which is better than the quadratic complexity you see inside of normal transformer models. And I think there's going to continue to be a lot of innovation in the model architecture space, and that will probably benefit Nvidia a lot. Got it.
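The quadratic-versus-linear point can be made with back-of-the-envelope numbers. The figures below are illustrative, not any real model's configuration: naive attention materializes an n-by-n score matrix per head, while a state-space model like Mamba carries a fixed-size recurrent state:

```python
# Back-of-the-envelope comparison: naive attention's score matrix grows
# quadratically with context length, while a state-space model's state
# is fixed regardless of context. Numbers are invented for illustration.

def attention_score_floats(context_len, n_heads):
    # One context_len x context_len score matrix per head.
    return n_heads * context_len * context_len

def ssm_state_floats(d_model, d_state):
    # Fixed recurrent state, independent of context length.
    return d_model * d_state

short_ctx = attention_score_floats(context_len=1_000, n_heads=32)
long_ctx = attention_score_floats(context_len=100_000, n_heads=32)

print(long_ctx // short_ctx)                       # 10000: 100x context -> 10,000x
print(ssm_state_floats(d_model=4096, d_state=16))  # 65536, at any context length
```

(Production transformers use tricks like FlashAttention to avoid materializing the full matrix, and the KV cache itself grows linearly, but the asymptotic contrast above is the one being drawn here.)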

Are there any other side projects in semiconductors that are exciting to you? Huge wafer-scale computing like Cerebras? We talked a little bit about baking things down onto ASICs. We saw that path play out with Bitcoin, with the FPGAs and then the ASIC Bitcoin mining.

There are other approaches, and I'm sure Nvidia is not asleep at the wheel. Jensen Huang's in founder mode. He's aware of the boom. He's aware of the demand. I'm sure people are asking him, how can we run diffusion models faster? How can we run transformer models faster?

What are you expecting on the semiconductor side over the next few years? Well, it's pretty clear, I'd say, who the front-runners for competing with Nvidia are, and they're all very far behind Nvidia.

But the ones that are the farthest along today are probably Google and, to some extent, Amazon with Trainium and Inferentia. Yeah.

That space is always evolving fast, but I think it's a little bit telling that AMD, with all the resources, with all the clarity on what you're supposed to build for the market, hasn't been able to capture enough market share. Dylan Patel is going to turn it around.

He's going to write just one more report and AMD is going to be back. I'm bullish. That's all it takes. I love it. But can you give us a little perspective? Obviously, if I'm doing a ton of inference on Lambda, I'm probably in the CUDA ecosystem, probably in Nvidia land.

If I want to take that over to Amazon, or Google with TPUs, how much of a barrier really is that? Can you explain? Because at the same time, we have these incredible code generation models. It feels like putting an AI agent on "rewrite this CUDA for TPU" seems like the easiest thing to do.

It seems like a problem perfectly tailor-made for AI agents: just sit there and write boring translation code. It's not even feature design, right?

Wasn't that in the TL, where someone was making a joke about the Anthropic safety thing, snitching on the user? Someone said, "Port this PyTorch code over to JAX," and then it says, you know, "Searching... calling FBI... calling FBI... Do not do this. You're going to pull the rug on the entire economy if you do that." Preserving the underlying chips it's running on, so important to America.

So, I think that, okay, the stuff that this code generation is really, really good at today is what I would call within the realm of one-shotting a basic program.

One-shotting a one-page program. Or it could be much longer than one page, but it's sort of single-file. Yeah.

Or a function, or, you know, a class, something like that. Not necessarily... maybe I'm just behind on it a little bit, and I'm not doing what the kids are doing with an AI IDE, but when I'm vibe coding, I will just tell either ChatGPT or Gemini or Claude: do this in one page, have it be one thing, I want to copy and paste this into my thing and run it.

And it's quite good at that. Now, when you talk about going through an entire codebase, fixing the compilation errors, because there will be subtle, Yeah, subtle bugs that get introduced, it's not quite there yet. Yeah.

The thing is, I'm now at like 100% confidence that it is going to be there in just a couple of years, and that's why I know that every megawatt that we build and every GPU that we deploy is just going to get met with demand on the other side. Because just look two years back: in 2023, code generation was primitive. Yeah, primitive.

It didn't work. Now, today, it really works for more simple programs. I think two years from now, it's going to make you feel sick when you look at it. So, what are the big bottlenecks you foresee, between, like, energy, water, land? Are you going to be building a data center in space?

What do you think the future looks like for you? So, I think that the bottleneck is definitely what I call wrapped power. I think there's plenty of generated power. Right now, it's just not wrapped up in a data center shell. It's not in a powered shell.

It's not in a facility that has direct-to-chip liquid cooling integrated into it. And so it's that wrapped-up power that's ready for the current and next generation of chips.

I think that there are definitely some regulatory bottlenecks, or I would just say regulatory hurdles, that can be removed, and I think there's a lot of hope that this administration is going to start to remove those.

Whether it's looking at the way that we run utilities, where in some cases you have to become an unregulated utility and build, let's say, a data center power plant, which is to say behind-the-grid, or not attached to the grid, power generation next to a data center. Not every state is going to allow that, and I think there's probably a lot to do on the regulatory side to unleash the free market and let people build.

And so I think there's some hope there. And the other thing is, I think building large contiguous spaces is pretty clearly the answer, in my opinion. Well, good luck with that. I'm glad I don't have to deal with it. Logistics is a real pain.

It sounds like a lot of work, but you've been doing it for 13, 14 years. So, you've got another decade in you at least. So, good luck to you. We'd love to have you back on. This is fantastic. Come back on again soon. Thanks, guys. Super fun. Love what you do. Long time.

I didn't clock you as a Monster guy, too. We've got to go deeper on energy drinks. Yeah, there's so much to talk about. Lots to talk about, but this is fantastic. Thank you so much for coming on. We'll talk. Great chatting. Cheers. Quickly, let me tell you about Numeral: sales tax on autopilot.

Spend less than five minutes a month on sales tax compliance. Should we sing "sales tax AGI"? Sales tax... go get Numeral. Head over to numeralhq.com. How did you sleep last night, John? Oh, terrible. I woke up at 4 a.m. It was a disaster.

I know I'm gonna get cooked, but you don't have to be cooked, because you can go to eightsleep.com. Use code TBPN. I got a 76. It was brutal. 99. I can't get back to my 99s. I don't know what I'm doing wrong. And then also, we've got to tell the folks about Public.

Public.com: inspires me to grind harder. Investing for those who take it seriously. They've got multi-asset investing, industry-leading yields. They're trusted by... Let's give it up for multi-asset investing. Yep. And their partnership with Aston Martin, which we might have more information on soon, but stay tuned. I cannot wait.

We've got Sam Lessin coming into the Temple of Technology very quickly. Do you want to do some timeline while we wait for him? Is he here? You know, I love timeline. Dwarkesh Patel dropped a banger on Claude 4 day.

He sat down with Trenton Bricken and Sholto Douglas, talking about Claude 4 and how far reinforcement learning can scale. I haven't had a chance to listen to the whole thing. I listened to about half of it, kind of in and out, while I was trying to sleep last night.

But, I mean, just a very cool vibe: three people having a conversation right on the day when you want to know this stuff, talking all about this fascinating metaphor for AI safety, because it's just been controversial.

Of course, there's some drama around some random Anthropic news, but the cool thing that I liked was, they said, basically: if you want to tell the AGI what to do and how to behave in the best interest of humanity, you could give it specific rules, but it might actually be better to say, imagine there's an envelope, AGI, and in that envelope is what I want you to do, all the rules I want you to follow.

You can't access this envelope, but you have to behave in accordance with what you imagine to be in this envelope. And so the AGI just, like, I have to