Lambda CEO Stephen Balaban: $500M revenue, NVIDIA still unchallenged, and the coming age of neural software
May 23, 2025 with Stephen Balaban
Key Points
- Lambda has crossed $500M in revenue by operating roughly $1B in deployed GPUs, partitioned into slices rentable in 15-minute intervals, undercutting the multi-year contracts standard in bare-metal cloud.
- NVIDIA faces no credible challenger; AMD and Google's TPUs have failed to clear the bar of running arbitrary Hugging Face models reliably across training and inference workloads.
- Lambda's revenue is flipping from 80% training to majority inference as models move from development to production, signaling the market is now buying compute to run existing AI rather than build new models.
Summary
Lambda has crossed $500M in top-line revenue, according to CEO Stephen Balaban. The company runs roughly $1B worth of deployed GPUs and has put around $100M into the virtualization software that lets it dynamically partition large clusters — as many as 16,000 GPUs — into slices available on intervals as short as 15 minutes, without requiring the multi-year contracts typical of bare-metal cloud providers.
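As a rough illustration of that rental model, here is a hypothetical sketch in Python; the class, numbers, and interfaces are invented for exposition and are not Lambda's actual scheduler:

```python
# Hypothetical sketch of interval-based GPU rental: a pool of GPUs is
# partitioned on demand into slices billed in 15-minute increments.
# Nothing here reflects Lambda's real virtualization layer; it only
# illustrates the no-long-term-contract model described above.
import math

SLICE_MINUTES = 15

class GpuPool:
    def __init__(self, total_gpus: int):
        self.free = total_gpus

    def rent(self, gpus: int, minutes: int) -> dict | None:
        """Reserve `gpus` for `minutes`, billed in 15-minute slices."""
        if gpus > self.free:
            return None                       # pool exhausted, no allocation
        self.free -= gpus
        slices = math.ceil(minutes / SLICE_MINUTES)
        return {"gpus": gpus, "billed_minutes": slices * SLICE_MINUTES}

    def release(self, gpus: int) -> None:
        self.free += gpus

pool = GpuPool(total_gpus=16_000)             # the cluster size mentioned above
lease = pool.rent(gpus=8, minutes=40)         # billed as 45 minutes (3 slices)
print(lease)
```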
NVIDIA supremacy, for now
Balaban is unambiguous that nothing competes with NVIDIA today. Customer demand reflects "total and utter NVIDIA supremacy," and his test for any would-be challenger is straightforward: a customer needs to download an arbitrary model from Hugging Face, run it, train it, fine-tune it, buy compute, and then buy again. No alternative has cleared that bar consistently. Google's TPUs and Amazon's Trainium and Inferentia are the furthest along among challengers, but AMD — despite its resources and market clarity — hasn't captured meaningful share. Architectural shifts toward diffusion models and alternatives like Mamba could actually benefit NVIDIA by making fixed-function transformer ASICs less valuable and general-purpose tensor processing more relevant.
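For concreteness, the bar Balaban describes maps onto roughly the following workflow. This is a minimal sketch using the Hugging Face transformers library, with gpt2 standing in for "an arbitrary model"; the prompt and training step are illustrative, not from the interview:

```python
# Illustrative sketch of the litmus test: pull an arbitrary model from
# Hugging Face, run it, and take one fine-tuning step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# 1. Run it: plain inference.
inputs = tokenizer("The GPU cloud market is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# 2. Train / fine-tune it: a single gradient step on toy data.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
batch = tokenizer("A toy fine-tuning sentence.", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
print(f"fine-tune step done, loss={loss.item():.3f}")
```

On NVIDIA hardware this kind of snippet works out of the box; Balaban's point is that no challenger has made it work this reliably across arbitrary models.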
Training to inference shift
Lambda's revenue mix has historically run roughly 80/20 training to inference. That's now flipping. The new large-scale GPU deals Balaban sees are predominantly inference-driven, consistent with the broader picture of models being used rather than just built. OpenAI is likely at $4–5B in revenue. Anthropic was last publicly reported at $800M but is probably past $1B now, given Claude 3.7 Sonnet's strength in code generation. Midjourney is likely in the hundreds of millions. Google is now charging $250/month for its top Gemini tier, rising to $500. Balaban's read: all of that revenue is inference.
Neural software
The more provocative argument Balaban makes concerns where software itself is heading. Code generation is the current frame, but he sees a world beyond it — one where large language models don't generate programs at all, but are the programs. Ask a model to behave like a calculator or a spreadsheet, generate an ASCII interface, and implement the logic internally. He calls this "neural software." Unlike conventional code, it can't have a bug in the traditional sense — only misunderstandings or misprompts. His view is that transformer models will progressively absorb more of program space, and users will interact with them directly rather than with software they produce.
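A toy sketch of the idea, assuming the OpenAI Python client (the model name, prompt, and loop are all illustrative, not anything Balaban specified): instead of asking the model to write a calculator, the model is the calculator, rendering its own ASCII display and holding all state in the conversation.

```python
# Toy sketch of "neural software": the model *is* the program.
# Assumes the OpenAI Python client; model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()
history = [{
    "role": "system",
    "content": (
        "You are a four-function calculator. Render an ASCII interface "
        "showing the current display, accept keypresses as user messages, "
        "and maintain all state yourself. Never emit source code."
    ),
}]

while True:
    key = input("key> ")            # e.g. "7", "+", "3", "="
    history.append({"role": "user", "content": key})
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    screen = reply.choices[0].message.content
    history.append({"role": "assistant", "content": screen})
    print(screen)                   # the model's ASCII "display"
```

There is no calculator source code anywhere in that loop, which is exactly why "bug" stops being the right word: a wrong answer is a misunderstanding between user and model, not a defect in a program.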
The near-term version — which he thinks arrives in a couple of years — is a function that converts cash into software: spin up 500 variations of a target tool, run them through a taste-maker model that evaluates compiled, computer-use-tested versions, and surface the top five. The point isn't just developer productivity. It collapses the replacement-cost argument for incumbent software entirely.
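Read as a pipeline, that's a generate-evaluate-select loop. Below is a minimal sketch under assumed interfaces; all three helpers are hypothetical stand-ins, stubbed here so the sketch runs:

```python
# Minimal sketch of the "cash into software" pipeline: generate many
# candidate implementations, evaluate each, surface the best few.
# generate_variant, build_and_test, and taste_score are hypothetical
# stand-ins for an LLM codegen call, a compile-plus-computer-use test
# harness, and the "taste-maker" evaluator model respectively.
import random

def generate_variant(spec: str) -> str:
    return f"// variant of {spec} #{random.random()}"  # stand-in for LLM codegen

def build_and_test(code: str) -> str | None:
    return code if random.random() > 0.3 else None     # stand-in for compile + computer-use tests

def taste_score(artifact: str) -> float:
    return random.random()                             # stand-in for the taste-maker model

def cash_into_software(spec: str, n_variants: int = 500, top_k: int = 5) -> list[str]:
    candidates = [generate_variant(spec) for _ in range(n_variants)]
    survivors = [(code, build_and_test(code)) for code in candidates]
    scored = [(taste_score(a), code) for code, a in survivors if a is not None]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [code for _, code in scored[:top_k]]        # surface the top five

print(len(cash_into_software("a minimal spreadsheet")))
```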
What survives disruption
Balaban's answer on Salesforce is worth sitting with. The software itself is replaceable; what isn't is the fact that a company installs Salesforce when it hires its first VP of sales, and all of its data accumulates there through to S&P 500 membership. The moat has nothing to do with the cost of rewriting the software. Distribution, brand, and data lock-in survive AI-generated code. His implication, framed half-jokingly, is that this shifts power toward the business cofounder who can build those moats, not the engineer who writes the software.
Infrastructure constraints
The binding constraint on growth isn't GPU supply or generated power; it's what Balaban calls "wrapped power": capacity that's already been enclosed in a data-center shell with direct-to-chip liquid cooling. Regulatory hurdles around behind-the-meter power generation are a secondary bottleneck, and he's cautiously optimistic the current administration will ease some of them. Building large contiguous physical spaces is his clearest answer to how Lambda scales from here.
The throughline across Balaban's argument is confidence in the demand curve. Code generation that felt primitive in 2023 now works reliably for single-file programs. He puts near-100% confidence on it reaching full codebase capability within two years — which is why, in his view, every megawatt Lambda builds will find a buyer.