Cerebras CEO Andrew Feldman on the IPO, the CUDA myth, and why fast inference will be 'all of the market'
Key Points
- Cerebras CEO Andrew Feldman argues inference demand will grow by orders of magnitude, making speed the dominant market differentiator as users experience real-time performance.
- CUDA lock-in is weaker than believed: Gemini 3 and Anthropic's models were trained on non-Nvidia chips, suggesting the market supports multiple viable architectures.
- OpenAI and AWS anchor Cerebras's customer base, with AWS solving enterprise procurement friction and OpenAI validating cutting-edge inference workloads.
Summary
Andrew Feldman co-founded Cerebras Systems on a single provocative premise: GPUs are 100 times better than CPUs for deep learning, but that doesn't make them the right solution. A graphics processing unit was never designed for this workload. Something built from the ground up could be far better.
That pitch landed Benchmark's Eric Vishria as a Series A investor in 2016, a decade before the IPO. The years between were not easy. Feldman describes the 2020–2023 window as a slog — AI was generating interest but almost no real usage. "Everybody was saying, that's cool, look what this model can do... and they went back to whatever they were doing before." The tidal wave of actual AI adoption, in his telling, didn't arrive until roughly 2025.
The roadshow argument
Feldman says he used the IPO roadshow to make three arguments the financial community hadn't fully absorbed.
First, inference demand is going to grow by orders of magnitude. He cites Jensen Huang's claim that inference demand will grow by a million times — a number most investors didn't believe when Huang first said it — as the kind of exponential framing that needed explicit explanation.
Second, GPUs are not the only path. TPUs, Trainium, and Cerebras all represent viable architectures. The market is not a monoculture.
Third, CUDA lock-in is overstated. Gemini 3 was trained on TPUs with no CUDA. Anthropic's models were trained on Trainium with no CUDA. Feldman's point is that some of the most capable models in production today were built entirely outside Nvidia's software ecosystem, and that narrative hadn't reached most institutional investors.
“The demand for inference will grow by a million x — and nobody believed Jensen when he said it. The notion that CUDA is this grand lock-in is overplayed. Gemini 3 was trained on TPUs with no CUDA. Anthropic's models were trained on Trainium with no CUDA. How big is the market for slow search? Zero. Fast inference is going to be all of the market.”
Fast inference as the whole market
Feldman's commercial thesis centers on the experience of speed. He used GPT-4.1's Spark and Codex integrations — both running on Cerebras — to illustrate what real-time inference feels like in practice, and argues that once users experience it, they won't go back. His analogy is Netflix: the company didn't get better at mailing DVDs; faster internet turned it into a studio. Nobody today would pay to downgrade from broadband to dial-up. "Fast inference is gonna be all of the market."
Cerebras connects to customer environments via standard 100-gigabit Ethernet, nothing proprietary, and is already deployed alongside Nvidia and AMD GPUs in mixed infrastructure. Feldman frames this as a feature: Cerebras as a specialist inference accelerator inside a broader confederacy of models, not a replacement for everything else.
Customers and distribution
The two anchor relationships Feldman describes are OpenAI and AWS. OpenAI is the cutting-edge inference use case. AWS solves the enterprise distribution problem: rather than navigating large-company procurement cycles with master purchase agreements "the size of a bible," customers can buy Cerebras through AWS and count it against their existing cloud commitment.
On large models, Feldman argues that scale actually favors Cerebras's architecture. A 10-trillion-parameter model is hard for anyone to serve, but the wafer-scale chip's large on-chip compute means fewer chip-to-chip communication bottlenecks. Multiple systems can be chained in a pipeline to handle multi-trillion-parameter inference in ways he says are more tractable than stacking GPUs with limited on-chip compute and a heavy reliance on off-chip memory.
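To make the scale concrete, here is a minimal back-of-envelope sketch in Python. Every number in it is an illustrative assumption rather than a Cerebras or GPU specification; it only shows why the weight footprint of a multi-trillion-parameter model, not raw compute, sets how many systems end up chained in a pipeline.

```python
import math


def pipeline_stages_needed(num_params: float,
                           bytes_per_param: float,
                           capacity_gb_per_system: float) -> int:
    """Number of pipeline stages needed so each stage's slice of the weights fits locally.

    Assumes weights are split evenly across stages and ignores activations,
    KV caches, and redundancy, so it is a lower bound, not a deployment plan.
    """
    total_weight_gb = num_params * bytes_per_param / 1e9
    return math.ceil(total_weight_gb / capacity_gb_per_system)


if __name__ == "__main__":
    # Hypothetical figures for illustration only: a 10-trillion-parameter model
    # stored at 8 bits (1 byte) per parameter, and an assumed 1,000 GB of
    # weight capacity per system.
    stages = pipeline_stages_needed(num_params=10e12,
                                    bytes_per_param=1.0,
                                    capacity_gb_per_system=1_000)
    print(f"Pipeline stages needed: {stages}")  # -> 10 under these assumptions
```

The design point the sketch gestures at is the one Feldman makes: in a pipeline, each stage only hands activations to the next stage at layer boundaries, so the number of inter-system communication events stays small even as the weight footprint grows.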
Internal transformation
Feldman says engineers at Cerebras have gone from consuming essentially zero AI tokens to roughly $10,000 worth of tokens per month each, with pull request rates rising at a corresponding pace. He expects AI to reshape entire departments over the next nine to eighteen months, including HR, training, finance, and recruiting, where "writing LinkedIn scripts" is already being displaced.
Space data centers
On the SpaceX-Google launch deal reported in the Wall Street Journal, Feldman sees a structural advantage for Cerebras in any eventual space compute environment: larger chips require fewer chip-to-chip communication events, which matters enormously in the latency-constrained, power-limited conditions of orbital infrastructure. But he places commercial space data centers eight to twelve years away, not three to five. The work is worth doing now precisely so the timeline doesn't stretch to twenty-five years, but it's not a near-term business line.
The immediate bet is simpler: keep building AI computers faster than anyone else, take on large amounts of data center capacity, and let the demand Jensen Huang described — the million-times growth in inference — come to them.