Cerebras CEO Andrew Feldman on the IPO, the CUDA myth, and why fast inference will be 'all of the market'
Key Points
- Cerebras CEO Andrew Feldman argues inference demand will grow by orders of magnitude, making speed the dominant market differentiator as users experience real-time performance.
- CUDA lock-in is weaker than believed: Gemini 3 and Anthropic's models were trained on non-Nvidia chips, suggesting the market supports multiple viable architectures.
- OpenAI and AWS anchor Cerebras's customer base, with AWS solving enterprise procurement friction and OpenAI validating cutting-edge inference workloads.
Summary
Andrew Feldman co-founded Cerebras Systems on a single provocative premise: GPUs are 100 times better than CPUs for deep learning, but that doesn't make them the right solution. A graphics processing unit was never designed for this workload. Something built from the ground up could be far better.
That pitch landed Benchmark's Eric Vishria as a Series A investor in 2016, a decade before the IPO. The years between were not easy. Feldman describes the 2020–2023 window as a slog — AI was generating interest but almost no real usage. "Everybody was saying, that's cool, look what this model can do... and they went back to whatever they were doing before." The tidal wave of actual AI adoption, in his telling, didn't arrive until roughly 2025.
The roadshow argument
Feldman says he used the IPO roadshow to make three arguments the financial community hadn't fully absorbed.
First, inference demand is going to grow by orders of magnitude. He cites Jensen Huang's claim that inference demand will grow by a million times — a number most investors didn't believe when Huang first said it — as the kind of exponential framing that needed explicit explanation.
Second, GPUs are not the only path. TPUs, Trainium, and Cerebras all represent viable architectures. The market is not a monoculture.
Third, CUDA lock-in is overstated. Gemini 3 was trained on TPUs with no CUDA. Anthropic's models were trained on Trainium with no CUDA. Feldman's point is that some of the most capable models in production today were built entirely outside Nvidia's software ecosystem, and that narrative hadn't reached most institutional investors.
“The demand for inference will grow by a million x — and nobody believed Jensen when he said it. The notion that CUDA is this grand lock-in is overplayed. Gemini 3 was trained on TPUs with no CUDA. Anthropic's models were trained on Trainium with no CUDA. How big is the market for slow search? Zero. Fast inference is going to be all of the market.”
Fast inference as the whole market
Feldman's commercial thesis centers on the experience of speed. He used GPT-4.1's Spark and Codex integrations — both running on Cerebras — to illustrate what real-time inference feels like in practice, and argues that once users experience it, they won't go back. His analogy is Netflix: the company didn't get better at mailing DVDs; faster internet turned it into a studio. Nobody today would pay to downgrade from broadband to dial-up. "Fast inference is gonna be all of the market."
Cerebras connects to customer environments via standard 100-gigabit Ethernet, nothing proprietary, and is already deployed alongside Nvidia and AMD GPUs in mixed infrastructure. Feldman frames this as a feature: Cerebras as a specialist inference accelerator inside a broader confederacy of models, not a replacement for everything else.
Customers and distribution
The two anchor relationships Feldman describes are OpenAI and AWS. OpenAI is the cutting-edge inference use case. AWS solves the enterprise distribution problem: rather than navigating large-company procurement cycles with master purchase agreements "the size of a bible," customers can buy Cerebras through AWS and count it against their existing cloud commitment.
On large models, Feldman argues that scale actually favors Cerebras's architecture. A 10-trillion-parameter model is hard for anyone to serve, but the wafer-scale chip's large on-chip compute means fewer chip-to-chip communication bottlenecks. Multiple systems can be chained in a pipeline to handle multi-trillion-parameter inference in ways he says are more tractable than stacking GPUs with limited on-chip compute and a heavy reliance on off-chip memory.
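To make the scale concrete, here is a minimal back-of-envelope sketch in Python. Every number in it is an illustrative assumption rather than a Cerebras or GPU specification; it only shows why the weight footprint of a multi-trillion-parameter model, not raw compute, sets how many systems end up chained in a pipeline.

```python
import math


def pipeline_stages_needed(num_params: float,
                           bytes_per_param: float,
                           capacity_gb_per_system: float) -> int:
    """Number of pipeline stages needed so each stage's slice of the weights fits locally.

    Assumes weights are split evenly across stages and ignores activations,
    KV caches, and redundancy, so it is a lower bound, not a deployment plan.
    """
    total_weight_gb = num_params * bytes_per_param / 1e9
    return math.ceil(total_weight_gb / capacity_gb_per_system)


if __name__ == "__main__":
    # Hypothetical figures for illustration only: a 10-trillion-parameter model
    # stored at 8 bits (1 byte) per parameter, and an assumed 1,000 GB of
    # weight capacity per system.
    stages = pipeline_stages_needed(num_params=10e12,
                                    bytes_per_param=1.0,
                                    capacity_gb_per_system=1_000)
    print(f"Pipeline stages needed: {stages}")  # -> 10 under these assumptions
```

The design point the sketch gestures at is the one Feldman makes: in a pipeline, each stage only hands activations to the next stage at layer boundaries, so the number of inter-system communication events stays small even as the weight footprint grows.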
Internal transformation
Feldman says engineers at Cerebras have gone from consuming essentially zero AI tokens to roughly $10,000 worth of tokens per month each, with pull request rates rising at a corresponding pace. He expects AI to reshape entire departments over the next nine to eighteen months, including HR, training, finance, and recruiting, where "writing LinkedIn scripts" is already being displaced.
Space data centers
On the SpaceX-Google launch deal reported in the Wall Street Journal, Feldman sees a structural advantage for Cerebras in any eventual space compute environment: larger chips require fewer chip-to-chip communication events, which matters enormously in the latency-constrained, power-limited conditions of orbital infrastructure. But he places commercial space data centers eight to twelve years away, not three to five. The work is worth doing now precisely so the timeline doesn't stretch to twenty-five years, but it's not a near-term business line.
The immediate bet is simpler: keep building AI computers faster than anyone else, take on large amounts of data center capacity, and let the demand Jensen Huang described — the million-times growth in inference — come to them.