Cerebras CEO Andrew Feldman on the IPO, the CUDA myth, and why fast inference will be 'all of the market'

May 14, 2026 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Andrew Feldman

Speaker 2: Let's bring him in to the TBPN Ultradome. Andrew, great to see you again.

Speaker 1: Looking sharp.

Speaker 9: Feeling sharp. How are you guys doing?

Speaker 2: Feeling amazing. Congratulations. How has the day been? I would love to get your reactions from the day. It seemed like there were a lot of people there. Take us through your emotions today.

Speaker 9: Well, you know, this was better than we'd hoped for. I think a chance to celebrate. We did bring a lot of people from the company, and we brought families.

Speaker 2: Yeah.

Speaker 9: And to share with the team. We brought everybody who'd been at the company for longer than nine years, and their families. You know, when you do a startup, the family is a meaningful part. It takes patience from them, and a great deal of it. And so they came and we celebrated. It was really an extraordinary day. We priced at $1.85, we opened at $3.50, and we settled at about $3.20. What an extraordinary thing. We're just so proud.

Speaker 2: Yeah. Take us through some of the history of Cerebras. Has it been a straight shot? Has it been an overnight success? How do you characterize it? What were the darkest moments? What were the highlights? What are the good old days to you? What does that mean?

Speaker 9: Well, look, I think in the hardware business, if anybody tells you it's a straight shot, you can call BS. I just don't think that's the way our business works. I think the first time you build a chip with a new architecture, it's a little more than a prototype, a little more than a proof of concept. With the second chip, you iron out your challenges and you begin to show it to customers en masse. The third one is often the one that really takes off. And so it's a long road in innovative hardware design. You know, we were founded in 2016. We're more than ten years old. We sought to solve problems that others... That's right, overnight success. Thank you. Oh, exactly. Like a decade. Yeah, I was pounds lighter in weight.

Speaker 1: As overnight successes are, you know.

Speaker 9: Yeah, that's right. I mean, they're just overnight because most people sort of weren't paying attention. But we tried to solve some problems that other people thought were impossible. As we showed you last time, you know, we tried to build a chip that was the size of a dinner plate.

Speaker 2: Yeah.

Speaker 9: And everybody told us it was impossible, and the truth is, for a while it was.

Speaker 2: Mhmm.

Speaker 9: And you know, we didn't solve it until August of 2019. We built this extraordinary chip. We were faster than everybody, and absolutely nobody cared. Nobody. AI wasn't ready, and it was still sort of a novelty. And nobody cares about how fast you are when it's a novelty. But starting with GPT, and in 2025, the models got so darn smart they became useful. Mhmm. And suddenly everybody wanted to use AI, and you use it with inference, and business was rolling.

Speaker 1: Yeah. What were those early rounds like? I'm thinking the Benchmark round, Coatue, you know, Eclipse, a bunch of others.

Speaker 9: You know, we had the advantage that the founding team had been together at our last company, which had paid off pretty well for the venture capitalists and the team, and so we had some wind in our sails when we went out and raised money. It's not like today, where you're four guys and the word "lab" and you're raising at a billion pre-seed. That's not us. But we went out, we made eight calls, we got eight term sheets, and we chose Benchmark and Foundation and Eclipse.

Speaker 2: That's amazing.

Speaker 9: And we got going, you know, less than a year later

Speaker 1: I was expecting you to say, like, yeah, I mean, it was a slog, you know.

Speaker 6: Other rounds were a slog. Yeah. Other rounds were a slog.

Speaker 2: Yeah.

Speaker 9: At the beginning, not so. You know, Thomas Laffont at Coatue came in shortly thereafter and we did a round with them. I think the truth is, between about 2020 and 2023, it was much harder. Yeah. AI was sort of in this situation where everybody was saying, that's cool, look what this model can do, look how big it is, but it wasn't being used anywhere. Right. Nobody was using it. They were pointing at it, they were saying, wouldn't this be nice, and then they went back to whatever they were doing before. And it wasn't until really sort of 2025, when the models got good, that you just saw this tidal wave of people using AI, and demand for AI compute. And that's been exceptional. It's just been an amazing thing to ride.

Speaker 2: Yeah. You mentioned, like, if you have four guys and your company name ends with "lab," you can raise a billion dollars. There's a little bit of that going on in the market with chips, semiconductors, AI. There's not that much that needs to be explained, but what were the key ideas or theses that you needed to explain in the roadshow to investors who wanted to go a layer deeper than just "AI chips"?

Speaker 9: Yeah. I think there were a few. The first is the market size and dynamic. I think Jensen said some time ago, on Brad Gerstner's podcast, that the demand for inference will grow by a million x.

Speaker 2: Yeah.

Speaker 9: And nobody believed him.

Speaker 2: Yeah.

Speaker 9: And you know, at the same time, you saw Sam Altman, you know, displaying real vision and going out and trying to lock up huge amounts of compute, yeah, and memory and data center and power, because he saw it too. Yeah. And I think trying to share what that means, what exponential demand means, and that we're still so early, and yet the demand for AI compute is overwhelming.

Speaker 2: Mhmm.

Speaker 9: I think sharing that was interesting, and I think helpful in educating the financial community. The other thing is that there are lots of ways to do this. The GPU isn't the only way. You've got a TPU, you've got Trainium, you've got us. There are lots of different ways to build a solution here. And finally, maybe the notion that CUDA is sort of this grand lock-in is overplayed. You know, Gemini 3, which is an excellent model, was trained on TPUs with no CUDA. Anthropic's models were trained on Trainium with no CUDA. I mean, lo and behold, some of the best models, some of the most interesting things, are being done without CUDA, and that lock-in might be overplayed. I think these three factors were really important in educating the financial community.

Speaker 1: Going forward, how do you and the team think about sort of calling your shot, trying to predict where and how inference demand will look in 2030 and beyond, versus working closely with the labs that now have product lines with billions of dollars of revenue and their own roadmaps that you can work with?

Speaker 9: Like the Babe, I'm going to point out to left field and just say, wait, this is where it's going, baby.

Speaker 2: I love it.

Speaker 9: No, I don't think that's the way it works. I think we're calling our shots every day by making big investments in data center capacity and collaborating with the leading visionaries in the field: working not just with OpenAI, to service sort of the cutting edge and deliver their extraordinary models, but also with AWS, to make sure that we can get access to the largest enterprise customers. Instead of having to work with these enterprise customers' procurement organizations, who provide master purchase agreements the size of a Bible, you can say, look, why don't you buy us through AWS and it'll count against your annual commitment. So I think those are really important ideas, and ways we get access to the market. And then we're taking huge amounts of data center capacity. Mhmm. And so that's the other bet we're making.

Speaker 2: Yeah. Makes a lot of sense. How do you think the year will play out in terms of broader consumer awareness of what fast inference feels like? I had a really magical moment using Cerebras in GPT 5.3 Spark and Codex. And even outside of coding tasks, just talking to the model and having it respond instantly felt like a new breakthrough, a new paradigm. And I feel like this hasn't fully diffused, but it also feels like when it does, there will be potentially entirely new ways of working, entirely new paradigms that might emerge. How are you thinking about actually diffusing the technology?

Speaker 9: We we think that's exactly right.

Speaker 2: Mhmm.

Speaker 9: And we think that the experience of engaging with a real-time AI

Speaker 2: Yeah.

Speaker 9: Will encourage people to do more things, to stay longer, to work on harder problems, and to invent new things. I mean, you remember, you know, when Netflix started, they delivered DVDs in envelopes.

Speaker 2: Yeah.

Speaker 9: Right? And when the internet got fast, they became a movie studio.

Speaker 2: Yeah.

Speaker 9: And they didn't get better at DVD delivery; they became something completely different that had never been in existence before: a movie studio that delivered directly to your home. Yeah. I think that's exactly what's going to happen. And you can just sit back and ask yourself, I mean, how big is the market for slow search? Zero. How big is the market for dial-up internet? I mean, how much would I have to pay you to swap out broadband at home and bring in dial-up? Right? $1,000 a month? $1,500 a month? $2,000 a month? I mean, no way. It just wouldn't be worth it. Yep. And so the community is going to engage with inference in the same way, and fast inference is going to be all of the market.

Speaker 2: Yeah. So you make the chips. I believe you also make cooling infrastructure as well, cooling units. Are there other products on the roadmap that you think will be required to roll out and scale Cerebras over the next couple of years?

Speaker 9: No, I don't think so. I think right now we build the chip, yep, and the system. The system is about the size of a dorm-room fridge, and you put two of them in a standard data center rack.

Speaker 2: Yeah.

Speaker 9: And the cooling infrastructure is built into the system.

Speaker 2: Sure.

Speaker 9: And I I think that's where we want to focus.

Speaker 2: Yeah.

Speaker 9: We want to be measured on our ability to build AI computers that are faster than anybody else's.

Speaker 2: Yeah. How are you thinking about scaling on-chip memory? It feels like there's some concern about, well, what if the models go to 10 trillion parameters? What if it gets too big? How are you thinking about that challenge, or maybe it's an opportunity?

Speaker 9: It is an opportunity. I think a 10-trillion-parameter model is hard for everybody. It's actually easier for us.

Speaker 2: Okay.

Speaker 9: Right? There's a reason we're not at 10 trillion: it's because it's really hard and expensive to serve, for everybody.

Speaker 2: Mhmm.

Speaker 9: I think one of the things that we've been able to do for the larger models is to tie together a bunch of these systems in parallel. Mhmm. And run them as a pipeline.

Speaker 9: And that way we can train and do inference on trillion- and multi-trillion-parameter models in ways that I think are much more intuitive than on GPUs, which have much smaller compute. They have off-chip memory, but their problem is the compute. They don't have enough compute per chip. Mhmm.
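The pipelining Feldman describes, partitioning a model across several systems and streaming work through them in order, can be sketched roughly as follows. This is a toy illustration of pipeline parallelism in general, not Cerebras software; the stage functions and the scheduler here are invented for the example.

```python
# Toy sketch of pipeline parallelism: a model's layers are partitioned into
# "stages", one per system, and each microbatch flows through the stages
# in order. (Illustrative only; not any vendor's API.)

def make_stage(scale):
    """Stand-in for one system's block of layers; here it just scales its input."""
    def stage(x):
        return x * scale
    return stage

def pipeline_run(stages, microbatches):
    """Push every microbatch through each stage in order.

    A real scheduler overlaps stages in time so all systems stay busy;
    this sequential loop shows only the dataflow, not the concurrency.
    """
    outputs = []
    for mb in microbatches:
        for stage in stages:
            mb = stage(mb)
        outputs.append(mb)
    return outputs

# Three "systems" chained into one pipeline.
stages = [make_stage(2), make_stage(3), make_stage(5)]
print(pipeline_run(stages, [1, 2, 4]))  # [30, 60, 120]
```

The point of the structure is that each system only needs to hold its own slice of the model, which is how a model too large for one machine can still be served.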

Speaker 2: And then how are you talking to customers about potentially bringing Cerebras in, not as a full replacement for their entire semiconductor supply chain or stack, but as a complement to everything else that they're running? Because I have this vision of, like, the next generation of AI agents: you get this genius model, but it needs to use a small model over here, an open-source model over there, a super-fast model for a certain thing if it's looping through

Speaker 1: some task. It's like the way you hire. You have a superstar employee

Speaker 2: Yeah.

Speaker 1: You don't necessarily want them doing every single task themselves. You should be able to delegate.

Speaker 2: Yeah. Delegation. How are you thinking about that?

Speaker 9: Yeah. I think that is sort of the notion of a confederacy of models, right? There's a collection of different models, and one of the things we thought about early on was how to interoperate in that environment.

Speaker 3: Okay.

Speaker 9: And we connect in via standard 100-gigabit Ethernet, nothing fancy, nothing proprietary. We are deployed in many places where they've got GPUs from Nvidia or GPUs from AMD. They've got x86 compute from Dell, yeah, or HP. And so that's not a problem at all. We're eager for those environments.

Speaker 1: Yeah. What do you think the company would look like today if you guys had had access to today's frontier models when you started the company? And how do you think about the speed-up at the company today, due to how good the models have gotten?

Speaker 9: We use frontier models every day, in coding, in running our G&A. I think if you start a company today, you build a very different organization. I think there are whole departments that will look different in the next nine to eighteen months. I think much of what HR does, much of what training does, is solved by some form of AI. I think a lot of the work in finance, right, closing the books, a bunch of what they do is checking, and those are all done by agents. Think about what it is to be selling, or doing recruiting; those change. I think for a long time, what recruiting was, was hunting through LinkedIn or writing scripts for LinkedIn.

Speaker 2: Mhmm.

Speaker 9: I think that changes substantially.

Speaker 1: Mhmm.

Speaker 9: And so when we look out, we see sort of fundamental changes. The obvious ones, of course: you know, a year ago, engineers were using approximately zero tokens, and now they're using, you know, $10,000 worth of tokens a month. And the rate of change, the rate of new pull requests, is just extraordinary. So AI is driving fundamental changes. Obviously, it usually starts in Silicon Valley and sort of works in waves to other areas, but that's what we're seeing right now.

Speaker 2: Since the last time we talked, there's been a ton of movement in the space data center market, a lot of energy. Just yesterday, SpaceX and Google eyed a launch deal, per the Wall Street Journal. Has any of your thinking changed? What is your current thesis on space data centers, and how might it fit into your business plan over the next decade?

Speaker 9: Well, one of the hardest things in a space data center is communicating across chips

Speaker 2: Yeah.

Speaker 9: From one chip to the next, and we solved that. Right?

Speaker 2: Yeah. I mean,

Speaker 9: one of the great parts about a big chip is that you have to communicate from one chip to to the next less frequently.

Speaker 2: Yeah.

Speaker 9: It's a huge advantage for us in space. Mhmm. I think that this is an idea like self-driving, where the last 10% takes 80% of the time. Sure. Right?

Speaker 4: Yeah.

Speaker 9: And that we're not three or five years away; we're eight to twelve years away. Yeah. That doesn't mean we shouldn't be working on it, or thinking about it, or making progress toward it, because if you don't do that, it's twenty-five years away. Yeah. But I don't see data centers in space in the next three or four years.

Speaker 2: Yeah. And arguably, you've solved the key problems that you would be asked to solve and so you'll be ready if demand shows up.

Speaker 6: You're ready.

Speaker 2: But there's not that much for you to do individually to advance that.

Speaker 9: That's exactly right. Yeah. That's exactly

Speaker 12: right.

Speaker 2: Well, we're hoping for it. It'd be exciting. But plenty of work to do here on the ground. It's cool.

Speaker 1: Congratulations to you and the whole team on this incredible milestone. Honored that you would spend time with us.

Speaker 2: We really appreciate it.

Speaker 1: Short day for the company. Yeah.

Speaker 2: Let me hit the gong.

Speaker 1: We'll continue to watch your progress, and I look forward to your next appearance. Enjoy the rest of the evening.

Speaker 2: Enjoy the rest of the evening. We'll talk to you soon.

Speaker 9: Thank you guys. It's time for a cocktail. Be well.

Speaker 2: Fantastic. Enjoy. You deserve it. Goodbye.

Speaker 12: What a fantastic...

Speaker 1: I love that. He's like, I'm Babe Ruth. Yeah, I just point. He's like, no.

Speaker 2: I'm not gonna do that. It's way more complicated. I've been working on this for a decade. Yeah. What a fantastic story. What a fantastic performance. I'm very excited that we're bringing in Eric Vishria from Benchmark, who was in that Series A that Andrew Feldman just mentioned. So we will talk to him about that in just a minute. We're gonna bring him into the waiting room. But there are some other posts that we can talk about in the meantime. One: someone is using Runway to create, let me see this, a full hurricane inside a TV studio. I wanna watch this clip. It's one minute, and we will see how convincing it is.

Speaker 3: But wind is only the beginning. The real danger is when the storm starts moving. As the storm builds, ordinary things stop feeling ordinary. Roof panels...

Speaker 2: You think the audio is also fully AI-generated? Because the typical workflow for this is that the host would stand on a green screen or an LED volume, and then all of these effects would be added in post, or live through a traditional visual effects pipeline. This feels fully synthetic. I think they'll probably use some sort of hybrid approach, but the ability to prompt something like this on the fly for a small news organization that maybe doesn't have the budget for a huge VFX team means you're just gonna see a lot more VFX like this. You're gonna see stuff all over the place. There are so many small news channels, local news stations, that just don't have, you know, access to Digital Domain or some huge visual effects house. So it looks pretty good.

Speaker 1: Before our next Yes. Guest Dylan Field has a quick update.

Speaker 2: What'd he say?

Speaker 1: Figma has their Q1 results. He says: quick update, not dead, and putting up some insane numbers. 46% year-over-year revenue growth, accelerating for the second straight quarter. They're raising 2026

Speaker 2: revenue guidance. Up 6.8% today, up 8.6% after hours. Congratulations to the entire team.