Cerebras CEO Andrew Feldman: NVIDIA spent $20B to buy the #2 inference player — validating our market

Jan 12, 2026 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Andrew Feldman

engineering team. And up next, we have Cerebras in the Restream waiting room. Andrew Feldman is the co-founder and CEO, and we will have him join us in the TBPN Ultradome. Andrew, how are you?

He's back.

Good. How are you guys doing?

We're doing great. And we're thrilled to not just have five minutes today; we have the last slot. So, thank you for giving us a second chance. There aren't many second chances in life, but we appreciate you taking one with us. Anyway, how are you doing? How's your new year going? I'd love to just hear the state of the business and the level of optimism going into 2026.

Look, I think it's an extraordinary time to be in... Well, first, thank you for inviting me back. I appreciate it. It's always good to talk to you guys.

Thanks.

Look, it's an extraordinary time to be in AI hardware.

I think we have seen exceptional interest. We've seen validation in the market. We've been telling people for years that fast inference would be a separate category. And so important was this new category that NVIDIA spent $20 billion to buy the number two player in it.

And so, what an exciting time

to see. I mean, you were just talking about the guys at Cognition, a big customer of ours, and what a great product: blisteringly fast, smart as can be, great coding tools. Maybe the way to think about what we're seeing is this: we had a GPT moment in 2023.

Yeah.

Where people first realized that AI was interesting.

Yeah.

And in the second half of '25, people demonstrated it was useful.

Yeah.

Right. And that is exploding. They're finding different uses for it. These aren't demos. These are production use cases, and they're demanding vast amounts of AI compute.

Yeah.

Underneath that.

Yeah. The fast inference thing is so obvious that it's become a meme. I'm sure you've seen these, where people will be joking about, well, I'm coding, and while I wait for my coding agent to come back to me, I open TikTok and I scroll vertical videos. And I saw one guy made a script that, as soon as he sends his prompt, automatically opens TikTok, and then closes it as soon as the agent responds, so that he doesn't get sucked into a rabbit hole. [laughter] It's brain rot, but it exposes a real behavior, which is that there's a lot of waiting involved in programming these days.

How cool is it that other people are making TikTok videos about your value proposition?

Exactly. It's like, how often does that happen, that an entire ecosystem's creativity

is being brought to bear on how shitty the competition is to use.

Yeah.

And what a cool thing. Yeah.

And since we have more time, I'd love to go back and actually get more of your full story, more of the story of the company, and sort of take us back in time to the initial idea, the first things that you did, all of that. So, if you could reintroduce yourself and the company, I think that would be a really good level-setting experience.

Great. I'm Andrew. I'm one of the founders, and with Sean, JP, Michael, and Gary, we founded Cerebras in early 2016.

What we saw on the horizon then was... [laughter]

There's still going to be some soundboard here. The overnight success is real.

The overnight success,

It really has been, right? I mean, people were skeptical at various points, and I think people are finally getting it now.

Yeah. I think that when you build real things

Yeah, right? You don't get the same sort of rate of growth that you have in viral software, right? I mean, when you actually build physical things, there are components, there are factories, there's real manufacturing work that needs to happen. [snorts] And especially when you do deep tech,

and what we set out to do, what we saw on the horizon, was a new workload. AI wasn't the same. It presented fundamentally different challenges to the computer. And if we were able to build a new design,

we would not be a little bit faster, not one or two or three or five times faster; we could get to 10, 20, 30, 50x faster

on this work. And the way to do it was to solve a problem that had been unsolved in the computer industry for 75 years. And that was how to build a really big chip.

You know, most chips are the size of postage stamps. And our chip is the size of a dinner plate,

right?

So good.

And so now, it took

Yeah.

an enormous amount of work to build.

Yeah. Walk me through it exactly. Like, you incorporate the company, and then you're doing design, and you're calling a manufacturer and saying, "I got this crazy idea." And then I imagine years go by until the first one comes off the line and you can actually test it. What were those early days like?

They were unbelievably challenging. I think when you do [laughter]

when you do really hard work, your first one is like that first pancake. You know, it's [clears throat] sort of messed up. You make the pancake and the pan isn't hot enough and the pancake is all nasty, and it's like that. And we [snorts] had an idea that we could solve this class of problems that had broken every other effort to build a big chip.

Mhm.

And we spoke to backers who had vision and believed in us.

Mhm.

And we put together one of the truly world-class chip teams in the industry. There are only about half a dozen world-class chip teams, and we have one of them.

Mhm.

And then we laid out a plan, and we flew out to TSMC and said, "We think we can do something with your process

that nobody else has ever been able to do."

And to their enormous credit, they listened and said, "Shit, that's a good idea." And in the meeting, they greenlit it.

That's amazing.

It was amazing. What risks were they identifying? What was their pushback? I mean, even if they say, "Yeah, we'll take your money and we'll build this thing," what are they cautioning you about at that moment?

Well, also, yeah, I feel like if you're a founder, you get a lot of nos. Sometimes it's with customers, sometimes it's with investors or talent. But if you're building a chip and TSMC says no, do you just go back to the drawing board? That carries quite a bit more weight than just an investor saying, "I'm passing."

I mean, TSMC is the best in the business. And we heard no. When you're interested in pioneering work, when you're interested in fearless engineering, you're going to hear no a lot, because everybody's afraid of the problem that you're trying to attack. And when everybody's failed at a problem, then those who are mediocre, who lack vision, are always like, "It'll never work. It can't be done. Others failed."

Yeah.

And it takes a special type of person to say, "Well, just because others failed doesn't mean we need to fail. The world has changed. The tools are better. We have better insight. We can use architectural techniques to avoid some of the things that broke previous efforts." And so what we did is we went out, we studied the previous failures, and we came to believe that we could work around everything that had broken previous efforts. And we brought this back to our investors, and they're like, "Wow, this could be really big." And we tuned it. We tuned it. We flew out to TSMC and they agreed. I think even when you do pioneering work, you encounter problems that nobody thought of.

Yep.

Right. I like to tell the story: imagine, before Everest was summited, you get to base camp and there's a team there having tea that just failed to summit. And you talk to them, and you go, "Hey guys," and they say, "There's this part about halfway up that's really, really hard.

No one's ever gotten past it."

Right? and you go up and you go up and you come down and you're having tea with the same guys and you lean forward and say that wasn't the hard part, [laughter]

Right? And when you do things that nobody else has ever done, you encounter things that nobody else has ever encountered. We had to invent materials; we had to invent packaging techniques. The problem that caused, for example, the B200 to be 18 months late was a problem of the coefficient of thermal expansion that we'd solved in 2017. We saw it, we knew it was coming for them, and we'd solved it. The problem that caused the Dojo project at Tesla to fail, we'd predicted and solved in 2018. We had already encountered these problems. We'd found ways around them. And you know, what fun is it to be an engineer if you're doing the same stuff everybody else has done already? I mean, it's only fun when you're doing...

What were the early key customers? Because it's great if you can have a partner like TSMC that says, "Hey, you guys are onto something. We want to be a part of this."

In 2016 there's no LLM, right? So,

no LLM

who's doing AI that might be a potential customer?

That's right. So, early customers included, you know, a visionary group at GlaxoSmithKline. Not who you'd think of as likely to bet on us. Yeah, a truly visionary group, and one of the, I think, truly great minds in the application of AI to pharma is a guy named Kim Branson, who runs AI for them. And the military and the national labs: they were accustomed to understanding what could be done with blisteringly fast hardware.

Interesting. And so our first, our serial number 01, went to Argonne National Laboratory, part of the DOE

infrastructure. And we have projects today with Argonne and with Sandia, and with the, I guess, Department of War, it's now called, that measure in the hundreds of millions of dollars.

Mhm. And so they were early customers, and they were willing to take early machines that were a little rough around the edges. That was our first generation. Then we delivered the second generation a couple years later, and they got better and better. Then we delivered a third generation, and then it took off: we had the rise of inference as a meaningful workload. And suddenly performance was everything.

Talk about that early critique that I heard, that part of what was going to be difficult about this was the fact that when you're operating at the wafer scale, if there's one defect, you throw the whole chip out. As opposed to with smaller chips: if there's a defect over here, well, I still get, you know, 80% of the chips.

No, that was the received wisdom. The received wisdom. Remember how this works. I mean, imagine a wafer is like a big circle. Imagine it's a sheet of cookie dough your mother rolled out into a circle. She takes a handful of M&M's and throws them up in the air, and they land throughout it. We'll call those the flaws, [laughter]

right? They're randomly distributed.

And say you can't have a cookie with an M&M in it. Your mom goes like this and does a cookie cut through the entire thing. And if there's an M&M in a cookie, she has to throw away the cookie.

Yeah.

Okay. This is exactly how it works.

Mhm. Now, historically, the bigger her cookie cutter, the higher the probability she'd hit an M&M,

and the more good silicon around it, good cookie dough, would need to be thrown away.

Mhm.

This was one of the problems everybody said could never be resolved. And we solved it in 15 months with $12 million. It wasn't even that hard.
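To put rough numbers on the cookie-cutter intuition, here is a minimal sketch of the textbook Poisson die-yield model. The defect density is an assumed, illustrative value, not a figure Cerebras has disclosed; the point is only that yield collapses exponentially with die area, so a naive wafer-scale die would essentially never come out flaw-free.

```python
import math

def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Textbook Poisson yield model: P(zero defects land on a die),
    assuming flaws scatter uniformly and independently, like the M&M's."""
    return math.exp(-defects_per_cm2 * die_area_cm2)

D0 = 0.1  # assumed defect density (defects per cm^2), illustrative only

for area_cm2 in (1, 4, 25, 100, 462):  # ~462 cm^2 is roughly wafer scale
    y = poisson_yield(area_cm2, D0)
    print(f"{area_cm2:4d} cm^2 die -> {y:8.2%} good without redundancy")
```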

That's incredible. Wow.

Everybody did. Everybody. And it worked like this: there is another way to solve that problem, and that's the way memory has solved it.

Mhm.

Memory has lots of identical tiles. They're called bit cells.

Mhm.

And in the array of bit cells on a chip, they have redundant rows and columns.

Mhm.

And if there's a flaw, they map it out and use one of the redundant ones.

Interesting.

And that's it.

Yeah. They're built to withstand flaws, not avoid them.

Okay?

And so we looked at memory and said, "Nobody's ever done this in compute, but if we built a computer architecture with a million identical tiles

and we took 5% of them and we held them aside for redundancy, we could withstand almost all failure patterns."

Mhm.

And so we thought about the problem differently.
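Under loudly assumed numbers, a sketch of why the spare-tile approach changes that math: tile failures are modeled as independent, and the per-tile defect probability below is purely illustrative. With a million tiles and a 5% spare pool, the chip is lost only if defective tiles outnumber spares, an event dozens of standard deviations above the expected defect count.

```python
from math import sqrt
from statistics import NormalDist

N_TILES = 1_000_000        # "a million identical tiles"
SPARES = N_TILES // 20     # 5% held aside for redundancy
P_TILE_BAD = 0.04          # assumed per-tile defect probability (illustrative)

# Without spares, every tile must be perfect: vanishingly unlikely.
print(f"no spares : {(1 - P_TILE_BAD) ** N_TILES:.3e}")

# With spares, the chip survives if defective tiles <= spare tiles.
# Defect count ~ Binomial(N, p); approximate its CDF with a normal.
mu = N_TILES * P_TILE_BAD
sigma = sqrt(N_TILES * P_TILE_BAD * (1 - P_TILE_BAD))
print(f"5% spares : {NormalDist(mu, sigma).cdf(SPARES + 0.5):.6f}")  # ~1.0
```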

Yeah. And that wasn't top of everybody's mind. They'd just been taught that big chips have lower yield. They don't really understand the alternatives.

In the same way, when Google first built their big data centers, they said, "We don't want servers that are redundant. If they fail, we'll route around them.

We don't want to pay the extra cost of making them high-reliability. If they fail, we'll shut them down, we'll get new ones. We'll route around it.

So, does that mean that from chip to chip there might be a slight difference in the number of flops or power that can come out of the chip? Is that like a real thing?

Not that it matters practically, but I'm just wondering.

We had to invent a way to communicate across that little bit of cookie dough between the two cookie cuts that your mother makes, right? She puts the cookie cutter here, and she puts the cookie cutter here for her Christmas cookie, and there's a tiny little bit of dough.

Yeah.

Right. And usually what she does is she lifts all that bit of dough up and rolls it and makes more cookies later.

But in the chip world that's called the scribe line.

Okay.

And they run a laser across it and that's how they dice the chips. That's how they cut them.

Yeah.

They obliterate that tiny little bit of distance. We had to invent a technique to run communication across those scribe lines. And we had to use the tools that were already being operated at TSMC.

Yeah.

And so, you know, we benefited from the fact that we had this extraordinary chip expertise: not just how to write logic to make the chip work, but the back-end design, what we call the physical design, and timing. We knew EDA tools, and we knew manufacturing of chips. And so we were able to think very differently than most companies.

How does intellectual property play into this category? In so many tech startups, they say, you know, don't worry about patents, worry about your network effect. This feels very different. What's your IP strategy, broadly?

Deep technology companies have to worry about technology.

Yeah.

You know, [laughter] the statement of the century.

You've been talking to SaaS guys a lot. [laughter] Right. They worry about virality and they worry about network effects.

We worry about the fact that we can build things nobody else can build.

Yeah.

And that is a combination of protecting your IP with patents, with trade secrets,

with segmenting your manufacturing so your manufacturers can't see

Oh, interesting.

the interaction between steps. It's using all the tools and all the wisdom in your toolbox to defend your invention.

And sometimes patenting is not the right move.

Right. When you patent, you have to disclose.

Yeah.

Right. And if you have to disclose something that is then very difficult to tell if somebody else used, you might want to think about not patenting it.

Yeah. And so, generally, you want to patent things that you can then measure if someone stole.

Mhm.

And so, we have a very aggressive patent strategy, but we also have a very aggressive trade secret strategy and a collection of other tools that we use to defend our inventions.

Talk to me about space data centers. It seems like, if it does work, if the heating question is solved, you probably don't want a server rack with a ton of chips in space; that seems like more to manage, more networking. Wafer-scale compute in space could be a logical extension of that. Are you excited about that? Have you dug into it? Has it been something that's been on your

roadmap? I was digging into it this weekend with some friends. I think one of the real weaknesses

of [clears throat] today's GPUs is that, by being little tiny chips, they put a lot of pressure on how you tie them together.

Mhm.

Right. And this is why NVIDIA bought Mellanox.

Yeah.

Why did it make sense? Because they knew that the individual chip would be far less powerful than a collection of chips. And if you're going to bring a collection of chips to a problem, you have to tie them together. Now, that problem is made more complicated in space,

Right? You would like bigger blocks up in space. I think there are a lot of hard problems yet to be solved with data centers in space. I think we ought to be working on them, but I don't think it's something that you're going to see in production in the 3-to-5-year time frame.

Sure. What about, on the topic of cooling? We were just talking to Jeremy from SemiAnalysis about how Meta changed their data center design from this H structure that was air cooled, very efficient, took them two years, and they couldn't do water cooling. Now they're doing water cooling in their new data centers, the tents. What have you learned about various ways to cool? What's special about your product specifically with regard to cooling and energy management?

Look, this is not a complicated problem. I love it.

In...

I'm sure. Yeah. Yeah. You want to employ me? You want me to handle it? It's not complicated, right?

I got it.

I can do it.

In Buffalo,

Yeah.

at a Bills game, there are always three idiots who have their shirts off, right? It's 10 below and they've got their shirts off. Right. They've been drinking hard, but they're not dead. Yeah.

Why aren't they dead?

Because if you're in the water

and the water's 45° and you're there for 8 minutes, you're dead.

Why is that? It's because water's really good at sucking heat off heat sources.

Yeah.

Right. In technology, we say the thermal density of water is really high. It rips heat off you.

Whereas air doesn't do that. So they can stand there with their shirts off and not be dead.

Mhm.

It's the exact same thing when you're cooling a chip. Water has an ability to pull heat off a heat source that is vastly better than air's.

Mhm.

So you have two choices. You can blow cold air over your chip, and you don't get the same cooling effect as if you ran cold water against the back of your chip, or against the back of a cold plate. And so it is more efficient, by an order of magnitude, to use liquid; water in particular is great and low-cost for pulling heat off the back of chips.
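Back-of-envelope numbers behind the Bills-game physics: the rack heat load and coolant temperature rise below are assumed round figures, and the material properties are textbook constants. Per unit volume, water carries roughly 3,500 times as much heat as air for the same temperature rise, which is the whole argument for liquid cooling in one ratio.

```python
# Coolant flow needed to carry Q watts away with a dT kelvin rise:
# Q = mdot * c_p * dT  =>  mdot = Q / (c_p * dT)

Q_WATTS = 20_000   # assumed rack heat load, illustrative only
DT_K = 10.0        # assumed coolant temperature rise

coolants = {
    # name:  (c_p in J/(kg*K), density in kg/m^3), textbook values
    "air":   (1005.0, 1.2),
    "water": (4186.0, 998.0),
}

for name, (cp, rho) in coolants.items():
    mdot = Q_WATTS / (cp * DT_K)        # required mass flow, kg/s
    litres_s = mdot / rho * 1000.0      # required volume flow, L/s
    print(f"{name:5s}: {mdot:6.2f} kg/s  ({litres_s:8.2f} L/s)")
```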

And we were among the first to do it. Google started doing it with their TPUs in 2017. It had been done previously in the supercomputing world exclusively, and we've been doing it for years.

Gamers, I remember: you build a gaming rig, you get liquid cooling if you want that extra juice.

Yeah,

That's exactly right. Gamers have been doing it forever because they want to run those GPUs hot. Totally.

And Facebook, they learned a lesson and they jumped on it.

Well, they were optimizing for one thing, which was this energy coefficient, and now they're optimizing for speed and new scale, so they have a different set of optimization parameters and they adjusted their strategy. What else in the supply chain have you had to find partners for or develop in-house? I know you have experience actually building servers, but how much else changes when I'm racking your product versus an NVIDIA product, from the energy that's outside, to the land, to the building... we talked about the cooling. What else is different? How do you solve those to actually stand up whole data centers?

Right. So, we designed to fit in the exact same footprint as an NVIDIA

NVL72 rack.

There you go. Right. And so we are underneath that envelope. We will fit in any data center designed for that.

You know, there are fewer little boxes to stuff in, because we have a bigger box, but we fit in the standard racks.

Yeah.

You just roll them in. They're exactly the same as what you bought for your Dell servers or your Supermicro servers,

or your Arista switches. We fit in exactly the same way. We use the same power. We require the same cooling that all high-end

AI chips now use. And so we were able to sort of fit in the envelope that they carved out.

Yeah.

To make it really easy for data center operators and owners to roll us in.

And then on the demand side, are customers who want to use your chips asking, "Hey, we'd love for you to rack these with the hyperscalers," or to set up a deep relationship with a bunch of neoclouds? Or do they want to buy and operate it themselves, or do they want you to build the data center and just offer basically an API for them, or...

All of them. All of the above. We have an API service; it's ripping right now. We have customers who buy the hardware and ask us to operate and manage it for them in their facilities. We have customers who buy the hardware and they operate it and manage it in their facilities. We have customers that

the whole gamut. You can buy our hardware; you can rent it by the week, the month, the token, from various clouds. So the customers have sort of consolidated: either they have demand to buy hardware and operate it, on one side, or they're sort of under 30, have grown up in the cloud world, and don't want to think about hardware. They want to think about tokens, and then they pay by the token.

What use cases, outside of the coding agents? You mentioned Cognition working with you. That one seems super logical, in the sense that people are waiting for these agents to complete their really long rollouts, really long, like tons of tokens generated, to build a whole app or build a new feature. Where else are you excited about the potential of not just AI this year, but faster AI and faster inference, having an impact that might be tangible to someone like a software developer or even a consumer?

I think some of the use cases we've seen in pharma are extraordinarily interesting. [clears throat] I think

So that feels weird to me, because if I'm developing a drug, it's going to take me three months to get mouse data. It doesn't feel like a couple minutes matters. What's going on in pharma?

I think it is exactly because the process is so long and painful and expensive. If you can rip a year out of an 18-year process, where the whole patent is only good for 27 years,

you've made a huge difference.

Interesting.

And so you can run more experiments in less time.

Interesting.

And once you run these experiments, you have to go to the wet lab, which is much slower. The probability that the one you select out of the simulation, out of the AI, is a good one, and makes sense to take to the wet lab and to the next level, increases dramatically.

Yeah,

I think the act of being a researcher, right, is really one that AI is particularly well suited to help. You want to scan the literature. You want to make sure, whether it's in Chinese or French or English, that if there's a publication about the gene you're studying, you're on top of it. You want to be able to scan the literature.

Yeah.

And

basically every question is a deep research report. And so if you can go from half an hour to two minutes, that's a huge speedup in your workflow.

That's right. And so what you're speeding up is the researcher.

Yeah, that makes sense.

And you're increasing the number of questions they can ask per unit time.

Yep. Yeah.

That is a very, very exciting application.

Yeah. Jordi?

Maybe out of your sweet spot, but how do you expect the memory and CPU market to evolve this year?

I think, first, memory has historically been an extraordinarily cyclical market, right? I mean, it has had painful troughs

and crazy high prices, both. And I think right now what you're seeing is a tremendous amount of demand for HBM, which is a form of DRAM.

Mhm.

I think those fabs aren't easy to build, so additional capacity isn't easy to bring up.

Mhm.

And so you've had this steep upswing of GPUs, which are heavily dependent on this HBM. And as a result, all the capacity for DRAM has sort of moved towards HBM. That's made traditional DRAM and servers more expensive. When it gets more expensive, the hyperscalers panic. Panic may be the wrong word, but their response is to place orders for the full year,

which causes Dell and the server builders to go, "Holy crap, I need to buy more." Right. And you just watch this sort of crescendo of price exploding, right? You just see this huge movement in prices as everybody worries that they're not going to get it. So they place all their orders for next year early, so it looks like the year is going to be way bigger than it actually is going to be. And that's what's happening right now.

How are you thinking about the financial management of the business? You raised the $1.1 billion Series G last year. Are you thinking of building the company for a really long time in the private markets, or taking it public?

Thank you for the sound. That's perfect.

Yeah, we probably hit the gong properly.

I appreciate that.

But where do you see the company going? How do you want to operate? What makes sense in 2026?

Yeah, unlike some of our competitors, we sought to run a business that could be measured on traditional financial metrics.

We like...

I love it, just taking shots [laughter] on the way into that answer.

Look, I mean, I think we seek to run a business that

people can look at and understand.

Mhm.

And you know, our gross margins are better than all of our startup competitors'. We're growing faster and we're bigger.

And so I think that's really...

Let's go. Give him the foghorn. [laughter]

Oh, you guys have got to tell me, like in a pre-briefing, what the various sounds [music] mean, so I know what

they're all power.

This is the heavy tanker ship. You'll know if it's the right

The heavy tanker ship is backing up the truck for big business. We love it.

I think we raised the money because we had a ton of opportunity to continue to invest in the business to produce extraordinary growth. I mean, that's the right reason to raise money, right? You raise money because the opportunities in front of you

are large and many, and maybe have a time element to them, so attacking them quickly is of value.

Yeah. And that's why, I mean, if you go out and raise money from other people, that's why you should do it: because your opportunity set is ripe for pursuit. And so that's what we did. I think we will deploy it in expanding manufacturing, in the acquisition of additional data center capacity, which is of real value, in a collection of other things, expansion internationally. These are all things that the business is asking for right now.

Let's hear it for opportunities that are ripe for pursuit. I like...

And traditional pursuits and traditional financial metrics. Those...

When in the past has it been worth a clap to say that you like positive gross margins?

You'd be surprised. We've had some people come on trying to bait people by saying, "I don't care about gross margin. I don't [laughter] care about

Yeah, they do. They do. We care here, and we thank you for caring as well.

What can people expect, you know, any big announcements in the pipeline? We won't press for specifics, but

yeah, I'm interested.

I think you can go to cerebras.ai and [clears throat] you can see how fast it is yourself, right? I think the beautiful thing about the cloud is that it's dead simple to demo and to try, and we've got a free tier. Jump on it, play around. Ask a GLM model, which is a great coding model. Just ask it to make a video game for you.

All right. And you will see, in 3 seconds it'll make Space Invaders for you.

That will be real code. Yeah.

Right. Ask it to make a two-player pool game, or, if you're a physics nut, a three-ball interaction according to the laws of physics. I mean, just ask it to do interesting things, and you will see right away the joy, how fun it is to engage with an AI that is truly interactive.
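For anyone who would rather measure the speed from code than from the browser, a minimal sketch along these lines should work. The transcript names the site and a GLM model but no API details, so the endpoint URL, model id, and environment variable here are assumptions to verify against the current docs; the service is believed to expose an OpenAI-compatible API.

```python
import os
from openai import OpenAI  # pip install openai

# Assumed OpenAI-compatible endpoint; check the cerebras.ai docs for the
# real base URL, available model ids, and how to get a free-tier API key.
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)

resp = client.chat.completions.create(
    model="zai-glm-4.6",  # hypothetical id for the GLM coding model
    messages=[{"role": "user",
               "content": "Make a Space Invaders clone in one HTML file."}],
)
print(resp.choices[0].message.content)
```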

That's amazing.

And I think that's the best marketing. If you go there and you enjoy it, send us a note. We'll follow up. But

play with the AI. I mean, screamingly fun.

Just use it.

Well, thank you so much for taking the time to come chat with us, to really chat. We hope you have a great rest of your day, and congrats on all the progress. [applause] Yeah, congrats.

Thanks, guys. Thank you for having me back. It's very much appreciated. See you soon.

Anytime. We'll see you soon.

All right, guys.

Goodbye.

Vibe.co, where DTC brands, B2B startups, and AI companies advertise on streaming TV: pick channels, target audiences, and measure sales, just like on Meta.

Well said, John. And I'm also going to tell you about Console. You see it here on the sticker. Console builds AI agents that automate 70% of IT, HR, and finance support, giving employees instant resolution for access requests and password resets.

We are now entering our lightning round.