MatX raises $500M to ship LLM-optimized chips with SRAM+HBM hybrid architecture targeting frontier labs

Feb 24, 2026 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Reiner Pope

Good to meet you.

Let me tell you about CrowdStrike. Your business is AI; their business is securing it. CrowdStrike secures AI and stops breaches. And without further ado, we have Reiner Pope from MatX in the waiting room. Welcome to the show. How are you doing?

What's going on?

Doing great. Very happy to be here.

Thanks so much for hopping on. It's your first appearance; we'd love an introduction to yourself and the company to kick it off.

Yeah, happy to be here. I'm Reiner, CEO and one of the founders of MatX. We are a company that makes the best chips physically possible for large language models.

Okay.

So we've been doing this for about two or three years. Before that, I was at Google for about a decade working on large language models. I worked on TPUs for a bit and on some other hardware projects. As part of that, what we saw was that if you really want to make the best chips for LLMs, and LLMs were this big up-and-coming workload back in 2022, the best way to do it is a from-scratch, blank-slate design: designed for large matrices, very low precision, and very low latency. So my co-founder Mike Gunter and I decided at that point in 2022 to leave Google and start MatX, where we're doing exactly that.

Today we're announcing MatX-1. This is a new chip that simultaneously offers better throughput per square millimeter, or throughput per chip, than any other product on the market, while at the same time offering latency comparable to the best, which is Groq and Cerebras.

Yeah. What are the various tradeoffs in custom silicon design these days? At the highest level, is it flexibility versus speed, or cost, or die size, or wafer size? How do you think about the design space, and how did you narrow in on your particular decisions?

Yeah. So generally there's some kind of performance-per-something metric, so let's analyze those pieces. The two different aspects of performance are throughput and latency: throughput is how many users I can support simultaneously, and latency is, for one user, how fast the experience is.

Both of those matter.

And then on the per-something side: per dollar, which is how much the chip actually costs, and per watt, which is the power bill of the chip. So all combinations of those two numerators and two denominators are the things we care about. Mhm.

What we see in the market today is that the number one constraint is throughput per dollar and throughput per watt. These frontier labs have so much demand for compute, serving trillions of tokens per day, that cost and economics are the main constraint. There are only so many square millimeters of silicon wafer being produced every year. So given that constraint on how much silicon there is, can we maximize the number of tokens, and then the intelligence of the models, coming off that wafer?
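The framing above, two numerators (throughput and latency) over two denominators (dollars and watts), can be sketched as a small calculation. All chip numbers below are hypothetical placeholders, not any vendor's real specs:

```python
from dataclasses import dataclass

@dataclass
class Chip:
    name: str
    tokens_per_sec: float   # aggregate throughput (tokens/s)
    latency_ms: float       # per-user time between tokens (ms)
    cost_usd: float         # chip cost ($)
    power_w: float          # power draw (W)

    # The two numerators (throughput, latency) over the two
    # denominators (dollars, watts) give the ratios discussed above.
    def tokens_per_dollar_sec(self) -> float:
        return self.tokens_per_sec / self.cost_usd

    def tokens_per_watt_sec(self) -> float:
        return self.tokens_per_sec / self.power_w

# Made-up example chips: one tuned for throughput, one for latency.
chips = [
    Chip("chip_a", tokens_per_sec=500_000, latency_ms=20, cost_usd=30_000, power_w=700),
    Chip("chip_b", tokens_per_sec=200_000, latency_ms=5, cost_usd=25_000, power_w=500),
]
for c in chips:
    print(f"{c.name}: {c.tokens_per_dollar_sec():.2f} tok/s/$, "
          f"{c.tokens_per_watt_sec():.1f} tok/s/W, {c.latency_ms} ms/token")
```

Which ratio dominates depends on the buyer: frontier labs, as discussed here, optimize throughput per dollar and throughput per watt first, while latency sets the per-user experience.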

What does the go-to-market look like? Are you already sold out? Who are you targeting early on? How do you scale, all that stuff?

So one of the places where we've seen the most interest in our product is from the frontier labs, really. And this comes from a combination of factors: they are the ones who are driving all of this demand and are most constrained on cost as well as on silicon wafer supply. But they are also the ones doing these reinforcement learning training workloads, which are very latency sensitive: they have to do long rollouts in a very long loop.

So that's where we've seen the most interest. One of the things that shows up there is that when they look to place an order, it is on the order of gigawatts, which is massive volume. So one of the things we're very excited to be able to do now, with the raise we've just announced, is help ramp up the supply chain in order to be able to deliver gigawatts a year of volume, which is a massive amount to deliver.

Yeah. When you were initially thinking of starting the company and pitching it to investors early on, how did you answer the question around Nvidia's various moats, or strategic advantages? Think CUDA, all that stuff.

Yeah, I think it's really interesting. CUDA is, for Nvidia, simultaneously the biggest strategic advantage and also a constraint, because the promise they make you is that you can take a CUDA program written ten years ago and it will run on the next-generation Nvidia GPU. Jensen goes on stage and promises this. It is so valuable for them, and yet at the same time it means the next-generation GPU has to look just like the GPU from ten years ago.

So it means things like: the numerics can't change. The way the cores in the chip are connected to each other can't change. The actual memory architecture can't substantially change. All of these things are locked in by the programming model they designed more than a decade ago for general-purpose parallelism. And this is where we've seen the biggest differentiation. If they wanted to say, well, we're going to completely give up our CUDA approach and start a new generation of chips, maybe they could do that. They would lose all of the lock-in they have, but at least they would then be on a level playing field with us. But that's not what we see; really, we see them being committed to their trajectory.

The CUDA lock-in is very valuable for the mid and tail of the market, where people are very sensitive to software cost. But at the head of the market, in the frontier labs, the software is not the main cost; the hardware is the main cost. So if you're willing to rewrite your software, maybe you can actually switch to more efficient hardware like ours.

And it's getting easier to rewrite software.

As you plan your business, how are you thinking about bottlenecks? One month it's energy, the next month it's chips, and there are a lot of concerns around TSMC right now. How are you planning?

Yeah, I mean, I think these bottlenecks are real and are going to stay for a long time.

The big bottlenecks that you see in the manufacturing supply chain are on logic dies from TSMC, then memory dies from SK Hynix, Samsung, and Micron, and then the manufacturing of racks and so on.

Given that these bottlenecks exist, what you would like to do as a consumer of such things is get the most bang for your buck: the most performance out of every square millimeter of silicon. That has been our focus. The flops per square millimeter, the 4-bit-precision multiplies you can do per square millimeter of silicon, is higher in our product than in any other product. So as the price of every silicon wafer goes up, you can do more with it on our solution than on others.

I assume, and you've mentioned this, that you're selling to the frontier labs running frontier models, probably on the most leading-edge chips and the most leading-edge fabrication nodes. Is there a world where it's valuable to say: hey, we have some lagging-edge capacity out there, what if we design custom silicon that runs on a last-generation Intel node that doesn't have a line out the door for capacity, and then I'm not competing with you? Does that not work? Is that not possible, or is that just a completely orthogonal business to what you're building? So

That approach is possible, but it's maybe more of an approach for a player with deep pockets rather than a startup.

In that every different process you target costs you another $20, $30, $40 million of development cost. Yeah. So if you're going to put all of your eggs in one basket, you should put them in the leading-edge node

in the best basket.

Yeah,

that makes sense. Talk to me about other tradeoffs at TSMC. I mean, Cerebras is famously wafer-scale. What is the trade-off on die size these days?

Yeah. So there's a trade-off on die size, and then also on memory architecture. On die size, Cerebras is the outlier. Almost everyone else has converged on reticle scale, which is the largest standardly produced TSMC chip, and we're in that same standard bucket. That avoids a lot of physical risk: when you look at Cerebras, they've had to spend all this time dealing with wafer bending and all these uncomfortable physical constraints that we don't want to deal with. So, reticle-size chips. But then the other, bigger question is which memory technology you use. Historically there have been the HBM-based players, which are Google, Amazon, and Nvidia, and then the SRAM-based players, which are Cerebras and Groq.

Mhm.

SRAM is small but very, very fast,

and very, very fast is good if you want low latency. You can put your model weights in SRAM and you get the best latency in the market. That's what Groq and Cerebras have done. But the reason they haven't sold out the market is that there's not enough space in SRAM to store all of your long-context KV caches.

Sure.

And so this is the reason why the HBM-based players like Google, Amazon, and Nvidia have won: the HBM is actually essential.

But it's actually possible to marry both of these approaches and put them on one chip, and that is what we're doing with MatX-1. Curiously, it doesn't just give you the best of both worlds; it actually beats any alternative on throughput. There's this curious effect where, when you have your weights in SRAM, you get better mileage, better usage, out of the HBM in return. We think this is actually the way the market in general will move over time.
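The weights-versus-KV-cache point can be made concrete with a back-of-envelope sizing sketch. Every number in it (parameter count, layer count, context length, batch size, precisions) is a made-up illustration, not a MatX-1 spec: 4-bit quantized weights are a fixed, modest footprint, while long-context KV caches for many simultaneous users are vastly larger, which is why a DRAM-class technology like HBM stays essential.

```python
# Back-of-envelope memory sizing; all model and serving numbers are
# hypothetical illustrations, not any real chip or deployment.
def weight_bytes(n_params: float, bytes_per_param: float = 0.5) -> float:
    """Total weight storage; 0.5 bytes/param corresponds to 4-bit weights."""
    return n_params * bytes_per_param

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_val: int = 1) -> float:
    """K and V per layer, per token: 2 * n_kv_heads * head_dim values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_val

GB = 1024 ** 3
w = weight_bytes(70e9)                         # 70B params at 4-bit
kv = kv_cache_bytes(80, 8, 128, 128_000, 64)   # 128k context, 64 users, 8-bit
print(f"weights: {w / GB:.1f} GB, kv cache: {kv / GB:.1f} GB")
```

Under these assumptions the weights come to roughly 33 GB while the KV caches run well past a terabyte, which matches the trade-off described above: weights can plausibly sit in fast on-chip memory, but batched long-context caches cannot.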

Okay. Help me understand the trade-off continuum around model flexibility. On an Nvidia NVL72, I can run basically any model, as long as it fits and is trained properly. And then Taalas is at the other extreme: these specific weights are on the chip and you can never change them whatsoever. And then there's something in the middle. How much flexibility do you think is important? How much flexibility are you planning around? And how do you think about sightlines? Because I imagine the delay between the final architectural design and chips in data centers is still a year, eighteen months, something like that.

Yeah. There are all of these manufacturing and then deployment times that make it take a long time. In general, I would say that from pencils-down on a chip to the last time you use it: the chip is going to be in the data center itself for three to five years, and then there's maybe, as you say, a year to a year and a half of deployment time in advance of that. So you want your chip to be relevant for maybe a five-year time span.

Um

So you need to pick a point of specialization which you think is here to stay. For us, that is very large matrices. In fact, really large matrices together with a splittable systolic array, which is a piece of technology. But very large matrices is the theme we started with, and it's just a recognition that models have been growing over time. They grew a ton with LLMs and they're continuing to grow. If you specialize for that, you can get big efficiency gains on the matrices themselves.

Now, we are still very general-purpose programmable in terms of the vector unit. Similar to Nvidia, we have a vector unit that can run any instruction: add, multiply, subtract, divide, all of those things. That gives you a good amount of flexibility, but in a way that only costs maybe 5 to 10% of the overall cost of the chip.
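As a sketch of what a systolic array is (the generic textbook weight-stationary dataflow, not MatX's actual splittable design), here is a toy cycle-by-cycle simulation: each processing element (PE) holds one weight, activations enter skewed in time, and partial sums ripple through so that PE (k, n) touches output row m exactly at cycle t = m + k + n:

```python
def systolic_matmul(A, B):
    """Compute C = A @ B the way a weight-stationary systolic array would.

    PE (k, n) permanently holds weight B[k][n]. Activations stream in from
    the left with one cycle of skew per row; partial sums flow downward.
    The skew means PE (k, n) works on output row m exactly at cycle
    t = m + k + n, doing one multiply-accumulate per cycle.
    """
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    total_cycles = M + K + N - 2          # pipeline depth of the array
    for t in range(total_cycles + 1):
        for k in range(K):
            for n in range(N):
                m = t - k - n             # which activation row is here now
                if 0 <= m < M:
                    C[m][n] += A[m][k] * B[k][n]
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The array does one multiply-accumulate per PE per cycle, which is why matmul throughput scales with silicon area; a "splittable" array, as mentioned above, would presumably also let the grid be partitioned into smaller independent tiles for smaller matrices.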

Okay.

Funding news.

Give us the news. How much did you raise?

What happened?

So, we're happy. We have raised $500 million.

Yes. Boom.

Who'd you raise it from?

So this was led by Jane Street and Situational Awareness. Situational Awareness, that's Leopold Aschenbrenner's fund.

Unless you've been living under a data center, that is.

Exactly, you might have heard of him. He really sees the big picture of where this space is going, and he recognizes just how much demand there is for silicon. And then on the other end of the spectrum, Jane Street: they are expert technologists. They know everything about what exactly is required to build a product like this, and they know what good looks like in a product like this.

Mhm.

So we're really happy to have these strong experts here. This mirrors what we see inside the company as well: we have a wide range of experience across hardware, software, and ML. And then among the rest of the investors participating in our round, we have renewed participation from our previous investors, Spark Capital and NFTG, as well as a range of folks such as Patrick and John Collison, and experienced ML people like Andrej Karpathy.

And then even participation from the supply chain, like Marvell.

Did you let any normies in? These are just the most elite people in the world.

Yeah, we like

Just one mouth breather, please.

Congratulations. It's amazing. I'm extremely excited for this and excited for it to

now you have to win on such a massive scale, otherwise you'll bring dishonor to all the industry legends.

No, thank you. It's really cool to hear your perspective and approach to everything, and I'm sure you'll be back on the show this year. So congrats to the team.

Yeah we'd love to have you back. Thank you so much for taking the time. We'll talk to you soon.

Goodbye.

Let me tell you about vaude.co, where DTC brands, B2B startups, and AI companies advertise on streaming TV. Pick channels, target audiences, and measure sales just like on

Have we had an investor lineup like that before? The Jane Street and Situational Awareness co-lead, and then just down

fantastic? Well, I mean, Situational Awareness is a new fund; it has not led that many rounds.

I know, but I'm just saying you go back,

Maybe it turns into spray and pray.