Prime Intellect launches INTELLECT-2, a 32B reasoning model trained across the globe for ~$100K

May 13, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Vincent Weisser

Welcome to the stream, Vincent. Love to have you. How you doing? There he is. Perfect. What's going on? A couple months ago, we had Bridget from Founders Fund on, and she gave us the high-level overview, but I'd love to get it from you.

Can you just introduce yourself and the company to kind of kick us off? Yes. So, my name is Vincent, and I'm the co-founder and CEO of Prime Intellect, and basically our goal is to commoditize AI and compute more broadly.

Really by networking all the compute in the world together and making sure no GPU goes idle, but rather can contribute to almost an interplanetary compute cluster that can, for example, fund open source AI models, but also that people can just use to save on compute. Ultimately, the goal is realizing that compute is this gigantic market which will ultimately power all of those different areas, and it's quite inefficient and very distributed.

So a lot of it ultimately comes down to research challenges: networking compute that is quite distributed throughout the world, but also verifying it, right?

You can't trust thousands or millions of different data centers and GPU nodes distributed across the world. So there's basically a challenge to scale trust: you can trust one data center or one company, but it's hard to do so at scale.

Um, so basically we're attacking those two problems. Okay. Why don't you talk about the launch Sunday? Yeah. You guys, I hadn't seen a launch like that on a Sunday since the last few Trump executive orders. So you guys kind of realized, Sunday? Yeah. Super Bowl Sunday.

Why not launch a new model? But break down the launch from just a couple days ago. Yes.

So basically, to take a step back: we first did this very large-scale distributed pre-training run, INTELLECT-1, which built on DeepMind's DiLoCo (distributed low-communication) training to train a model across the internet, across multiple data centers, at similar efficiency to a centralized setting where all the compute is networked in one data center. And now we've switched to this o1-like post-training, reinforcement learning paradigm.
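The low-communication idea behind this kind of pre-training can be pictured as: each site runs many local optimizer steps on its own data, and only the accumulated weight deltas are averaged across sites every H steps. This is a minimal numeric sketch of a DiLoCo-style outer/inner loop, not Prime Intellect's actual training code; the constants and the fake gradient function are stand-ins.

```python
# DiLoCo-style sketch: each worker takes H local steps with no
# communication, then the workers average their weight deltas
# ("pseudo-gradients") in one outer step. A scalar stands in for the
# full model weights; gradients are faked for illustration.

H = 10            # inner steps between communications
WORKERS = 4
OUTER_STEPS = 3

def local_grad(w, worker_id):
    # Stand-in for a real gradient on this worker's data shard:
    # each shard pulls the weight toward a different target.
    return 0.1 * (w - worker_id)

def diloco(w0=5.0, lr=0.5):
    global_w = w0
    for _ in range(OUTER_STEPS):
        deltas = []
        for wid in range(WORKERS):
            w = global_w
            for _ in range(H):                 # H local steps, no comms
                w -= lr * local_grad(w, wid)
            deltas.append(w - global_w)        # this worker's pseudo-gradient
        global_w += sum(deltas) / WORKERS      # single outer averaging step
    return global_w

print(diloco())
```

The point of the sketch: communication happens once per H local steps instead of every step, which is what makes training across the open internet feasible.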

It's actually a paradigm that's perfectly set up for distributed training, because most of the compute goes into those rollouts, and you can distribute them extremely well; there's basically less need to communicate every training step.

So what we did with INTELLECT-2 was really scaling to a larger model size, a roughly 32B model that we trained distributed across the globe, with different people contributing compute, and we built this on top of Qwen.

So the goal was also to take the best open models out there and improve them further with actually quite small amounts of compute. It was honestly more of a proof of concept where we proved four different pieces.

On one side, we proved the verifiable compute piece: basically anyone in the world could join with their compute, we don't need to trust them, and there are very efficient ways to make sure they're an honest compute contributor.
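One way to get intuition for low-overhead honesty checks is random spot-checking: the coordinator recomputes only a small fraction of each contributor's submitted results, so verification costs stay near 1% of the total work. This is an illustrative toy, not Prime Intellect's actual verification protocol; the function names and `check_rate` are assumptions.

```python
import random

# Toy sketch of verification by spot-checking: recompute a small random
# fraction of a contributor's claimed results and flag mismatches.
# A cheater who falsifies many results is caught with high probability,
# while the coordinator only pays ~check_rate of the compute.

def reference_compute(task):
    """Trusted recomputation of a single task (stand-in workload)."""
    return task * task

def spot_check(submissions, check_rate=0.01, rng=None):
    """Return the task ids whose claimed results failed recomputation."""
    rng = rng or random.Random(0)
    bad = []
    for task, claimed in submissions:
        if rng.random() < check_rate:              # audit ~check_rate of tasks
            if reference_compute(task) != claimed:
                bad.append(task)
    return bad

honest = [(t, t * t) for t in range(1000)]
cheater = [(t, -1) for t in range(1000)]           # every result falsified
print(len(spot_check(honest)), len(spot_check(cheater, check_rate=0.5)))
```

An honest contributor is never flagged, since their recomputed results always match; the design trade-off is between audit cost and how quickly a cheater is caught.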

And then the other piece was making distributed reinforcement learning work: decoupling those rollout generations from the training itself, so you can train across the globe with less frequent communication.
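The decoupling described here can be sketched as an asynchronous loop: rollout workers sample from a possibly stale policy snapshot, while the trainer only broadcasts fresh weights every few steps instead of every step. A minimal single-process sketch; the constants and function bodies are stand-ins, not the actual INTELLECT-2 training stack.

```python
import random
from collections import deque

# Decoupled rollout generation and training: workers generate rollouts
# against whatever policy snapshot they last received, and the trainer
# broadcasts new weights only every SYNC_EVERY steps, so per-step
# communication is far lower than in fully synchronous training.

SYNC_EVERY = 4  # broadcast weights every 4 training steps (assumed value)

def rollout(policy_version, prompt):
    # Stand-in for sampling a reasoning trace and scoring it.
    return {"prompt": prompt, "policy_version": policy_version,
            "reward": random.random()}

def train_step(weights, batch):
    # Stand-in for an RL update on a batch of rollouts.
    return weights + sum(r["reward"] for r in batch) / len(batch)

def run(num_steps=8, workers=3):
    weights, version = 0.0, 0
    worker_version = [0] * workers          # each worker's policy snapshot
    buffer = deque()
    for step in range(num_steps):
        # Workers fill the buffer using their (possibly stale) snapshot.
        for w in range(workers):
            buffer.append(rollout(worker_version[w], f"prompt-{step}-{w}"))
        batch = [buffer.popleft() for _ in range(workers)]
        weights = train_step(weights, batch)
        version += 1
        if version % SYNC_EVERY == 0:       # infrequent weight broadcast
            worker_version = [version] * workers
    return version, worker_version

final_version, snapshots = run()
print(final_version, snapshots)
```

The key property is that workers never block on the trainer; they tolerate staleness between broadcasts, which is what lets heterogeneous, globally distributed GPUs contribute.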

And what we've proven is that with quite a small amount of compute, roughly $100,000, which is obviously not comparable to the more closed AGI labs, we've been able to improve upon Qwen. And it sets the foundation for us to scale this much further and ultimately have a repeatable recipe to continuously improve the strongest open source models out there, for quite small amounts of capital, across different domains.

So we made it much better on math and coding, but the goal is also to make it much better on other domains, and doing so is really about scaling those reinforcement learning environments. I think you had Will Brown on; he joined us to work on this.

Yeah, I was going to ask, what was the process like recruiting the big dog WB? Absolute dog. I mean, I just have to imagine, he's such a good poster, such a great guy to chat with. He could have gone anywhere, but he ended up with you guys. Yeah.

I think it's two things. You should ask him himself, but I think a lot of talent is gravitating towards open source AGI: to build in the open, to publish in the open, to be able to contribute to the frontier of the most exciting research. In a sense, I have friends at major AI labs who you've never heard of, because they can't publish, can't really talk about their work. So there's that component. But then I think we're now, especially in the Western Hemisphere, one of the only places that is really catching up to the frontier. There's obviously Qwen and DeepSeek and others in China, but I think it's really a fumble by America to be the closed guy that isn't really standing in for the values of democracy and freedom to the same extent. But he's obviously extremely passionate about open source AGI and distributed reinforcement learning, so he immediately hit the ground running on scaling this much further. So we'll have more work building on top of this distributed reinforcement learning out in the next few months already, which should scale this much further.

I remember a couple years ago I was using Octane Render, and they have a distributed rendering network where you can render CGI images on GPUs all over the world, and they actually have a crypto layer for Render tokens as well.

But this idea of distributed computation makes a ton of sense in that context, because each frame is unrelated to the other frames. So if you're rendering a 90-frame sequence that's a few seconds long, you can just send each frame to a different machine.

What have you learned from the successes and failures of the Render Network and what OTOY did with Octane?

And is there any difference between that discrete, individual frame generation and something as complex as training a whole model, where in theory you'd need to load it all into one instance every single time? Yeah, great question.

I think there's a few players, like them and others, that have tried this in the past, and I think they've been in some ways both too narrow and too ambitious, right?

Rendering is nice, but it's not the general-purpose use case that we now have with AI. So our goal is really making this much more general-purpose, useful for every single compute and AI use case. And I think one big learning has been that most of our team are AI researchers; they come from the AI world, including the people who built, for example, the verification mechanisms. The verification in our case has tiny overhead, like 1%, whereas the more crypto-native verification mechanisms have more like a 1 to 10x overhead, which makes it completely unfeasible to compete with centralized clusters.

So that has been a big learning: being very pragmatic and close to the AI researchers. That's the main community we're building with, engaging with, and building for.

All of our users are AI developers, and ultimately I think the biggest lesson has been just creating value for them from day one. Right.

We launched our compute platform two or three months in, with two or three people, and immediately got a lot of users, and we've scaled since then. Basically, any AI startup or developer can, with our platform, have a global market on one side: we aggregate all the hyperscalers, all the different data centers, even working with folks like Aaron from Hydra Host and all these guys, to aggregate and orchestrate all these different data centers and hyperscalers. And I think that has been another lesson: we are not trying to be the exclusive marketplace where supply onboards; we just aggregated all the supply out there, which gave us a huge wedge to enter the market.

Okay, talk about some of the hardware where this is actually running. A 30 billion parameter model at floating-point 16, you're going to need 60 gigs of VRAM. You're probably going to be on H100s, not gaming cards.
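The host's back-of-the-envelope number follows from 2 bytes per parameter at FP16. A quick sketch of that arithmetic (weights only; real requirements are higher once you add KV cache, activations, and any optimizer state):

```python
def fp16_weight_gb(params_billion):
    """Weights-only memory at FP16/BF16: 2 bytes per parameter.

    10^9 params * 2 bytes = 2 GB per billion parameters. Ignores the
    KV cache, activations, and optimizer state, which push the real
    requirement higher, especially during training.
    """
    return 2.0 * params_billion

print(fp16_weight_gb(30))  # 60.0 -> ~60 GB for a 30B model
print(fp16_weight_gb(32))  # 64.0 -> ~64 GB for a 32B model like INTELLECT-2
```

This is why a single 24 GB gaming card can't hold the full FP16 weights, while an 80 GB H100 can.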

But are you thinking about bringing residual capacity online in the gaming market, or are these small data center operators that just have a stack of H100s they're not using and are bringing them online? Yes.

So basically, what's been exciting is that this run was actually heterogeneous, so H100s and A100s could join. And for future ones, actually for the next one, even 3090s and gamer cards can join. And what's interesting is that with this rollout generation and synthetic data generation, you can distribute the tasks to different cards, and to different models, based on their difficulty.
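One way to picture that difficulty-based distribution: sort rollout tasks by estimated length and greedily hand the heaviest ones to the highest-VRAM cards, then wrap around so every card stays busy. This is a hypothetical sketch; the GPU tiers and the scheduling rule are assumptions for illustration, not the actual scheduler.

```python
def assign_by_difficulty(tasks, gpus):
    """tasks: list of (task_id, est_tokens); gpus: list of (gpu_name, vram_gb).

    Hardest (longest) rollouts go to the largest cards first, then
    round-robin so consumer cards still pick up the lighter work.
    """
    tasks = sorted(tasks, key=lambda t: t[1], reverse=True)   # hardest first
    gpus = sorted(gpus, key=lambda g: g[1], reverse=True)     # biggest first
    return {tid: gpus[i % len(gpus)][0]
            for i, (tid, _) in enumerate(tasks)}

pool = [("h100-0", 80), ("a100-0", 40), ("3090-0", 24)]
jobs = [("t1", 16000), ("t2", 2000), ("t3", 8000), ("t4", 512)]
print(assign_by_difficulty(jobs, pool))
```

The same idea extends to model choice: short, easy generations can run on smaller models or cards without slowing down the long reasoning traces.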

So the goal is really to have a fault-tolerant, heterogeneous pool of compute. Another big piece, which we did already in the first run and also in this one, is fault tolerance: it's not only that you have H100s joining, and A100s and 3090s, but they can also drop out. They can join for a few hours while they're idle. And I think that's the broader thesis: almost every GPU that is idling is a market failure. Especially in a world where, increasingly, my theory is that the biggest commodity in the world will be compute, and it will be one of the biggest slices of GDP, it's unfortunate if half of that is sitting idle instead of curing diseases, generating movies, or creating technological progress. The goal is to basically get to 100% utilization of compute, and we're very, very far from that.

Yeah. Yeah. Well, what are you seeing in the trend in chip size and scale? Obviously Nvidia's pushing those DGX rack-mounted units, racking everything together with NVLink so you can do even bigger models. Cerebras is scaling up the single chip size to wafer scale.

The theme of the last few years seems to have been: let's not decentralize, let's actually centralize further. So where does this go? What's your take on the ever bigger racks and ever bigger chips for large compute training?

Yeah, I think we'll see both, and actually, ironically, Nvidia has an incentive to fragment the market; they don't want one buyer.

They don't want AWS to be the only customer, because AWS is building their own chips; the big tech clouds basically have their own chips competing with Nvidia. So Nvidia is very happy to give a bunch of chips to CoreWeave, a bunch of chips to Crusoe, a bunch of chips to Lambda, and all of these other guys. So I think we'll see both at the same time. We'll have thousands, probably millions over the next few years, of data centers. There will be these gigantic clusters, like Stargate from OpenAI, but I think at the same time there's an increasing amount of capacity in the mid tier of smaller data centers.

But then on the other side, there's a lot of compute sitting idle everywhere, and soon in autonomous cars, robots, everything else.

We actually had some of the biggest car companies in the world reach out to us, and phone manufacturers, saying they're sitting on hundreds of millions, billions, of phones or cars that have idle compute capacity.

And I think we're not there yet, but we'll just have much more compute everywhere, right? We'll have more mega clusters, but we'll also have more mid-tier and small clusters. Mega clusters. Mega clusters. Mega clusters. I have a question before you leave.

Can you give any expectations around DeepSeek R2? I've seen some crazy rumors flying around online. I don't know what's real and what's fake news, information warfare. Curious what you're expecting out of the next release. Yes.

I think they'll definitely catch up to the frontier. Again, my guess is they'll probably not be as good as Gemini or OpenAI, but there's obviously this trend, and all the closed-lab CEOs have admitted it, right,

that open source has caught up, very uncomfortably close to their lead, and I think that will continue. The closed guys obviously have some advantages, but there are also huge advantages that the open source AGI space has.

So I think the trend will continue that ultimately the DeepSeeks and the Qwens of the world, but hopefully also players like Llama, will catch up a bit more. Obviously they've been a bit disappointing with their Llama 4 launch, but I think they might actually catch up again.

And then obviously our goal is to be able to build on all of that progress, right?

Basically, we kicked off our largest synthetic data generation run the day DeepSeek dropped, and generated a lot of verifiable reasoning data that we could then use, for example, in this recent model.

So what I see more is that we can improve open source AI progress literally day by day, week by week. It won't be a thing where we need to wait four months for some progress to come; it's more continuous progress.

Well, thank you so much for joining. This was a fantastic conversation. Yeah, really enjoyed this. And congrats on scooping up WB, man. Legend. Truly, truly. Congratulations. But come back on again soon.

I know you guys are going to have a bunch of launches coming up, and we appreciated your perspective. We'll talk to you soon. Appreciate it. And if you're looking for distributed compute, go to Prime Intellect. If you're looking for distributed advertising, go to adquake.com.

Out-of-home advertising made easy and measurable. Say goodbye to the headaches of out-of-home advertising. Only AdQuake combines technology, out-of-home expertise, and data to enable efficient, seamless ad buying across the globe. They just make it too easy, John. I want to spend every single dollar on AdQuake.

Anyway, we got the CEO of Backbone in the studio. Let's bring him in. Man, excited to have you here. Welcome to