General Intuition raises $320M, betting that a trillion action tokens from video games will crack robotics

Jun 29, 2026 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Pim de Witte

Very cool name. I wish you know what I'm thinking, John.

General,

I wish that uh after computing, I wish that Chad launched Chad computing.

Oh, yeah.

Uh it was right there. It was right there. Anyway, but

We have the co-founder and CEO of General Intuition with us. Welcome to the show, Pim. How are you doing?

What's happening?

Hey guys,

thanks so much for coming back on the show.

Uh uh

yeah, please.

We've been talking about names for labs. General Intuition is a strong name,

but since you launched the company, a lot of other neolabs have come out with

similar names. There's probably a General Superintelligence or a General ASI. How about you rebrand to Unfettered

Intelligence?

that might be it

or how about we just fund them all

yeah that that too

Yeah, what is the plan? Do you see yourself as a neolab? And is it as much of a knock-down, drag-out fight as it appears from the outside, or is your model more of a thousand flowers bloom?

The plan is to just keep renaming.

No, look, you have to have a claim to why you can win. I think otherwise none of this makes any sense. It's an incredibly competitive fight. There are lots of great contenders. The only reason we have a shot is that we have a data set that nobody else has, which allows us to be as focused on workloads that include space and time as Anthropic is on its code environments on the way to the frontier.

And so you need to have a very focused, dedicated path. Some of that can be, for instance, having the best researchers or having the new ideas, but I think it also has to be supplemented with a product focus on a customer problem that is going to get solved because these types of model classes exist. Network effects, just like we saw in the consumer eras of the Facebooks and the Twitters and the Reddits, are real, and they apply to LLMs as well. The fight for that space is going to be incredibly tough, so you have to introduce something new. I don't believe in the me-too LLM space, which is why we're focused on actions and space and time.

What's the

Okay, actions in space and time. Let's talk about the data set. Catch everyone up to speed. You broke it down for us the last time you were on, but it feels like it's been almost a year at this point. So, what have you been working on? And talk about the data set, how you're building it, all that stuff.

Yeah. Look at it this way. As humans, the decision to talk or type is just a very, very small subset of the actions that we can actually take, right? We can choose to move our body. And so in order to create a sufficiently general intelligence to play 10,000-plus video games, the model has to be able to predict across the entire action space of human cognition when people are interacting with these environments, which are 2D environments, 3D environments, interfaces, long-horizon tasks, short-horizon tasks. It has to be a sufficiently general intelligence in order to learn how to correctly predict actions. And therefore the type of model you get out is not going to taste like an LLM. It's going to be like comparing coffee to water. This model is going to be incredibly good at navigating unforeseen environments. It's going to be incredibly good at zero-shotting any task where the system can already be controlled using a game controller, because we have roughly a trillion action tokens in that space, for example, right? For context, frontier LLMs are trained on maybe between five and 10 trillion text tokens, right? And so we have a scale of data that is going to allow us to jump to the frontier in one capability, which is any system that can be controlled using a game controller,

which is most robots, right? That's really what we're doing. We're using that simplification to turn it into mostly an environment transfer problem. And then you can use that to create a sufficiently general intelligence where you maybe at some point add text to the output space. Right? It's not going to be text as you're used to from LLMs, but it might just be enough to communicate why you're doing a specific thing. So

that's how to view the models.
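As a rough illustration of what "action tokens" from a game controller could look like, here is a minimal sketch. The `ControllerState` layout, the 5-bucket quantization, and the token names are all assumptions made up for this example and do not describe General Intuition's actual format. The token counts at the end are the approximate figures quoted in the conversation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerState:
    """One frame's worth of gamepad input (hypothetical layout)."""
    stick_x: float            # left stick, -1.0 (left) to 1.0 (right)
    stick_y: float            # left stick, -1.0 (back) to 1.0 (forward)
    buttons: frozenset[str]   # names of the buttons held this frame


def bucket(value: float, bins: int = 5) -> int:
    """Quantize an axis value in [-1, 1] into one of `bins` integer buckets."""
    clamped = max(-1.0, min(1.0, value))
    return min(bins - 1, int((clamped + 1.0) / 2.0 * bins))


def to_action_tokens(state: ControllerState) -> list[str]:
    """Turn one controller state into a short list of discrete tokens."""
    tokens = [f"LX_{bucket(state.stick_x)}", f"LY_{bucket(state.stick_y)}"]
    tokens += [f"BTN_{name}" for name in sorted(state.buttons)]
    return tokens


# Scale comparison using the rough numbers from the interview.
ACTION_TOKENS = 1e12                       # "roughly a trillion action tokens"
TEXT_TOKENS_LOW, TEXT_TOKENS_HIGH = 5e12, 10e12
share_low = ACTION_TOKENS / TEXT_TOKENS_HIGH
share_high = ACTION_TOKENS / TEXT_TOKENS_LOW

example = to_action_tokens(ControllerState(0.0, 1.0, frozenset({"A"})))
```

Under these assumed numbers, the action data set would be on the order of a tenth to a fifth of the text token count quoted for frontier LLMs.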

So yeah, walk through the partnership with Medal. Are you getting game controller feedback as well? Yeah, explain the relationship to Medal.

So alongside the frames in the video, we're also getting the exact action inputs. To be clear, not the letters or numbers, right? We had thousands of humans convert those into the actions you're taking. So walk forward, walk left,

open door, close door. And so when you have that at that ground-truth level,

you don't need to train models that try to extract that information from the videos, which means you are now in a completely different scaling regime than if you were trying to do this on inferred data. So for example, if you're landing a plane and you're moving the rudder, that's not going to be visible in the pixels. It's impossible for that to be visible in the pixels, right? But it's in the action sequence. And so there's just no other lab that can take this approach. There are lots of benchmarks that might show that you can do this on inferred data. The problem with inferred data and these benchmarks is that they show up in a really nice way on general tasks, but customers care about how these models perform when you're in an edge case and you need specific actions to go in specific ways. And so you cannot do this on inferred data, despite many people claiming you can.

Tell us about the latest round. I want to hit the gong. What happened?

Yeah.

How much did you raise?

What happened? We raised $320 million.

That

Congratulations and thank you so much for taking the time to come chat with us.

One final question. Talk about progress with your customers, the companies that you're talking to in robotics. Where is an area that you're particularly excited about that you don't see being talked about yet?

Yeah, the most obvious thing this replaces is all the code that people are currently writing for behavior in physics engines. All that just becomes a prompt. And so think of the models as, based on an input stream of just frames, being able to control whichever system is sending those frames, in the action space of a game controller or keyboard and mouse. Basically, you can play the world as if it were a video game. If that can be said about your use case, the models will generally do incredibly well.

The reason this works is that every robot already ships with these controllers, which means the models can simply predict at the level of these controllers, and therefore the robot has already accounted for the sort of human monkey-brain-to-motor-torque prediction interface, merging that with the actual inputs coming from the controller. Right? So we're using the fact that those interfaces exist as a level of predicting in a general action space that works across many types of robots. In many ways, you could argue that if this is correct at scale, the supply chain will converge on gaming inputs instead of humanoid robots. And I think that is one of the big things that I foresee happening in the next two years.

Because intelligence is a bottleneck.
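The "frames in, controller actions out" idea described above can be sketched as a small control loop. This is only an illustration under stated assumptions: `ControllerAction`, `ControllableSystem`, `run`, `ToyRover`, and `always_right` are hypothetical names invented here, and the policy is a hard-coded stand-in for a learned model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol, Sequence

Frame = Sequence[Sequence[int]]  # placeholder for an image (rows of pixels)


@dataclass(frozen=True)
class ControllerAction:
    """An action expressed in gamepad terms (hypothetical layout)."""
    stick_x: float
    stick_y: float
    buttons: frozenset[str] = frozenset()


class ControllableSystem(Protocol):
    """Anything that emits frames and accepts controller-level input."""
    def observe(self) -> Frame: ...
    def apply(self, action: ControllerAction) -> None: ...


Policy = Callable[[Frame], ControllerAction]


def run(system: ControllableSystem, policy: Policy, steps: int) -> list[ControllerAction]:
    """Drive the system for `steps` ticks: read a frame, predict, apply."""
    history: list[ControllerAction] = []
    for _ in range(steps):
        action = policy(system.observe())
        system.apply(action)
        history.append(action)
    return history


class ToyRover:
    """Stand-in 'robot': stick_x moves it along a line."""
    def __init__(self) -> None:
        self.x = 0.0

    def observe(self) -> Frame:
        return [[int(self.x)]]  # a 1x1 'image' encoding position

    def apply(self, action: ControllerAction) -> None:
        self.x += action.stick_x


def always_right(frame: Frame) -> ControllerAction:
    """Trivial placeholder policy; a real model would look at the frame."""
    return ControllerAction(stick_x=1.0, stick_y=0.0)


rover = ToyRover()
history = run(rover, always_right, steps=3)
```

The point of the design is that `run` never needs to know what kind of machine it is driving; any system exposing the same controller-level interface could be swapped in for `ToyRover`.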

Yeah.

Well, thank you so much for taking the time to come chat with us. Congratulations, and we'll talk to you soon. Have a good one.

Let me tell you about Cisco: critical infrastructure for the AI era. Unlock seamless real-time experiences and new value with Cisco.

Fascinating. It's also funny seeing all those simulators on Steam, and the question of whether the training data will generalize. Are they just going