General Intuition raises $320M, betting that a trillion action tokens from video games will crack robotics
Jun 29, 2026 with Pim de Witte
Key Points
- General Intuition raises $320M to build physical AI models trained on a trillion action tokens from video game footage, where human annotators labeled raw controller inputs rather than inferring actions from pixels.
- The company's bet is that robots already ship with game controller interfaces, so a model trained across thousands of gaming environments transfers directly to robotics without solving motor torque mapping from scratch.
- De Witte argues General Intuition's structural advantage is unreplicable: competitors lack both the gameplay footage infrastructure and human annotation pipeline required to generate ground-truth action tokens at scale.
Summary
General Intuition has raised $320 million to build what Pim de Witte describes as a fundamentally different kind of AI model — one trained not on text, but on ground-truth action tokens captured from video game gameplay.
The dataset
The core bet is a dataset of roughly a trillion action tokens, collected alongside Medal's gameplay video. Crucially, these aren't actions inferred from pixels. Thousands of humans manually converted raw controller inputs into labeled actions — walk forward, open door, close door — giving the model direct ground truth rather than reconstructed behavior. De Witte argues this distinction matters enormously at deployment: when a pilot adjusts a rudder, that movement never appears in the pixels, but it does appear in the action sequence. Models trained on inferred data can benchmark well on general tasks but fail on edge cases where precise action sequences are required.
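To make the ground-truth idea concrete, here is a minimal sketch of how recorded inputs might be paired with human-readable labels while staying aligned to video frames. Everything in it is an assumption for illustration: the `BINDINGS` table, the `InputEvent` type, and the `to_action_tokens` helper are hypothetical names, not General Intuition's or Medal's actual schema or pipeline.

```python
from dataclasses import dataclass

# Hypothetical per-game binding table: raw input -> human-readable label.
# In the article's account, humans produce this kind of mapping by hand.
BINDINGS = {"W": "walk_forward", "E": "open_door", "Q": "close_door"}


@dataclass(frozen=True)
class InputEvent:
    frame: int  # index of the video frame the input aligns with
    raw: str    # raw key or button exactly as recorded


def to_action_tokens(events):
    """Pair each recorded input with its label, keeping frame alignment."""
    return [(e.frame, BINDINGS.get(e.raw, "unknown")) for e in events]


events = [InputEvent(0, "W"), InputEvent(1, "W"), InputEvent(2, "E")]
tokens = to_action_tokens(events)
```

The point of the sketch is that the label comes from the recorded input, not from the pixels, so an action that leaves no visible trace in a frame still appears in the token sequence.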
For scale context, frontier LLMs train on roughly five to ten trillion text tokens. At roughly a trillion action tokens, General Intuition's dataset is a tenth to a fifth of that size, which keeps it within an order of magnitude while being concentrated in its target capability domain.
“The only reason why we have a shot is because we have a dataset that nobody else has, which allows us to be as focused on workloads that include space and time as Anthropic was on their code environments on the way to the frontier. We have roughly a trillion action tokens in that space. Frontier LMs are trained on maybe between five and ten trillion text tokens. The most obvious thing this replaces is all the code that people are currently writing for behavior in physics engines — all that just becomes a prompt.”
The robotics thesis
The product logic rests on a specific observation: most robots already ship with game controller or keyboard-and-mouse interfaces. If the model can learn to predict actions in that generalized controller space across thousands of video game environments — 2D, 3D, long-horizon, short-horizon — it transfers directly to robots without needing to solve the harder problem of mapping model outputs to motor torques from scratch. The robot's existing firmware handles that bridge.
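The transfer argument can be sketched as a shared interface: a policy emits a generic gamepad state, and anything that accepts gamepad input can consume it. This is a toy illustration under stated assumptions, not the company's architecture. `GamepadState`, `GameTarget`, `RobotTarget`, and `policy` are hypothetical names, and the policy is a fixed placeholder rather than a learned model.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class GamepadState:
    """Generic controller output: one analog stick plus pressed buttons."""
    left_x: float = 0.0
    left_y: float = 0.0
    buttons: frozenset = frozenset()


class ControllerTarget(Protocol):
    def apply(self, state: GamepadState) -> str: ...


class GameTarget:
    def apply(self, state: GamepadState) -> str:
        return f"game: move=({state.left_x:+.1f}, {state.left_y:+.1f})"


class RobotTarget:
    """Stand-in for robot firmware that already accepts gamepad input
    and translates it to motor commands itself (assumed, not modeled)."""

    def apply(self, state: GamepadState) -> str:
        return f"robot: drive=({state.left_x:+.1f}, {state.left_y:+.1f})"


def policy(observation: str) -> GamepadState:
    # Placeholder for a learned model: always push the stick forward.
    return GamepadState(left_y=1.0)


# The same policy output drives both targets without modification.
outputs = [target.apply(policy("frame")) for target in (GameTarget(), RobotTarget())]
```

The design choice being illustrated is that the model never sees motor torques. It only predicts controller states, and the translation to hardware stays inside the target.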
De Witte's most pointed prediction is that the robot supply chain will converge on gaming inputs as the standard control interface within the next two years. If intelligence is the bottleneck, the argument goes, then any system already controllable via game controller becomes a target, and all the behavior logic currently written as code in physics engines becomes a prompt.
The competitive claim
De Witte positions this explicitly against the broader neo-lab field. In his framing, labs entering the language model space with general models lack a structural advantage, while General Intuition's claimed edge is a dataset that no other lab can replicate. The ground-truth action token pipeline requires both the gameplay footage infrastructure (via Medal) and the human annotation layer that converts raw inputs into labeled actions. Without that, competitors are left training on inferred data, which de Witte argues cannot reliably generalize to precise, edge-case action sequences in real deployments.
The $320M raise is the fuel behind that thesis. Whether a trillion action tokens from gaming environments proves sufficient to generalize across industrial and consumer robotics at scale is the open question.