ARC Prize's Mike Knoop: AI is idea-constrained, not compute-constrained — we need new breakthroughs
Jun 18, 2025 with Mike Knoop
Key Points
- AI progress is now idea-constrained rather than compute-constrained, with reasoning models showing spiky domain-specific gains in math and coding but weaker transfer to legal work.
- Reasoning models lack product-market fit despite lab enthusiasm because wait times and complexity create friction that most users won't tolerate.
- Demand is shifting from human-labeled text to reinforcement learning environments that generate synthetic chain-of-thought traces, spawning startups like Mechanized AI, Morph, and Habitat.
Summary
Mike Knoop, co-founder of ARC Prize, argues that AI progress is no longer compute-constrained. The field is idea-constrained, and the next breakthroughs require new approaches, not bigger runs.
The Pareto frontier problem
Over the past six to nine months, every major lab has shifted from scaling pretraining on labeled text toward test-time compute and reasoning models that think out loud before answering. But there is no single winner: the labs have landed at different points on the cost-accuracy frontier. Anyone quoting a single benchmark number is marketing to you, Knoop says. o3 at its high setting leads on raw accuracy if cost and latency don't matter, while Gemini 2.5 Pro Thinking and Claude trade some horsepower for speed and price. The right choice depends entirely on the product context.
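The tradeoff Knoop describes has a precise form: a model belongs on the Pareto frontier if no other model is both cheaper and at least as accurate. A minimal sketch, with hypothetical model names and cost/accuracy figures invented purely for illustration:

```python
from typing import List, Tuple

def pareto_frontier(models: List[Tuple[str, float, float]]) -> List[str]:
    """Return names of models not dominated on (cost, accuracy).

    A model is dominated if some other model is no more expensive,
    at least as accurate, and strictly better on at least one axis.
    """
    frontier = []
    for name, cost, acc in models:
        dominated = any(
            (c <= cost and a >= acc) and (c < cost or a > acc)
            for n, c, a in models if n != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical (cost per task in $, benchmark accuracy) figures.
models = [
    ("model-A-high", 3.50, 0.88),  # most accurate, most expensive
    ("model-B-fast", 0.40, 0.79),  # far cheaper, slightly less accurate
    ("model-C",      0.90, 0.75),  # dominated by model-B-fast
]
print(pareto_frontier(models))  # → ['model-A-high', 'model-B-fast']
```

Both surviving models are defensible picks; which one is "best" depends entirely on whether the product can absorb the cost and latency of the high-accuracy option.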
Reasoning model adoption presents a counterintuitive problem. Despite excitement around DeepSeek's open-access reasoning chain, these systems may have weaker product-market fit than standard language models in their current form. Wait times and complexity create friction that most users won't absorb.
Spiky intelligence and domain specialization
OpenAI's original o3 results showed a striking pattern. Its reasoning gains in math and coding were dramatically higher than in legal reasoning, even though legal work involves the kind of symbolic, self-consistent logic that should transfer cleanly. Knoop reads this as early evidence that reasoning model improvements are domain-specific rather than general. He expects benchmark scores across labs to diverge meaningfully over the next 12 to 24 months as each lab optimizes its synthetic training environments for different domains.
The RL environment wave
The training paradigm shift has direct commercial consequences. Demand for human-labeled text is declining. Labs now want reinforcement learning environments that generate synthetic chain-of-thought traces autonomously, at scale, across long-running tasks. Knoop names several startups founded in recent months specifically to build and sell these environments to frontier labs: Mechanized AI, Morph, and Habitat. The comparison to Scale AI's earlier trajectory is direct: Scale grew on autonomous-vehicle labeling, then pivoted as that demand peaked. Knoop expects founder-led labeling companies to recognize the RL environment shift and place bets there if they haven't already.
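To make the shift concrete, here is a toy sketch of what such an environment looks like: a gym-style loop where the model emits reasoning steps as actions and reward arrives only at the end, based on whether the final answer is correct. Every successful episode yields a verified chain-of-thought trace to train on. All names and the task (two-digit multiplication) are illustrative assumptions, not any lab's or startup's actual API:

```python
import random
from dataclasses import dataclass, field

@dataclass
class ArithmeticEnv:
    """Toy RL environment that harvests chain-of-thought traces.

    The agent submits free-text reasoning steps via step(); an action
    beginning with 'FINAL:' ends the episode, and reward is 1.0 only
    when the stated answer is correct. Traces from reward-1 episodes
    become synthetic reasoning training data.
    """
    seed: int = 0
    trace: list = field(default_factory=list)

    def reset(self) -> str:
        rng = random.Random(self.seed)
        self.a, self.b = rng.randint(10, 99), rng.randint(10, 99)
        self.answer = self.a * self.b
        self.trace = []
        return f"Compute {self.a} * {self.b}. Show your reasoning."

    def step(self, action: str):
        """Returns (observation, reward, done), gym-style."""
        self.trace.append(action)
        if action.startswith("FINAL:"):
            guess = int(action.split(":", 1)[1])
            return None, (1.0 if guess == self.answer else 0.0), True
        return "continue", 0.0, False

env = ArithmeticEnv(seed=7)
prompt = env.reset()
env.step(f"Decompose: {env.a} * {env.b} = {env.a} * {env.b // 10 * 10} + {env.a} * {env.b % 10}")
obs, reward, done = env.step(f"FINAL: {env.a * env.b}")
# When reward == 1.0, env.trace is a verified reasoning trace ready for training.
```

The key property, and the reason labs want these environments rather than human labels, is that the reward check is automatic: correctness is verified programmatically, so trace generation scales with compute instead of with annotator headcount.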