Prime Intellect's Intellect-3 proves a 100B-parameter open RL model can match closed-source giants — and every app will need to do this
Dec 1, 2025 with Vincent Weisser
Key Points
- Prime Intellect's Intellect-3, a 100-billion-parameter model trained with reinforcement learning, matches much larger open models such as DeepSeek by optimizing post-training rather than raw scale.
- Post-training via RL is becoming the primary capability lever across the industry, and companies can build domain-specific models for hundreds of thousands of dollars that outperform frontier models on specific tasks.
- Application companies with end-user distribution like Cursor and Figma will build RL environments around their own interfaces to train specialized models, giving them structural advantages over pure model providers.
Summary
Prime Intellect's release of Intellect-3 demonstrates that a 100-billion-parameter mixture-of-experts model can match open-source models in the 300–600 billion parameter range, including DeepSeek, through aggressive post-training via reinforcement learning rather than raw scale. The model was built by taking GLM as a base and running a full supervised fine-tuning (SFT) and RL pipeline, developed over roughly six months. The release is framed as an early checkpoint, with further scaling planned.
The broader argument from Vincent at Prime Intellect is that RL post-training is becoming the primary capability lever across the industry, visible in Claude Opus, GPT-4.5, and Gemini, and that the capital requirement to do this is far lower than frontier labs imply. A company can post-train a domain-specific model for hundreds of thousands of dollars, producing a smaller, faster, cheaper model that outperforms general-purpose frontier models on a specific task.
Cursor's Composer is cited as the clearest commercial precedent. Cursor reportedly took an open-source model, built its own RL environment within the Composer harness, and trained it to excel specifically at Cursor tasks. That model is now updated every two hours using an online RL loop driven by real user interactions, with each accept or reject signal feeding back into training continuously.
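An online loop of this kind can be sketched as a minimal REINFORCE-style update: each user accept (reward 1) or reject (reward 0) nudges the policy toward or away from the sampled completion. Everything below is a hypothetical toy illustration, not Cursor's actual system; the two-candidate softmax policy stands in for a language model.

```python
import math
import random

class TinyPolicy:
    """Toy softmax policy over candidate completions (hypothetical).

    Stands in for the language model; the update rule is the
    REINFORCE estimator applied to a binary accept/reject reward.
    """

    def __init__(self, n_candidates, lr=0.5):
        self.logits = [0.0] * n_candidates
        self.lr = lr

    def probs(self):
        m = max(self.logits)
        exps = [math.exp(l - m) for l in self.logits]
        z = sum(exps)
        return [e / z for e in exps]

    def sample(self):
        r, acc = random.random(), 0.0
        for i, p in enumerate(self.probs()):
            acc += p
            if r <= acc:
                return i
        return len(self.logits) - 1

    def update(self, action, reward, baseline=0.5):
        # REINFORCE: grad of log pi(a) w.r.t. logits = one_hot(a) - probs
        advantage = reward - baseline
        p = self.probs()
        for i in range(len(self.logits)):
            grad = (1.0 if i == action else 0.0) - p[i]
            self.logits[i] += self.lr * advantage * grad

# Simulated stream of user interactions: users accept candidate 0
# 90% of the time and candidate 1 only 20% of the time.
random.seed(0)
accept_rate = [0.9, 0.2]
policy = TinyPolicy(n_candidates=2)
for _ in range(500):
    a = policy.sample()
    reward = 1.0 if random.random() < accept_rate[a] else 0.0
    policy.update(a, reward)

# The policy concentrates on the frequently accepted candidate.
print(policy.probs())
```

A production loop differs mainly in scale, not in shape: the reward signal comes from real accepts and rejects, and the "update" is a gradient step on the model weights rather than on two logits.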
Prime Intellect's thesis is that this pattern will generalize across every application layer. A platform like Figma that wants to make itself agentic cannot rely on closed frontier models to navigate its own interface well. It needs to build an RL environment around Figma itself and post-train on that. The companies with distribution advantages, specifically those that own the end-user interaction such as Cursor, Cognition, and potentially Microsoft through Copilot across Excel and PowerPoint, are structurally advantaged over pure model providers.
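An application-level RL environment of the kind described here typically wraps the product's own actions and a task-completion reward in a Gym-style reset/step interface. The sketch below is a simplified, hypothetical example; the class and action names are illustrative and not any real product's API.

```python
class ToyCanvasEnv:
    """Hypothetical Gym-style environment wrapping an app's UI actions.

    The 'task' is to execute a target sequence of interface actions
    (e.g. create a frame, then a rectangle, then apply a fill).
    Reward is 1.0 only when the full sequence is completed in order.
    """

    ACTIONS = ["create_frame", "create_rect", "apply_fill", "delete"]

    def __init__(self, target=("create_frame", "create_rect", "apply_fill")):
        self.target = target
        self.reset()

    def reset(self):
        self.progress = 0  # how many target steps completed so far
        return self.progress

    def step(self, action):
        if action == self.target[self.progress]:
            self.progress += 1
        else:
            self.progress = 0  # a wrong action resets the task
        done = self.progress == len(self.target)
        reward = 1.0 if done else 0.0
        return self.progress, reward, done

# An agent that happens to take the correct action sequence earns the reward.
env = ToyCanvasEnv()
state = env.reset()
total = 0.0
done = False
for action in ["create_frame", "create_rect", "apply_fill"]:
    state, reward, done = env.step(action)
    total += reward
print(total, done)  # → 1.0 True
```

The hard part in practice is not this interface but the reward definition: a real product environment would verify task success against the application's own state (the document tree, the rendered output), not a hard-coded action sequence.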
For enterprises without in-house ML capacity, Prime Intellect is building a reinforcement fine-tuning (RFT) platform designed to make post-training plug-and-play, alongside a forward-deployed services model. The longer-term roadmap points toward autonomous AI research agents that handle model fine-tuning and post-training without human ML expertise, comparable to how no-code tools democratized software creation.
On the base model side, Prime Intellect disclosed that it supported RCI in releasing a small MoE base model, and that 2,000 H100s are now ramping toward a significantly larger pretrain run. The strategic rationale is a perceived gap in the market: with Meta's LLaMA team having undergone restructuring and Mistral pivoting toward European enterprise deployment, very few Western players offer a full-stack pipeline from data collection through pretraining, fine-tuning, and post-training. Sovereign governments and large enterprises unwilling to depend on Chinese open models or closed US models represent the core addressable segment Prime Intellect is targeting.