Google, Microsoft, and xAI join pre-release AI model review agreements with the US government
Key Points
- Google, Microsoft, and xAI have joined OpenAI and Anthropic in signing pre-release model review agreements with the U.S. Commerce Department, expanding government oversight of frontier AI releases beyond the two original signatories.
- Regulatory approval delays could entrench incumbents by allowing labs to train multiple model generations internally while awaiting public release clearance, while startups lack compute reserves to do the same.
- Chinese open-source models currently lag U.S. labs by eight months, but if American releases get stuck in approval limbo while China ships continuously, the public capability gap could narrow despite no acceleration in Chinese development.
Summary
Government Pre-Release AI Review Program Expands to Five Major Labs
Google, Microsoft, and xAI have joined OpenAI and Anthropic in signing pre-release model review agreements with the U.S. Commerce Department's Center for AI Standards and Innovation (CAISI). The center, previously called the AI Safety Institute under the Biden administration, conducts evaluations before companies release frontier models to the public.
The program itself is not new: OpenAI and Anthropic signed similar agreements in 2024. But the expansion to three additional major labs signals a broader push toward government oversight of frontier AI releases. The Commerce Department claims CAISI has completed more than 40 evaluations to date, including on models that have not yet been released publicly.
The Scale Problem
One notable absence is Meta, which is investing $125 billion in AI compute capacity, likely a larger capital commitment than xAI's total compute infrastructure. The hosts note that with five of the top six labs participating, Meta faces pressure to join; once consensus forms among competitors, aligning with the framework carries little regulatory risk.
How Regulatory Bottlenecks Could Reshape Competition
The mechanics of approval create potential timing distortions. When a lab finishes training a frontier model, it submits the model to CAISI for review. The government's incentives run counter to speed: approving a powerful model creates liability if something goes wrong, while delays keep capabilities exclusive to the government and the lab. Meanwhile, labs can continue training subsequent versions internally while waiting for public release approval.
If OpenAI finishes GPT-6 and submits it for review, the company can use the unreleased model to begin training GPT-6.1 while it waits. By the time GPT-6 clears review, the lab may be multiple generations ahead internally, leaving the public waiting months for models that were submitted long before.
The consequence is counterintuitive: labs blocked from releasing models may turn inward, maximizing their own use of unreleased capabilities rather than selling them. This structure could entrench incumbents while raising barriers for startups, which lack the compute reserves to train multiple model generations speculatively during regulatory delays.
The Open Source Question
George Hotz of Tiny Corp argues the framework advantages closed competitors and, indirectly, Chinese open-source labs. Chinese open-source models are currently eight months behind U.S. labs, but if American labs face release delays while Chinese teams ship models continuously, the public gap could narrow—not because Chinese capabilities accelerated, but because U.S. capabilities got stuck in approval limbo.
Hotz advocates for fully open-sourced frontier models with no release restrictions. The regulatory framework moves in the opposite direction, creating the scenario he warns against: authoritarian control of AI access.
What We Don't Know Yet
The details matter enormously. An executive order expected May 5 will specify what types of models trigger review, whether startups face the same approval burden as hyperscalers, and whether licensing requirements could restrict local model downloads or on-device deployment. The framework could amount to standardized benchmarking (checking how models handle the "strawberry" counting task, as one host jokes) or create genuine approval friction.
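For context, the "strawberry" task the hosts joke about asks a model how many times the letter "r" appears in the word, a question that has tripped up language models even though it is trivial in ordinary code:

```python
# The "strawberry" benchmark joke: counting letter occurrences,
# which is a one-liner for conventional software.
word = "strawberry"
r_count = word.count("r")
print(r_count)  # 3
```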
Regulatory capture poses a second risk. If small labs must establish DC offices and hire lobbyists to navigate approval, funding flows toward incumbents with compliance infrastructure. Model approval timelines could track political alignment rather than safety thresholds.
The hosts emphasize waiting for the executive order before taking hard positions. The difference between standardization and bottleneck depends entirely on implementation.
TBPN Digest delivers summaries of the latest fundraises, interviews and tech news from TBPN, every weekday.