AWS CEO Matt Garman on frontier agents, Nova 2, Nova Forge, and Trainium 3 at re:Invent 2025
Dec 2, 2025 with Matt Garman
Key Points
- AWS exposes pre-training checkpoints through Nova Forge, letting enterprises inject proprietary data mid-training rather than fine-tuning finished models, a capability Garman claims no competitor has offered.
- Kiro, AWS's agent-based coding environment, shifts developers from writing code to orchestrating multiple autonomous agents on long-running tasks, addressing the breakdown of unstructured AI assistance at enterprise scale.
- AWS added 3.8 gigawatts of data center capacity in the past year, yet AI compute demand still exceeds supply across chips, networking, power, and physical infrastructure simultaneously.
Summary
AWS CEO Matt Garman used re:Invent 2025 to announce four interlocking bets: frontier agents, Nova 2, Nova Forge, and Trainium 3 — each aimed at deepening enterprise lock-in across the full AI stack.
Frontier agents and Kiro
Garman's clearest product argument is that vibe coding breaks down at enterprise scale. Unstructured AI-assisted development gets engineers stuck, and recovering lost ground can cost as much time as writing the code manually from the start. Kiro, AWS's coding environment, is built around a "specs" model where agents operate within defined requirements rather than free-form prompts. Teams can dispatch 10, 20, or 50 agents simultaneously on amorphous, long-running tasks and then coordinate the results rather than micromanage each step. Garman says agents on Kiro have already run autonomously for multiple hours on complex tasks. The shift he's describing is less a product feature than a workflow change — software developers becoming orchestrators rather than writers.
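The pattern is easy to sketch in code. The snippet below is a hypothetical illustration, not Kiro's actual interface (`run_agent` and the spec shape are invented for this example): a shared spec constrains every agent, twenty agents are dispatched concurrently, and the orchestrator only collects and coordinates results rather than supervising each step.

```python
# Hypothetical sketch of spec-driven agent dispatch; run_agent stands in
# for whatever Kiro actually exposes and is not a real API.
from concurrent.futures import ThreadPoolExecutor, as_completed

SPEC = {
    "requirements": ["migrate service X to the new auth library"],
    "constraints": ["no public API changes", "keep test coverage >= 90%"],
}

def run_agent(agent_id: int, spec: dict) -> dict:
    # Placeholder: a real agent would plan, edit code, and run tests
    # against the spec, potentially for hours, before reporting back.
    return {"agent": agent_id, "status": "done"}

# Dispatch 20 agents at once on the same spec and coordinate the results,
# rather than micromanaging each one step by step.
with ThreadPoolExecutor(max_workers=20) as pool:
    futures = [pool.submit(run_agent, i, SPEC) for i in range(20)]
    results = [f.result() for f in as_completed(futures)]

print(f"{len(results)} agents reported back")
```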
Nova Forge
The more technically distinctive announcement is Nova Forge. Standard enterprise AI customization means taking a finished pre-trained model and applying fine-tuning or reinforcement learning afterward. Garman argues that approach has a hard ceiling: too much post-training causes models to forget early reasoning and lose core intelligence, a problem he describes as still unsolved in that paradigm.
Nova Forge sidesteps that ceiling by exposing pre-training checkpoints, at 60% or 80% completion, so enterprises can inject their own data directly into the training process, blended with Amazon's data via API, before training finishes. The output is a base model that intrinsically understands the company's corpus, on top of which fine-tuning and RL can then be layered. Garman claims this is the first time pre-training checkpoints have been exposed to enterprise customers in this way.
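The mechanics are easier to see in miniature. The toy loop below illustrates the idea only, not Nova Forge's real service or API: pre-training runs on general data to a 60% checkpoint, then finishes on a blend that includes proprietary data, so the base model sees the customer's corpus during training rather than in a post-hoc fine-tune.

```python
# Toy illustration of the Nova Forge idea, not the real service or its API:
# inject proprietary data into the *remaining* pre-training budget rather
# than fine-tuning after training has finished.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 16)              # stand-in for a large base model
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
general = torch.randn(1000, 16)        # stand-in for Amazon's corpus
proprietary = torch.randn(1000, 16)    # stand-in for the customer's corpus

def train(data: torch.Tensor, steps: int) -> None:
    for _ in range(steps):
        batch = data[torch.randint(len(data), (32,))]
        loss = nn.functional.mse_loss(model(batch), batch)  # toy objective
        opt.zero_grad(); loss.backward(); opt.step()

train(general, 60)                     # first 60% of the run: general data only
# This is where the "60% checkpoint" would be exposed; finish the run
# on a blend that mixes in the proprietary corpus.
blend = torch.cat([general[:700], proprietary[:300]])
train(blend, 40)                       # base model now trained on both
# Fine-tuning and RL would then be layered on this blended base model.
```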
On accessibility, he is measured. The near-term user is an ML-leaning software engineer, not a non-technical operator, and some enterprises will need consulting support. The goal is to broaden that audience over time without requiring full frontier-model pre-training expertise.
Trainium 3 and the open ecosystem
Trainium 3 is positioned as the cost-performance engine underneath the whole stack — chips optimized for the models AWS builds, running the agents it sells. Garman frames the full-stack optimization as a deliberate strategic choice rather than a hedge. AWS will continue to offer Nvidia GPUs, but the internal direction is clearly toward Trainium.
On software ecosystem openness, AWS's Neuron SDK is already open source and supports PyTorch and other standard frameworks. Garman draws an implicit contrast with Google's TPU ecosystem, where open-source availability remains more ambiguous.
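In practice, that openness means ordinary PyTorch code runs on Trainium with little more than a device change. A minimal sketch, assuming a Trn instance with the Neuron SDK's torch-neuronx stack installed (exact setup may differ by SDK version):

```python
# Minimal sketch of the Neuron SDK's PyTorch path on a Trainium (Trn) instance.
# Assumes torch-neuronx is installed; it registers Trainium as an XLA device,
# so standard PyTorch code runs with only the device changed.
import torch
import torch_xla.core.xla_model as xm  # ships with the Neuron PyTorch stack

device = xm.xla_device()               # resolves to a NeuronCore on Trn hardware

model = torch.nn.Linear(128, 64).to(device)
x = torch.randn(8, 128, device=device)
y = model(x)
xm.mark_step()                         # flush the traced graph to the compiler
print(y.shape)                         # torch.Size([8, 64])
```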
Supply constraints
AWS added 3.8 gigawatts of data center capacity in the past year. Garman says demand for AI compute still exceeds supply, with chips, networking, power, and physical data center space all acting as simultaneous constraints. The pace of capacity addition suggests the bottleneck is not investment appetite but the physical and logistical limits of scaling infrastructure this quickly.