AWS CEO Matt Garman on frontier agents, Nova 2, Nova Forge, and Trainium 3 at re:Invent 2025
Dec 2, 2025 with Matt Garman
Key Points
- AWS exposes pre-training checkpoints through Nova Forge, letting enterprises inject proprietary data mid-training rather than fine-tuning finished models, a capability Garman claims no competitor has offered.
- Kiro, AWS's agent-based coding environment, shifts developers from writing code to orchestrating multiple autonomous agents on long-running tasks, addressing the breakdown of unstructured AI assistance at enterprise scale.
- AWS added 3.8 gigawatts of data center capacity in the past year, yet AI compute demand still exceeds supply across chips, networking, power, and physical infrastructure simultaneously.
Summary
AWS CEO Matt Garman used re:Invent 2025 to announce four interlocking bets: frontier agents, Nova 2, Nova Forge, and Trainium 3 — each aimed at deepening enterprise lock-in across the full AI stack.
Frontier agents and Kiro
Garman's clearest product argument is that vibe coding breaks down at enterprise scale. Unstructured AI-assisted development gets engineers stuck, and recovering lost ground can cost as much time as writing the code manually from the start. Kiro, AWS's coding environment, is built around a "specs" model where agents operate within defined requirements rather than free-form prompts. Teams can dispatch 10, 20, or 50 agents simultaneously on amorphous, long-running tasks and then coordinate the results rather than micromanage each step. Garman says agents on Kiro have already run autonomously for multiple hours on complex tasks. The shift he's describing is less a product feature than a workflow change — software developers becoming orchestrators rather than writers.
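The pattern is easy to sketch in code. The snippet below is a hypothetical illustration, not Kiro's actual interface (`run_agent` and the spec shape are invented for this example): a shared spec constrains every agent, twenty agents are dispatched concurrently, and the orchestrator only collects and coordinates results rather than supervising each step.

```python
# Hypothetical sketch of spec-driven agent dispatch; run_agent stands in
# for whatever Kiro actually exposes and is not a real API.
from concurrent.futures import ThreadPoolExecutor, as_completed

SPEC = {
    "requirements": ["migrate service X to the new auth library"],
    "constraints": ["no public API changes", "keep test coverage >= 90%"],
}

def run_agent(agent_id: int, spec: dict) -> dict:
    # Placeholder: a real agent would plan, edit code, and run tests
    # against the spec, potentially for hours, before reporting back.
    return {"agent": agent_id, "status": "done"}

# Dispatch 20 agents at once on the same spec and coordinate the results,
# rather than micromanaging each one step by step.
with ThreadPoolExecutor(max_workers=20) as pool:
    futures = [pool.submit(run_agent, i, SPEC) for i in range(20)]
    results = [f.result() for f in as_completed(futures)]

print(f"{len(results)} agents reported back")
```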
Nova Forge
The more technically distinctive announcement is Nova Forge. Standard enterprise AI customization means taking a finished pre-trained model and applying fine-tuning or reinforcement learning afterward. Garman argues that approach has a hard ceiling: too much post-training causes models to forget early reasoning and lose core intelligence, a problem he describes as still unsolved in that paradigm.
Nova Forge sidesteps that ceiling by exposing pre-training checkpoints, at 60% or 80% completion, so enterprises can inject their own data directly into the training process, blended with Amazon's data via API, before training finishes. The output is a base model that intrinsically understands the company's corpus, on top of which fine-tuning and RL can then be layered. Garman claims this is the first time pre-training checkpoints have been exposed to enterprise customers in this way.
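The mechanics are easier to see in miniature. The toy loop below illustrates the idea only, not Nova Forge's real service or API: pre-training runs on general data to a 60% checkpoint, then finishes on a blend that includes proprietary data, so the base model sees the customer's corpus during training rather than in a post-hoc fine-tune.

```python
# Toy illustration of the Nova Forge idea, not the real service or its API:
# inject proprietary data into the *remaining* pre-training budget rather
# than fine-tuning after training has finished.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 16)              # stand-in for a large base model
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
general = torch.randn(1000, 16)        # stand-in for Amazon's corpus
proprietary = torch.randn(1000, 16)    # stand-in for the customer's corpus

def train(data: torch.Tensor, steps: int) -> None:
    for _ in range(steps):
        batch = data[torch.randint(len(data), (32,))]
        loss = nn.functional.mse_loss(model(batch), batch)  # toy objective
        opt.zero_grad(); loss.backward(); opt.step()

train(general, 60)                     # first 60% of the run: general data only
# This is where the "60% checkpoint" would be exposed; finish the run
# on a blend that mixes in the proprietary corpus.
blend = torch.cat([general[:700], proprietary[:300]])
train(blend, 40)                       # base model now trained on both
# Fine-tuning and RL would then be layered on this blended base model.
```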
On accessibility, he is measured. The near-term user is an ML-leaning software engineer, not a non-technical operator, and some enterprises will need consulting support. The goal is to broaden that audience over time without requiring full frontier-model pre-training expertise.
Trainium 3 and the open ecosystem
Trainium 3 is positioned as the cost-performance engine underneath the whole stack — chips optimized for the models AWS builds, running the agents it sells. Garman frames the full-stack optimization as a deliberate strategic choice rather than a hedge. AWS will continue to offer Nvidia GPUs, but the internal direction is clearly toward Trainium.
On software ecosystem openness, AWS's Neuron SDK is already open source and supports PyTorch and other standard frameworks. Garman draws an implicit contrast with Google's TPU ecosystem, where open-source availability remains more ambiguous.
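In practice, that openness means ordinary PyTorch code runs on Trainium with little more than a device change. A minimal sketch, assuming a Trn instance with the Neuron SDK's torch-neuronx stack installed (exact setup may differ by SDK version):

```python
# Minimal sketch of the Neuron SDK's PyTorch path on a Trainium (Trn) instance.
# Assumes torch-neuronx is installed; it registers Trainium as an XLA device,
# so standard PyTorch code runs with only the device changed.
import torch
import torch_xla.core.xla_model as xm  # ships with the Neuron PyTorch stack

device = xm.xla_device()               # resolves to a NeuronCore on Trn hardware

model = torch.nn.Linear(128, 64).to(device)
x = torch.randn(8, 128, device=device)
y = model(x)
xm.mark_step()                         # flush the traced graph to the compiler
print(y.shape)                         # torch.Size([8, 64])
```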
Supply constraints
AWS added 3.8 gigawatts of data center capacity in the past year. Garman says demand for AI compute still exceeds supply, with chips, networking, power, and physical data center space all acting as simultaneous constraints. The pace of capacity addition suggests the bottleneck is not investment appetite but the physical and logistical limits of scaling infrastructure this quickly.