Tae Kim: NVIDIA's inference wave is just beginning — Groq acquisition, TSMC wafer wars, and the coming CPU shortage
Mar 30, 2026 with Tae Kim
Key Points
- NVIDIA's Groq acquisition signals Huang's bet that AI agents will drive a massive CPU shortage over three to five years, as orchestration and tool calls demand four times more cores than prior infrastructure.
- NVIDIA holds the strongest TSMC position among all chipmakers, with Jensen visiting five to six times yearly and the ability to prepay billions, while hyperscalers compete for scarce leading-edge wafer capacity.
- Vertical AI agents across enterprise functions are multiplying token demand far beyond code generation, collapsing manual data-collection work and enabling deeper research per question at scale.
Summary
Tae Kim, analyst and author of the Key Context Substack, makes the case that NVIDIA is better positioned for the current AI cycle than most investors appreciate — and that the biggest underappreciated constraint ahead is not GPUs but CPUs.
Groq acquisition and the inference stack
NVIDIA's acquisition of Groq's assets and team is the clearest signal of where Jensen Huang sees the market moving. Kim frames it as a deliberate complement to Vera Rubin: roughly 75% of inference workloads run on Vera Rubin, while Groq handles the low-latency 25% that needs a different architecture. The combination lets NVIDIA serve the full inference demand curve economically, rather than ceding any slice to ASICs or custom silicon.
The Groq move echoes the Mellanox acquisition in 2019, when Huang saw the shift toward 100,000-GPU clusters before most of the industry did and bought the networking layer to go with it. The pattern is the same: identify where the economic value is migrating, then own the adjacent infrastructure before it becomes obvious.
TSMC supply and the wafer war
On the supply side, Kim is direct that NVIDIA holds the strongest position at TSMC — Jensen visits five or six times a year, and NVIDIA can prepay tens of billions of dollars to secure allocations. The broader industry is not as well placed. Google wants more TPU wafers. Every hyperscaler with a custom ASIC is competing for the same leading-edge capacity. Samsung and Intel are the only realistic alternatives, and Kim suggests Apple and NVIDIA may already be considering them for lower-end products — mid-range iPhones and consumer gaming GPUs — to free up TSMC capacity for flagship silicon.
On TerraFab-style domestic fab ambitions, Kim is skeptical. Semiconductor manufacturing is decades of accumulated trial and error, closer to cooking than engineering, and the workforce problem compounds it. The best process engineers and technicians are in Taiwan, view their work as a national mission, and are unlikely to be poached at any price — a harder talent dynamic than even the fiercely competitive AI research market.
The CPU shortage no one is pricing in
The sharpest near-term call Kim makes is on CPUs. CFOs at Dell, AMD, and Intel have all recently cited three-to-five-year locked supply contracts from hyperscalers — a signal that has gone largely unnoticed. Arm's CEO disclosed that AI infrastructure now requires four times more CPU cores per unit than last year's equivalent. The driver is AI agents: orchestration, tool calls, database queries, and web searches all run on CPUs, not GPUs. As agent workloads scale, so does CPU demand. Kim treats this as a major multi-year trend that the market has not yet priced.
Token demand beyond code generation
On where the next step-change in token consumption comes from, Kim argues code generation is still early innings — enterprises running ten or twenty agents simultaneously are just getting started. The bigger wave is vertical AI agents across every enterprise function: customer service, research, chip design simulation, drug discovery. He points to a post by Logan Bartlett framing this as an attack on the $6 trillion knowledge economy.
His concrete illustration of token multiplication: a year ago, pulling same-store sales data for six fast-casual restaurant chains took him one to two hours of manual work on investor-relations websites, with chatbots getting it wrong. A few weeks ago, Gemini and ChatGPT both nailed it in one to two minutes. The tedious data-collection layer is collapsing, which frees analysts to operate at a higher level — but it also multiplies the research depth that becomes economically viable per question, which in turn drives token demand even when the final output is a single number.
GPU depreciation and the bubble question
Kim dismisses near-term GPU depreciation risk. CoreWeave reports H100s lasting five to six years at roughly 95% of original pricing, and rental capacity for six-year-old hardware is still sold out. A compute glut is theoretically possible if AI spending proves to be a bubble, but Kim does not think it is, and current demand data supports him.
Helium
Helium supply is a watch item but not an immediate threat. Kim says there is six to nine months of inventory in the channel — Bernstein's estimate — and the risk only becomes real if Middle East tensions persist for several months. His framing on the downside scenario: if helium genuinely becomes the binding constraint, there are larger geopolitical problems to worry about first.
Meta
Kim's read on Meta is structurally bullish and unchanged by the AI spending cycle. No one is replacing Instagram or Facebook in the digital ad stack, and AI may actually improve Meta's relative position by eroding Google's search-based ad revenue. The frontier model spend — potentially $70–80 billion — may prove wasteful, just as Reality Labs has been, but the core ad engine is insulated. Any compute built for training runs that don't reach the frontier still gets recycled into ad targeting, Reels recommendations, and monetization tooling by 2028–2029. The business that generates the cash is not touched by the side quest.