News

Nvidia beats earnings but stock falls 6% as inference era redefines the chip landscape

Feb 27, 2025

Key Points

  • Nvidia beat earnings and guided higher, but the 6% stock drop signals the market is pricing in competitive threats rather than rewarding execution.
  • Inference, not training, now drives Nvidia's compute workload as reasoning models from OpenAI and Google arrive, and Blackwell chips show outsized gains for inference versus training.
  • Specialized inference startups like Etched and Groq are chipping away at use cases while hyperscalers develop custom silicon, creating a structural risk that Huang frames as temporary but that Nvidia is nonetheless hedging against with Blackwell.

Summary

Nvidia beat earnings expectations and offered an optimistic forecast for the current quarter, but the stock fell 6% the next day—a reminder that the company was priced beyond perfection rather than to it.

The earnings results reflected Nvidia's success in navigating a fundamental shift in the AI infrastructure market. For years, Nvidia dominated by serving the computationally intensive work of training large language models. But as AI usage has matured, the industry has shifted toward inference: running trained models to serve queries. That transition, long expected, has now arrived with reasoning models like those from OpenAI, Google, and DeepSeek, which can require a hundred times more computing power than standard inference: all models decode one token at a time, but reasoning models generate long chains of intermediate tokens before producing a visible answer. CEO Jensen Huang described reasoning as a shift toward models that "think through answers" and noted that the vast majority of Nvidia's compute today is actually inference work.
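A rough sense of where that multiplier comes from: decode compute scales roughly linearly with tokens generated, and reasoning models emit far more of them. The sketch below uses an assumed model size and assumed token counts purely for illustration; none of the figures are Nvidia or OpenAI disclosures.

```python
# Back-of-the-envelope decode cost: ~2 FLOPs per parameter per generated
# token (a standard estimate that ignores attention and KV-cache reads).
# All numbers are illustrative assumptions, not vendor figures.
params = 70e9                  # a 70B-parameter model, chosen arbitrarily
flops_per_token = 2 * params

standard_tokens = 300          # a direct answer
reasoning_tokens = 30_000      # hidden chain of thought plus the answer

standard_cost = standard_tokens * flops_per_token
reasoning_cost = reasoning_tokens * flops_per_token
print(f"standard:  {standard_cost:.1e} FLOPs")
print(f"reasoning: {reasoning_cost:.1e} FLOPs "
      f"({reasoning_cost / standard_cost:.0f}x more compute)")
```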

Nvidia positioned itself for this shift with Blackwell, its latest chip generation. Blackwell is larger, includes more memory, uses lower-precision arithmetic for AI workloads, and can be networked together at scale with fast interconnects. According to Dylan Patel of SemiAnalysis, Blackwell's performance gains in inference significantly exceed its gains in training. Chief financial officer Colette Kress added that many early Blackwell deployments were earmarked for inference work—a first for a new generation of Nvidia's chips.
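One reason lower-precision arithmetic matters so much for inference: decode throughput is often bound by how many bytes of weights stream through the chip per token, so halving or quartering bytes per weight raises tokens per second almost proportionally. Below is a minimal sketch of symmetric per-tensor INT8 quantization in NumPy; Blackwell's actual FP8 and FP4 formats and per-channel scaling are more sophisticated than this.

```python
import numpy as np

# Fewer bytes per weight -> proportionally more tokens/sec when decode is
# memory-bandwidth bound. Simplified illustration, not Nvidia's scheme.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4096, 4096)).astype(np.float32)  # fake weights

scale = np.abs(w).max() / 127.0                # one scale for the whole tensor
w_int8 = np.round(w / scale).astype(np.int8)   # 4x fewer bytes than FP32
w_deq = w_int8.astype(np.float32) * scale      # dequantized at compute time

print(f"bytes: {w.nbytes:,} -> {w_int8.nbytes:,}")
print(f"max abs quantization error: {np.abs(w - w_deq).max():.2e}")
```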

The company brushed aside the DeepSeek threat, which rattled markets in January when the Chinese startup demonstrated sophisticated reasoning models built on fewer Nvidia chips than expected. Huang called DeepSeek's advances "excellent innovation" and implied the efficiency gains would be absorbed across the industry, arguing that customers would still want to run models on large server farms stacked with Blackwell chips to capture every available performance improvement. Huang has previously argued that inference and training will eventually converge as AI aligns more closely with human cognition: humans learn and infer simultaneously rather than in separate phases.

The infrastructure competition is sharpening.

Nvidia faces mounting pressure in inference from startups and hyperscalers alike. Etched, an AI chip startup cofounded by two Thiel Fellows, is building inference-specific silicon with the transformer architecture baked directly into the hardware; the company has demoed real-time Minecraft generation from controller input alone. Cerebras manufactures wafer-scale chips, the largest chips ever produced, and is now working with Mistral to build what it claims will be the world's fastest AI chatbot. The advantage of wafer-scale is that inference runs on a single chip, with model weights held in on-chip memory rather than shuttled over an off-chip memory bus; the drawback is manufacturing risk, since a single defect can ruin the entire wafer instead of just one die among dozens.
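The yield risk is easy to see with the standard Poisson yield model, in which the probability a chip is defect-free falls exponentially with its area. The defect density and areas below are assumed round numbers for scale, not actual TSMC or Cerebras figures, and in practice wafer-scale designs mitigate the risk with redundant cores that route around defects.

```python
import math

# Poisson yield model: P(defect-free) = exp(-D * A), where D is defect
# density (defects/cm^2) and A is chip area (cm^2). Assumed numbers only.
D = 0.1                 # defects per cm^2 on a mature process (assumed)
die_area = 8.0          # cm^2, roughly a large GPU die
wafer_area = 460.0      # cm^2, usable silicon on a 300 mm wafer

print(f"GPU-sized die: {math.exp(-D * die_area):.0%} yield")     # ~45%
print(f"whole wafer:   {math.exp(-D * wafer_area):.1e} yield")   # ~1e-20
```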

Groq has taken a different specialized approach, optimizing for extremely fast memory bandwidth at the cost of total memory capacity per chip. The tradeoff lets Llama inference generate tokens almost instantaneously, a stark contrast to the noticeable latency users experience with ChatGPT. SambaNova Systems and Groq are both working with Saudi Arabia's state oil company Aramco to build large inference computing facilities.
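The bandwidth tradeoff reduces to one line of arithmetic: generating each token requires streaming roughly the full set of weights through the processor once, so single-stream decode speed is capped by memory bandwidth divided by model size in bytes. The bandwidth and model figures below are assumptions for illustration, not Groq or Nvidia specifications.

```python
# Bandwidth-bound decode ceiling for one stream, ignoring batching:
#   tokens/sec <= memory bandwidth / model bytes.
# Illustrative assumptions only; not vendor specifications.
model_bytes = 70e9 * 2        # 70B parameters at 2 bytes each (FP16)

hbm_bw = 3.0e12               # ~3 TB/s, one HBM-based accelerator (assumed)
sram_bw = 80e12               # assumed aggregate bandwidth across a rack
                              # of SRAM-based chips

print(f"HBM-bound:  {hbm_bw / model_bytes:5.0f} tokens/s")    # ~21
print(f"SRAM-bound: {sram_bw / model_bytes:5.0f} tokens/s")   # ~571
```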

The deeper structural risk may come from the hyperscalers themselves. Google has TPUs, Amazon has custom silicon, and Apple uses its own M-series chips for inference. These companies are the largest buyers of Nvidia's chips and are increasingly motivated to cut out the middleman by developing their own silicon. Jim Piazza, a former Meta infrastructure executive now at IT management firm Ensono, predicted that Nvidia might need to develop inference-specific chips to compete directly, and that while the shift may take years, the direction is clear. Amazon CEO Andy Jassy recently discussed the company's work on quantum computing chips, a signal that hyperscalers are diversifying their custom-silicon bets across multiple architectures.

Huang's framing, that inference and training will eventually converge as systems evolve toward human-like continuous learning, suggests Nvidia sees the current specialization wave as temporary. But the transcript makes clear the company is hedging hard with Blackwell rather than betting everything on that convergence thesis. Whether Blackwell's inference gains are enough to hold market share, as startups chip away at specific use cases and hyperscalers pursue independence, is the open question behind the selloff.