Commentary

DeepSeek's real cost and the new AI reasoning paradigm, explained

Jan 31, 2025

Key Points

  • DeepSeek's actual infrastructure costs run $500 million to $1 billion, not the $6 million figure widely cited, because the lower number excluded R&D, hardware depreciation, and repeated experiments.
  • DeepSeek's reasoning models caught up to OpenAI's o1 in five months by exploiting a structural shift: the reasoning era relies on cheaper post-training and synthetic data rather than expensive pre-training runs.
  • Efficiency gains in AI don't reduce GPU demand; they accelerate it, meaning DeepSeek's cost advantages spur larger model investments rather than lower overall compute spending.

Summary

DeepSeek's actual infrastructure costs run roughly $500 million to $1 billion, not the widely circulated $6 million figure for training V3 alone. That lower number excluded R&D, hardware depreciation, and repeated experiments—the real cost structure is far heavier.

The Chinese AI lab, backed by quantitative hedge fund High-Flyer, has deployed an estimated 50,000 Hopper-class GPUs across variants (H100, H800, H20) after spending over half a billion dollars on GPUs since 2021. Operational costs alone stand at roughly $944 million, mostly R&D and salaries. The lab employs about 50 staff making around $130,000 annually, plus a growing team, and operates more like an unfettered research lab than a grinding startup: run whatever experiments you want, unlimited GPU access, no cost constraints.

The talent play is aggressive: DeepSeek recruits heavily from top Chinese universities and employs no non-Chinese staff, a practice that has drawn some DEI-related pushback.

The reasoning shift changes the game. DeepSeek's R1 caught up to OpenAI's o1, announced just five months prior, by exploiting a structural advantage in the new paradigm. Unlike the previous era, which depended on expensive pre-training runs, the reasoning era relies on post-training with synthetic data and reinforcement learning. This dramatically lowers the barrier to entry: smaller compute budgets can generate meaningful gains faster. DeepSeek was able to replicate o1's methods quickly partly because it appears to have trained on outputs from OpenAI's models, cloning behavior against a fixed benchmark.
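To make the post-training loop concrete, here is a toy sketch of one common recipe in this paradigm, rejection sampling against an automatic verifier: sample candidate answers, keep only the ones a cheap checker verifies, and use the survivors as synthetic training data. The "model" below is a random guesser on arithmetic questions, purely illustrative; nothing here is DeepSeek's actual pipeline.

```python
import random

random.seed(0)

def mock_model_sample(question):
    """Stand-in for sampling a chain-of-thought + answer from a model."""
    a, b = question
    return a + b + random.choice([-1, 0, 0, 1])  # sometimes right, sometimes off by one

def verifier(question, answer):
    """Cheap automatic checker: the reward signal RL-style post-training relies on."""
    a, b = question
    return answer == a + b

questions = [(i, i + 1) for i in range(100)]
synthetic_data = []
for q in questions:
    for _ in range(4):                 # draw several samples per question
        ans = mock_model_sample(q)
        if verifier(q, ans):           # keep only verified traces
            synthetic_data.append((q, ans))
            break

print(len(synthetic_data))
```

The point of the sketch is economic, not algorithmic: the expensive ingredient is no longer a giant pre-training run, just inference plus a verifier, which is why smaller compute budgets can now buy real capability gains.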

But the benchmarks mask real weaknesses. DeepSeek's announcements omit benchmarks R1 doesn't lead on. In side-by-side tests, R1 underperforms o1 on tasks outside math and reasoning evals: one user asked both models to summarize a book in 5,000 words. o1 delivered exactly that; R1 gave 1,000 words and failed the functional test. OpenAI's o3, not yet fully released, scores significantly higher on advanced mathematics and frontier benchmarks than both R1 and o1.

Google's Gemini 2.0 Flash Thinking, released a month before R1, reportedly beats R1 on benchmarks DeepSeek didn't report and costs less through the API. It barely registered in the hype cycle. The difference isn't product quality; it's go-to-market execution. Gemini is buried in Google's product suite under confusing names ("Gemini 2.0 Flash Thinking"?) and hidden behind UI friction. ChatGPT, by contrast, sits in a beloved consumer app with saved conversation history and one-click sharing. Most users won't abandon ChatGPT for marginal capability gains when switching costs friction.

DeepSeek's real innovations: multi-token prediction (MTP) predicts several tokens per step rather than one, reducing inference cost. Multi-head latent attention (MLA) cuts KV cache memory by 93.3%, directly lowering inference costs. A mixture-of-experts (MoE) architecture uses specialized sub-networks, with a gating network routing each token to a few experts without degrading performance. These aren't all novel: MoE was already widely discussed when GPT-4 launched, and FP8 training has been standard at leading labs. But the combination and execution are solid.
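The MoE idea is easiest to see in code. Below is a minimal top-k routing sketch: a gating network scores all experts per token, and only the top two actually run. Dimensions, weights, and the top-2 choice are illustrative assumptions, not DeepSeek's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2
gate_W = rng.normal(size=(d_model, n_experts))             # gating network weights
expert_W = rng.normal(size=(n_experts, d_model, d_model))  # one weight matrix per expert

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(tokens):
    """Route each token to its top-k experts and mix their outputs."""
    scores = softmax(tokens @ gate_W)          # (n_tokens, n_experts) gate probabilities
    out = np.zeros_like(tokens)
    for i, (tok, s) in enumerate(zip(tokens, scores)):
        top = np.argsort(s)[-top_k:]           # indices of the k highest-scoring experts
        weights = s[top] / s[top].sum()        # renormalize gate weights over the top-k
        for w, e in zip(weights, top):
            out[i] += w * (tok @ expert_W[e])  # only k of n_experts run per token
    return out

tokens = rng.normal(size=(4, d_model))
y = moe_forward(tokens)
print(y.shape)  # same shape as input, but only 2 of 8 experts ran per token
```

This is the source of the efficiency: total parameters scale with the number of experts, while per-token compute scales only with the few experts the gate selects.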

The export control angle. High Flyer obtained H100s before the chip ban, then pivoted to H800s (same compute, less bandwidth, but workarounds exist) and H20s through Singapore loopholes. Future US bans may come. But there's a geopolitical paradox: the US government restricts what NVIDIA can sell to China, yet allows any sales at all. Unwinding 30 years of "go global" policy is politically fraught. The steel man: the best time to deleverage from China would have been during the 2023–2024 AI boom, when demand from US companies (Sam Altman alone may need two million GPUs) was so extreme that NVIDIA could have walked away from Chinese revenue without collapsing. It didn't.

China announced a $140 billion AI subsidy from the Bank of China after meeting DeepSeek's founder, roughly a domestic mirror of Trump's Stargate push. If China can get SMIC and domestically produced GPUs competitive at scale, the constraint vanishes. China is historically world-class at copying hardware and scaling production.

The Jevons Paradox dominates. Efficiency gains don't reduce investment; they spur it. DeepSeek's R1 costs less to run than o1, but that doesn't mean fewer GPUs ship. Dario Amodei at Anthropic notes that the economic benefits of more capable AI models are so substantial that any cost savings get reinvested into larger models. H100 and H200 spot market prices have already been pressured upward as demand increased—exactly what Jevons predicts. Cheaper models induce more usage, not less.
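The Jevons argument reduces to simple arithmetic. With made-up illustrative numbers (not DeepSeek's or NVIDIA's actuals): if cost per query falls 10x and cheaper inference induces 30x more usage, total compute spend still triples.

```python
# Hypothetical figures chosen only to illustrate the Jevons effect.
cost_per_query_before = 1.00   # dollars per query, assumed
queries_before = 1_000_000

cost_per_query_after = 0.10    # a 10x efficiency gain
queries_after = 30_000_000     # demand induced by cheaper inference, assumed

spend_before = cost_per_query_before * queries_before
spend_after = cost_per_query_after * queries_after

print(spend_before, spend_after)  # 1000000.0 3000000.0: spend rose despite efficiency
```

Efficiency only reduces total spend if induced demand grows slower than costs fall; the article's claim is that in AI, it grows much faster.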

The competitive dynamic is real but overstated. DeepSeek beat Meta on open-weights reasoning (Meta hasn't released a reasoning Llama model yet), but the app layer still matters. OpenAI stays roughly six months ahead despite the hype cycle, and consumer switching costs are sticky. Mark Zuckerberg claims Llama 4 will blow DeepSeek out of the water; he has to, given Meta's $60 billion annual capex and executive compensation tied to results.

The takeaway: DeepSeek proves the reasoning paradigm is faster and cheaper than the pre-training one. That's genuinely impressive for a Chinese lab. But the narrative of "free beats paid" misses that ChatGPT is a beloved product, not just a model. The bigger story is that efficiency doesn't kill compute demand—it accelerates it. Export controls will tighten, but hardware scaling will continue, benefiting chip providers like NVIDIA. The real question is whether China can make its own competitive GPUs before controls choke the supply chain entirely.