Interview

Baseten raises $303M Series D at $5B valuation as enterprise AI inference hits an inflection point

Jan 23, 2026 with Tuhin Srivastava

Key Points

  • Baseten closes $303M Series D at $5B valuation led by IVP with NVIDIA participation, posting 10x year-over-year revenue growth serving AI-native companies like Cursor and Notion.
  • Enterprise adoption of fine-tuned, on-premise models remains 6 to 18 months away, but the bottleneck is operational capability, not model quality, positioning Baseten as the infrastructure partner most enterprises lack in-house.
  • Open-source models trail frontier performance by roughly one quarter on many benchmarks, strengthening the economic case for proprietary inference infrastructure as reinforcement-learning-tuned variants proliferate.

Summary

Baseten has closed a $303 million Series D at a $5 billion valuation, led by IVP with participation from NVIDIA. The company claims 10x revenue growth year over year, serving a client roster that includes Cursor, Notion, OpenEvidence, and Bridge — fast-growing AI-native companies that Baseten views as the next generation of Fortune 500 businesses.

What Baseten Actually Does

Baseten operates as an AI inference infrastructure platform, focused on deploying and scaling production models for enterprise and high-growth customers. The core value proposition is handling the operational complexity of running models at scale — reliability, latency optimization, and on-premise or cloud deployment — so engineering teams do not need to build that capability themselves.

Enterprise Adoption: Early but Accelerating

Direct enterprise adoption of custom models remains limited within the Fortune 500, but the trajectory is clear. CEO Tuhin Srivastava estimates enterprises are 6 to 18 months away from meaningful deployment of fine-tuned, on-premise models. The near-term driver is reinforcement learning applied to open-source base models, which Baseten argues can match or exceed frontier model performance on specific tasks. Baseten recently made an acquisition in that space, though terms were not disclosed.

The more immediate barrier is not model quality — it is operational capability. Large frontier labs maintain substantial internal inference teams; most enterprises do not. That skills gap is where Baseten positions itself as the enabling partner.

The Open-Source Parity Argument

Srivastava pushes back on the assumption that frontier models always win on specialized tasks. Open-source models — he cites GLM, Qwen, and DeepSeek — trail frontier performance by roughly one quarter on many benchmarks, by his account, and are ahead on some narrow tasks. As RL-tuned open-source models proliferate, the economic case for running proprietary inference infrastructure strengthens.

Hardware Strategy: CUDA-First, Chip-Agnostic in Theory

Baseten runs across H100s, A100s, B200s, and GB200s and describes itself as chip-agnostic. In practice, Srivastava is candid about NVIDIA's structural advantage: he is skeptical that any third party will outperform NVIDIA's own software on NVIDIA hardware at the lowest level. He cites NVIDIA Dynamo — software that disaggregates inference, splitting the compute-bound prefill phase from the memory-bound decode phase (the kind of workload that chips like Groq's LPUs target) across pools of hardware — as a key part of Baseten's inference stack. Cross-chip compilation is viewed as a useful but overhyped capability.

Inference Routing and the Application Layer

Advanced customers are already breaking workloads into model-specific routing decisions based on capability, cost, and latency requirements. Less mature customers rely on routing primarily for failover. Srivastava's view is that as the long tail of task-specific models expands, capability-based routing will increasingly be handled at the inference platform layer rather than within individual applications.
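The capability/cost/latency routing described above can be sketched in a few lines. This is a hypothetical illustration, not Baseten's API: the model names, scores, and prices are invented, and the policy shown — cheapest model that clears the task's capability and latency bars, with a most-capable fallback for failover — is one plausible design among many.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str
    capability: float          # rough task-fit score, 0..1 (illustrative)
    cost_per_1k_tokens: float  # USD (illustrative)
    p50_latency_ms: float

# Hypothetical catalog; names and numbers are made up for the sketch.
CATALOG = [
    ModelProfile("small-oss",  capability=0.72, cost_per_1k_tokens=0.02, p50_latency_ms=120),
    ModelProfile("mid-oss-rl", capability=0.88, cost_per_1k_tokens=0.10, p50_latency_ms=300),
    ModelProfile("frontier",   capability=0.97, cost_per_1k_tokens=1.20, p50_latency_ms=900),
]

def route(min_capability: float, latency_budget_ms: float) -> ModelProfile:
    """Pick the cheapest model that meets the task's capability and latency bar."""
    candidates = [
        m for m in CATALOG
        if m.capability >= min_capability and m.p50_latency_ms <= latency_budget_ms
    ]
    if not candidates:
        # Failover behavior: no model fits, so fall back to the most capable one.
        return max(CATALOG, key=lambda m: m.capability)
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)
```

Pushing this policy into the platform layer means each application only declares its requirements (`min_capability`, latency budget) and the catalog can grow without application changes — which is the shift Srivastava anticipates as task-specific models multiply.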

On application quality, Srivastava frames the ability to solve what he calls model "laziness" — inconsistent output quality relative to cost — as the defining differentiator between durable AI application companies and thin wrappers. Companies like OpenEvidence succeed, in his telling, precisely because they architect multi-model workflows that match the right level of compute to each user task rather than defaulting to a single general-purpose model.
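The multi-model workflow pattern attributed to companies like OpenEvidence can be caricatured as a two-stage pipeline: a cheap triage step decides how much compute a request deserves, then a model sized to the task handles it. Everything here is a stand-in — the keyword triage, the tier names, and the bracketed model labels are assumptions for illustration, not anyone's actual architecture.

```python
def triage(prompt: str) -> str:
    """Stand-in for a small, fast classifier model that scores task stakes."""
    high_stakes_markers = ("diagnose", "differential", "contraindication")
    if any(marker in prompt.lower() for marker in high_stakes_markers):
        return "high-stakes"
    return "routine"

def handle(prompt: str) -> str:
    """Dispatch to a compute tier matched to the task, not a single default model."""
    if triage(prompt) == "high-stakes":
        return f"[frontier-class model] {prompt}"    # maximum capability, higher cost
    return f"[small fine-tuned model] {prompt}"      # cheap, fast, good enough
```

The design point is the contrast with a "thin wrapper": a wrapper sends every prompt to one general-purpose model, while a workflow like this spends frontier-level compute only where the task demands it.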