Baseten raises $150M to power AI inference for companies running custom fine-tuned models
Sep 5, 2025 with Tuhin Srivastava
Key Points
- Baseten raises $150 million to serve customers running custom fine-tuned models, positioning itself as the infrastructure layer between applications and GPU capacity for companies like Notion, Gamma, and Bland.
- The company's defensibility rests on model fragmentation: customers train proprietary variants tailored to their use cases, making the differentiation sit in the application layer rather than commodity compute routing.
- Srivastava expects falling token prices to drive higher inference volumes overall, invoking Jevons paradox to argue that cheaper inference cycles back into more spending within months.
Summary
Baseten, a six-year-old AI inference infrastructure company, has raised $150 million. CEO Tuhin Srivastava describes the business as the layer between AI applications and GPU capacity — acquiring compute, optimizing models it didn't train, and scaling them gracefully as user demand spikes. Customers include Bland, Gamma, Clay, Notion, and Open Evidence.
The moat argument turns on model fragmentation. The standard worry about inference infrastructure — that OpenRouter-style commodity routing will compress margins as token prices fall — assumes everyone runs the same models. Srivastava says most Baseten customers run fine-tuned variants tailored to their use case. Open Evidence is his clearest example: the company trains its own models to answer clinical queries from doctors, runs them at scale, and does it with a two-person infrastructure team by outsourcing the compute layer to Baseten. The differentiation sits in the application, not the plumbing, and that logic is what keeps customers from building the infrastructure themselves.
On token pricing, Srivastava is relaxed. He acknowledges inference will get cheaper and invokes Jevons paradox — noting that every time Baseten lowers prices or optimizes a customer's models, that customer is spending more again within four months. He expects the same dynamic to hold industry-wide: cheaper inference drives more inference, and the total market grows.
Headcount is around 40, up from roughly 30 a year ago. The $150 million goes toward two things: building out a go-to-market team and hiring the expensive engineers required to stay competitive on model optimization and scaling. Srivastava frames this as a land-grab moment — the market arrived faster than the team anticipated, and the capital is about moving as quickly as possible to capture it.