Sail Research raises seed and Series A to bring maximum inference efficiency to open-source models, starting with GLM-5.2
Jun 29, 2026 with Neil Movva
Key Points
- Sail Research raised a Series A from Kleiner Perkins and a seed from Sequoia Capital, with Intel CEO Lip-Bu Tan participating as an angel investor.
- Sail optimizes inference efficiency for open-source models across the full software stack, starting with GLM-5.2, the first open-source model founder Neil Movva recommends for coding workloads.
- Movva expects background agents to grow from roughly 20% of token consumption today to the majority this year, now that agentic reliability has crossed a threshold for deterministic workflows like fraud checks.
Summary
Sail Research is building inference optimization software for open-source AI models, covering the full stack from chips to API. The company doesn't make silicon; it buys GPUs and squeezes maximum efficiency out of everything above the hardware layer.
Neil Movva, co-founder and CEO, spent roughly a decade at Nvidia and Apple before founding Sail. The Nvidia stint is relevant context: Movva says he was skeptical when Jensen Huang was pitching AI at a company doing $5 billion in gaming revenue.
“We are a company building the most efficient inference in the world. We love GPUs. We dig deep into the stack to find efficiency everywhere and we make tokens super abundant. Today, GLM-5.2 is a big moment for us. It seems like Z.ai really figured out post-training with this release; the style of the model is excellent for coding. It's the first one I'd actually, with a straight face, recommend my colleagues try for coding. Today, I'd estimate it's like 80% human in the loop and 20% background, but I actually expect the crossover to happen this year where background dominates.”
GLM-5.2 and the open-source moment
Sail's immediate focus is GLM-5.2, which Movva argues marks the point where an open-source model finally cracked post-training quality. Previous releases from DeepSeek and Kimi hadn't gotten there. GLM-5.2 is the first open-source model he says he'd recommend, with a straight face, for coding workloads.
The background-agent shift
Movva estimates today's token consumption is roughly 80% human-in-the-loop, 20% background. He expects that ratio to flip this year. The logic is that agentic reliability has crossed a threshold in the last few months, making it viable to run agents continuously in deterministic workflows — fraud checks, booking flows, and similar background tasks — rather than waiting for a human to prompt them. If that crossover happens, the addressable volume becomes, in his words, "trillions of tokens per task."
Sail's pitch isn't cost reduction in the traditional sense. Movva says he doesn't want to save customers money — he wants them spending more with Sail because the economics are good enough to justify far greater usage. Cheaper tokens expand the problem space rather than just trimming the bill.
Funding
Sail raised a seed led by Sequoia (Constantine and Lauren Reeder) and a Series A led by Kleiner Perkins (Adithya Naginath). No amounts were disclosed. Intel CEO Lip-Bu Tan participated as an angel, introduced through Sequoia's Constantine.