Decart raises new funding and launches DOS 2.0, a 5-8x faster inference engine powering real-time video and agentic AI
Key Points
- Decart launches DOS 2.0, an inference engine the company claims is 5-8x faster than competitors, as acute chip scarcity forces AI companies to squeeze more compute from existing hardware.
- DOS 2.0 runs real-time video at full HD 100fps and processes agentic workloads above 500 tokens per second across Nvidia, Google TPU, and Amazon Trainium architectures.
- Decart raises new funding from Radical Ventures and reports commercial traction from video streaming deployments on Twitch and TikTok Live, with founder Dean Leitersdorf targeting $1 billion ARR.
Summary
Decart launches DOS 2.0 and raises new round
Dean Leitersdorf's Decart is announcing a funding round backed by Radical Ventures — amount undisclosed — alongside the launch of DOS 2.0, an inference engine the company claims runs 5 to 8x faster than competing solutions for the model types it targets.
Three product lines, one underlying engine
Decart operates across three distinct products. Lucy is a real-time video world model for live streaming, e-commerce, and social platforms, with active deployments on Twitch, TikTok Live, YouTube Live, and with Amazon for virtual try-on. Oasis is a real-time world model aimed at physical AI, covering robotics, autonomous vehicles, and manufacturing. DOS is the inference engine that powers both, and the one generating the most immediate commercial traction.
DOS was actually Decart's first commercialized product. Leitersdorf says the company closed its first multimillion-dollar license deal for DOS 1.0 less than 100 days after founding. DOS 2.0 was originally scheduled for August but was pulled forward because customers are running into severe compute constraints, with Leitersdorf describing chip capacity as essentially unavailable until 2028. Squeezing more performance out of existing hardware has become the only viable path to revenue growth for AI companies operating in that environment.
What DOS 2.0 actually does
DOS 2.0 runs on Nvidia, Google TPU, and Amazon Trainium — Leitersdorf describes it as the only inference stack supporting all three for all model types, including LLMs, video models, audio models, and agentic workloads. The team writes assembly directly for each chip architecture: SASS and PTX for Nvidia, VLIW for TPUs, and native assembly for Trainium.
The company cites two headline performance numbers: DOS 2.0 can now run real-time video models at full HD at up to 100 frames per second, and for fast text and agentic models it can process more than 500 tokens per second, which Leitersdorf says is more than 10x the industry baseline.
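To put those two figures in per-unit terms, here is a rough back-of-the-envelope sketch. It assumes "full HD" means the standard 1920x1080 resolution; the function names are illustrative and say nothing about how DOS 2.0 works internally.

```python
# Back-of-the-envelope budgets implied by the quoted throughput figures.
# Assumption: "full HD" is the standard 1920x1080 resolution.
FULL_HD_WIDTH = 1920
FULL_HD_HEIGHT = 1080


def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Number of pixels that must be generated every second."""
    return width * height * fps


def token_budget_ms(tokens_per_second: float) -> float:
    """Average time available per generated token, in milliseconds."""
    return 1000.0 / tokens_per_second


print(frame_budget_ms(100))                                   # 10.0
print(pixels_per_second(FULL_HD_WIDTH, FULL_HD_HEIGHT, 100))  # 207360000
print(token_budget_ms(500))                                   # 2.0
```

In other words, 100 frames per second leaves about 10 ms per frame, and 500 tokens per second leaves about 2 ms per token.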
Agentic workloads as the second growth vector
Beyond live video, Leitersdorf points to agentic and coding model workloads as the other place where raw inference speed becomes commercially important. The logic is that coding agents and multi-step agentic pipelines are latency-sensitive in ways that a simple chatbot query isn't — 500-plus tokens per second starts to matter there in a way it doesn't for a history question.
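A small worked example shows why throughput compounds in multi-step pipelines. The step count, tokens per step, and the 50 tokens-per-second baseline are hypothetical values chosen for illustration. The baseline is only an upper bound implied by the "more than 10x" claim, not a figure from the interview. The sketch counts generation time only and ignores prompt processing, tool calls, and network latency.

```python
def generation_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Total time spent generating tokens when steps run one after another."""
    return steps * tokens_per_step / tokens_per_second


# Hypothetical workload shapes (assumptions, not figures from the interview).
AGENT_STEPS = 20         # sequential model calls in one agentic task
TOKENS_PER_STEP = 500    # tokens generated per call
CHAT_TOKENS = 200        # tokens in a single chatbot answer

FAST_TPS = 500.0         # throughput quoted for DOS 2.0
BASELINE_TPS = 50.0      # assumed baseline, one tenth of the quoted figure

# Single chatbot query: the gap is a few seconds.
print(generation_seconds(1, CHAT_TOKENS, BASELINE_TPS))  # 4.0
print(generation_seconds(1, CHAT_TOKENS, FAST_TPS))      # 0.4

# Multi-step agent: the same ratio becomes minutes versus seconds.
print(generation_seconds(AGENT_STEPS, TOKENS_PER_STEP, BASELINE_TPS))  # 200.0
print(generation_seconds(AGENT_STEPS, TOKENS_PER_STEP, FAST_TPS))      # 20.0
```

Under these assumptions the chatbot answer improves by under four seconds, while the agentic task drops from 200 seconds to 20, which is the asymmetry the argument rests on.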
Consumer signals
On the consumer side, Decart's delulu.ai product, which plugs directly into OBS to apply real-time AI filters to live video streams, has seen streamers run continuous eight-hour sessions. Leitersdorf says a subscription service launched there has been growing exponentially over the past month and a half.
Leitersdorf's self-imposed milestone: cut his hair when Decart hits $1 billion ARR. He says the bet was made early in the company's history — Decart is roughly a year and a half old — and adds that with DOS scaling the way it is, that moment may arrive sooner than the original timeline assumed.
TBPN Digest delivers summaries of the latest fundraises, interviews and tech news from TBPN, every weekday.