Interview

Cua: Computer use AI agents wrapped in a full OS sandbox — infrastructure play betting 20% of AI agent scenarios will rely on browser/computer tools

Jun 11, 2025 with Franchesco & Aleandro

Key Points

Cua builds AI agents that operate via screenshot interpretation inside an OS sandbox, betting that 20% of multi-agent workflows in five years will require browser and desktop tools rather than APIs.
The YC W25 company's open-source framework has 8,000 GitHub stars and converts users into cloud customers; its next revenue layer targets becoming an inference provider for computer-vision UI models.
Cua uses hybrid execution combining deterministic RPA for predictable paths with AI inference only when workflows deviate, addressing the gap between prototype reliability and production-ready agents.


Summary

Cua, a YC W25 company founded by Franchesco and Aleandro (CEO), builds computer-use AI agents wrapped inside a full OS sandbox, essentially a Docker container for AI agents. Instead of parsing HTML or reading accessibility trees, Cua uses screenshot-based, pixel-level models to interpret and interact with the screen. Franchesco, who spent over five years on the Windows team at Microsoft working on agent benchmarks, says research backs this approach as more reliable for operating system environments where there is no DOM to parse.

Cua's founders believe that in five years, roughly 80% of multi-agent workflows will run via API, but the remaining 20% will require browser and computer tools. That 20% is their market. Franchesco estimates computer-use agents are still about six months from a "ChatGPT moment," while browser-use agents are already there.

“We are building computer use AI agents — AI agents that can solve any problems like a human would do in terms of clicking, typing, scrolling. We wrap an entire operating system in an isolated environment, kind of like a Docker for AI agents. Our leap of faith is that five years from now, most multi-agent systems will rely on APIs for maybe 80% of scenarios and the other 20% will be based on browser and computer tools.”
— Franchesco & Aleandro

Open source traction

Cua has an open-source framework with over 8,000 GitHub stars, and the go-to-market strategy flows from it. Inbound users ask how to take workflows that already run locally into production at scale, and Cua charges for the compute behind cloud deployment. The next revenue layer is becoming an inference provider for computer-vision UI models. Franchesco argues that OpenAI and Anthropic dominate perception mainly through PR. He says Baidu's model already outperforms both on computer-use benchmarks, but alternatives like it are hard to discover and set up. Cua wants to be the catalog and platform for that model ecosystem.

Vertical strategy

Cua has deliberately avoided picking a vertical. Early customers arrived with highly idiosyncratic requests (contractors editing video on macOS, for instance), and no common workflow pattern emerged. The founders are letting customers pursue the verticals themselves while Cua stays horizontal.

Episodic memory and reliability

A workflow that works 80% of the time isn't production-ready. Cua is building episodic memory to address this. Deterministic, RPA-style execution handles predictable paths, and full computer-use inference kicks in only when a webpage changes or the agent deviates from the expected trajectory. The hybrid architecture treats legacy UI automation and AI inference as complementary rather than competing.
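The hybrid idea described above can be illustrated with a minimal sketch. This is not Cua's implementation or API. It assumes a recorded workflow stored as a list of steps, each with a fingerprint of the screen that was seen when the step was recorded, and it treats any mismatch as a deviation. The names `Step`, `run_hybrid`, `observe`, `act`, and `infer_action` are all hypothetical, and the exact-match comparison stands in for whatever deviation check a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One recorded step of a workflow (hypothetical structure)."""
    expected_screen: str  # fingerprint of the screen seen when the step was recorded
    action: str           # recorded action to replay, e.g. "click:sign_in"


def run_hybrid(
    steps: List[Step],
    observe: Callable[[], str],
    act: Callable[[str], None],
    infer_action: Callable[[str, Step], str],
) -> List[str]:
    """Replay recorded steps and call the model only when the screen deviates.

    Returns a log with one entry per step: "replay" or "inference".
    """
    log: List[str] = []
    for step in steps:
        screen = observe()
        if screen == step.expected_screen:
            # Predictable path: replay the recorded action, no model call.
            act(step.action)
            log.append("replay")
        else:
            # Deviation: ask the (assumed) computer-use model for an action.
            act(infer_action(screen, step))
            log.append("inference")
    return log


if __name__ == "__main__":
    # Simulated run in which the second screen has changed since recording.
    screens = iter(["login", "dashboard_v2", "export"])
    performed: List[str] = []
    recorded = [
        Step("login", "click:sign_in"),
        Step("dashboard", "click:reports"),
        Step("export", "click:download"),
    ]
    result = run_hybrid(
        recorded,
        observe=lambda: next(screens),
        act=performed.append,
        infer_action=lambda screen, step: "model:" + step.action,
    )
    print(result)     # ['replay', 'inference', 'replay']
    print(performed)  # ['click:sign_in', 'model:click:reports', 'click:download']
```

In this sketch the model is consulted for one step out of three, so inference cost is paid only where the recorded path no longer matches what is on screen.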

Benchmarks

Franchesco is skeptical of current computer-use benchmarks. Windows Agent Arena, derived from OSWorld and developed partly during his time at Microsoft, assigns tasks like opening VLC or using LibreOffice, software few users rely on today. He expects the next generation of benchmarks to measure real-world tasks, potentially including qualitative human evaluation similar to LM Arena, where human raters watch two agents complete tasks side by side and judge on fluency and coherence, not just task completion.

Both founders moved to San Francisco three months ago specifically for YC. Aleandro says the dream of doing YC dates back to when he was 16.
