WebAI CEO David Stout: on-device AI beats cloud models in knowledge retrieval by 7%, announces upcoming fundraise
Aug 15, 2025 with David Stout
Key Points
- WebAI's on-device knowledge retrieval system outperforms Claude Opus 4 and GPT-5 by 7% by exploiting per-query RAM capacity that cloud providers serving millions of concurrent users cannot afford.
- The company plans an imminent fundraise and embeds engineers with Fortune 100 clients, charging base license fees plus per-answer usage fees while keeping sensitive data like health diagnostics off-network.
- CEO David Stout argues transformer architectures may not survive long-term, claims GPT-5 is a cost-routing tool rather than a foundational innovation, and says pre-training at scale is already declining in relevance.
Summary
WebAI CEO David Stout claims the company's new knowledge graph mechanism outperforms the best available models in knowledge retrieval by 7%, with that performance running entirely on consumer hardware such as a laptop rather than cloud infrastructure. The benchmark comparison explicitly includes Claude Opus 4 and GPT-5.
The accuracy gain is structurally tied to WebAI's on-device architecture. Because inference runs locally, the system can consume more RAM per query than a cloud provider serving millions of concurrent users on shared NVIDIA hardware can afford. Stout frames this as a genuine arbitrage unavailable to Anthropic or OpenAI at scale.
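Stout's argument rests on how much memory each query can use. The following is a back-of-envelope sketch only: the hardware sizes, weight footprints, and concurrency levels are hypothetical figures chosen for illustration, not WebAI's or any cloud provider's numbers, and it assumes free memory is split evenly across concurrent requests.

```python
# Back-of-envelope: memory left for each active query's working state
# (KV cache, retrieved context, etc.) after model weights are loaded.
# Every figure below is hypothetical and chosen only for illustration.

def memory_per_query_gb(total_gb: float, weights_gb: float, concurrent_queries: int) -> float:
    """Split free memory evenly across concurrent queries (a simplifying assumption)."""
    if concurrent_queries < 1:
        raise ValueError("need at least one active query")
    free_gb = max(total_gb - weights_gb, 0.0)
    return free_gb / concurrent_queries


# Shared cloud accelerator: 80 GB, 40 GB of weights, 64 concurrent requests.
cloud = memory_per_query_gb(total_gb=80, weights_gb=40, concurrent_queries=64)

# Single-user laptop: 64 GB of RAM, 20 GB of quantized weights, one query.
laptop = memory_per_query_gb(total_gb=64, weights_gb=20, concurrent_queries=1)

print(f"cloud:  {cloud:.3f} GB per query")   # cloud:  0.625 GB per query
print(f"laptop: {laptop:.3f} GB per query")  # laptop: 44.000 GB per query
```

Real serving stacks batch and share memory far more cleverly than an even split, so the practical gap depends heavily on workload; the sketch only shows why a single-user device has more headroom per query.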
A fundraise is imminent. Stout declined to disclose terms or timing but confirmed that an announcement is coming soon, saying the details have not yet been cleared for public release.
“David Stout: 'We announced our new knowledge graph mechanism which out-benchmarked all of the best models year to date by 7%... When we say we out-benchmark Opus 4 or GPT-5 in knowledge retrieval, that's happening on a laptop.' On economics: 'We're running some of the world's largest models today on things like a laptop... we believe it's gonna be a big step change in unit economics for AI — it's just not there in the cloud model.' On fundraise: 'No fundraising announcement today... soon. I can't leak it today.'”
Technology Stack
WebAI owns its full stack, including a proprietary runtime engine, AI library, and network protocol, originally built around computer vision and YOLO-class object-detection models starting around 2016. The company is not a wrapper business layered on top of other companies' models.
A key efficiency technology is EWQ (Elastic Weight Quantization), released as an open-source paper and since expanded. Unlike fixed-precision quantization that uniformly compresses a model to 4-bit or 16-bit, EWQ profiles each device's hardware on first contact and applies dynamic, real-time quantization at inference time. The result, per Stout, is a 30–40% reduction in RAM footprint while retaining accuracy, enabling larger models to run on constrained hardware with lower energy draw.
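The idea of fitting precision to a specific device's memory can be illustrated with a simple mixed-precision planner. This is a hypothetical sketch, not WebAI's EWQ algorithm: the `Layer` type, the per-layer `sensitivity` scores, the greedy 16-bit, then 8-bit, then 4-bit schedule, and the assumption that the RAM budget is already known from profiling are all illustrative choices. It also plans precision once at load time rather than adjusting it in real time during inference.

```python
from dataclasses import dataclass

# Bytes needed per weight at each supported precision.
BYTES_PER_PARAM = {16: 2.0, 8: 1.0, 4: 0.5}


@dataclass
class Layer:
    name: str
    params: int         # number of weights in the layer
    sensitivity: float  # hypothetical score: higher means quantizing it hurts accuracy more


def plan_precision(layers: list[Layer], ram_budget_bytes: float) -> dict[str, int]:
    """Hypothetical greedy planner (not WebAI's EWQ): start every layer at 16-bit,
    then lower the least sensitive layers to 8-bit and then 4-bit until the
    weights fit within the device's RAM budget."""
    bits = {layer.name: 16 for layer in layers}

    def footprint() -> float:
        return sum(layer.params * BYTES_PER_PARAM[bits[layer.name]] for layer in layers)

    least_sensitive_first = sorted(layers, key=lambda layer: layer.sensitivity)
    for target_bits in (8, 4):
        for layer in least_sensitive_first:
            if footprint() <= ram_budget_bytes:
                return bits
            bits[layer.name] = target_bits
    if footprint() > ram_budget_bytes:
        raise MemoryError("model does not fit in the budget even at 4-bit")
    return bits


# Toy model: three equally sized layers (6 GB total at 16-bit) and a
# hypothetical device budget of 4.5 GB assumed to come from profiling.
layers = [
    Layer("embed", 1_000_000_000, 0.1),
    Layer("mid", 1_000_000_000, 0.5),
    Layer("head", 1_000_000_000, 0.9),
]
plan = plan_precision(layers, ram_budget_bytes=4.5e9)
print(plan)  # {'embed': 8, 'mid': 8, 'head': 16}
```

In this toy run the planner lowers the two less sensitive layers to 8-bit and keeps the most sensitive one at 16-bit, shrinking the weight footprint from 6 GB to 4 GB. A device with a larger budget would get a higher-precision plan from the same model, which is the per-device adaptivity the paragraph describes.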
Business Model
WebAI charges a base license fee as a floor, then layers per-answer usage fees on top, collected through its own network even when inference runs on-device. Forward-deployed engineers are embedded with enterprise clients, particularly Fortune 100 companies that lack internal AI talent. Use cases cited include multimodal engine reassembly diagnostics, health diagnostics, and public sector work. A current integration is with Oura Ring, where health data stays on-device by design.
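The pricing structure described above amounts to a fixed floor plus metered usage. A minimal sketch, with hypothetical fee levels (the actual rates were not disclosed) and `Decimal` used to avoid floating-point rounding in currency arithmetic:

```python
from decimal import Decimal


def monthly_invoice(base_license: Decimal, per_answer_fee: Decimal, answers: int) -> Decimal:
    """Base license is the floor; per-answer usage fees are added on top of it."""
    if answers < 0:
        raise ValueError("answer count cannot be negative")
    return base_license + per_answer_fee * answers


# Hypothetical rates for illustration only.
BASE = Decimal("10000.00")
FEE = Decimal("0.02")

print(monthly_invoice(BASE, FEE, 0))        # 10000.00 (no usage: the customer pays the floor)
print(monthly_invoice(BASE, FEE, 250_000))  # 15000.00
```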
Privacy and Edge Architecture
WebAI's stack is downstream-only, meaning no data is transmitted back through its network. For health and mission-critical applications, Stout sees personalized models that never leave a user's device as the target state.
Views on the Broader Market
Stout characterizes GPT-5 as a mixture-of-experts router rather than a new foundational model, arguing that it functions primarily as dynamic price control, routing each query to a cheaper or more expensive underlying model depending on its complexity. He cites anecdotal reports of non-technical users switching from GPT-5 to Claude, which he attributes to inconsistent response quality caused by opaque model selection.
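The cost-routing behavior Stout describes can be pictured as a dispatcher that scores each query and sends it to a cheaper or more expensive model tier. This is a toy illustration of the general idea only, not how GPT-5's router actually works: the tier names, prices, keyword list, scoring formula, and threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_1k_tokens: float  # hypothetical price


CHEAP = Tier("fast-small", 0.1)
EXPENSIVE = Tier("deep-reasoning", 2.0)

# Hypothetical keywords treated as signals that a query needs more reasoning.
REASONING_HINTS = ("prove", "step by step", "debug", "derive", "analyze")


def complexity_score(prompt: str) -> float:
    """Toy heuristic in [0, 1]: longer prompts and reasoning keywords score higher."""
    length_part = min(len(prompt.split()) / 200, 1.0)
    hint_part = 1.0 if any(hint in prompt.lower() for hint in REASONING_HINTS) else 0.0
    return 0.5 * length_part + 0.5 * hint_part


def route(prompt: str, threshold: float = 0.4) -> Tier:
    """Send the query to the expensive tier only when it looks complex enough."""
    return EXPENSIVE if complexity_score(prompt) >= threshold else CHEAP


for prompt in ("What's the capital of France?",
               "Prove that the square root of 2 is irrational, step by step."):
    print(route(prompt).name, "<-", prompt)
# fast-small <- What's the capital of France?
# deep-reasoning <- Prove that the square root of 2 is irrational, step by step.
```

Because the tier is chosen per query, two similar-looking questions can land on different models. When a heuristic like this misjudges a hard question as easy, the answer comes from the cheaper tier, which is one way the response-quality inconsistency Stout mentions could arise.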
On hardware strategy, Stout agrees that more RAM is directionally correct for on-device AI but cautions against large infrastructure bets until the winning model architecture is clearer. He argues that transformer-based architectures may not be the long-term answer, and says WebAI is actively developing alternative architectures for public and private sector clients that he believes show material improvements over transformers. Pre-training at scale, he argues, is already declining in relevance, which in his view reduces NVIDIA's centrality to the next phase of AI development.
TBPN Digest delivers summaries of the latest fundraises, interviews and tech news from TBPN, every weekday.