WebAI CEO David Stout: on-device AI beats cloud models in knowledge retrieval by 7%, teases upcoming fundraise

Aug 15, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring David Stout

headaches of out-of-home advertising. Only AdQuick combines technology, out-of-home expertise, and data to enable efficient, seamless ad buying across the globe. And we do have our next guest here. David, welcome to the stream. How are you doing? Welcome to the show. Thank you so much. What have you got for us? Jordi's warming up.

He's got the mallet ready. He wants to hit the gong. It's the first one of the stream. You got some good news for us? Yeah, yeah. At webAI we're working on some pretty interesting things.

Just recently, we announced our new knowledge graph mechanism, which has out-benchmarked all of the best models year to date. By how much? By 7%. Let's go. There we go. We like to hit the gong for big numbers. We like to hit the gong for big fundraises. We also like to hit the gong for improvements.

Benchmark maxing. Yeah. No fundraising announcement today. Soon, soon. I can't leak it today. Well, we'll be refreshing our rocks account. But in fact, talk more about the genesis of the business, why you started it, and what got you guys here. Yeah, absolutely.

So webAI is really focused on building models that can live on devices like the ones on your desk. Right. So the genesis... the company really started working... Like on a watch, or... Yeah, absolutely. Yeah, absolutely. All of it.

Yes, so the company started by working in computer vision. We were working on how we could take the YOLO models, if you guys are familiar with those. This was in the early 2016 era; those were the biggest models, because language models weren't really mature yet. We did early work there and ended up creating our own runtime engine, our own AI library, and our own network protocol. What this enabled us to do is run state-of-the-art AI models distributed across devices. So when you think about the future of intelligence, what we're building is the rails for that.

So we serve and distribute models across hardware. We're running some of the world's largest models today on things like a laptop. So when we say we out-benchmark, say, Opus 4 or GPT-5 in knowledge retrieval, that's happening on a laptop.

It's a pretty significant breakthrough in modeling, and we're doing this in lots of different industries. But what we believe it's going to be is a big step change in unit economics for AI as well, which just isn't there in the cloud model. That seems very important, because all week we've been talking about gross margins, or the lack thereof, in a bunch of these different applications. It's free when it happens on device, right?

That's the goal. Yeah. And you can do some things that cloud players can't do. Right. So part of the way we're getting this accuracy... there's always a no-free-lunch, right? So why wouldn't, say, Anthropic do what we're doing to get this huge retrieval accuracy bump? Well, it's RAM-intensive.

If we're distributing across devices, we can arbitrage, right? We can say, okay, we'll pull more RAM because we're inferring on a device. But if you're hosting this for a million users on Nvidia, you can't do that. You can't load additional RAM for every user.
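A rough back-of-envelope on the arbitrage being described here, with entirely hypothetical numbers:

```python
# Hypothetical figures for illustration only.
EXTRA_RAM_GB = 16             # extra RAM a bigger retrieval context might need
CONCURRENT_USERS = 1_000_000  # a cloud provider serving everyone at once

# Cloud: the provider must provision that RAM for every concurrent user.
cloud_extra_ram_tb = CONCURRENT_USERS * EXTRA_RAM_GB / 1024
print(f"cloud: ~{cloud_extra_ram_tb:,.0f} TB of extra server RAM")  # ~15,625 TB

# Edge: each user's own laptop or phone already has that RAM, so the
# marginal cost to the model provider of "pulling more RAM" is near zero.
print("edge: 0 TB of provider RAM; the user's 16 GB is already paid for")
```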

It's just not efficient. But there are real things that happen on the edge that unlock, I think, technological paradigms in AI that are more meaningful: more accuracy, more context, all of that. And we're seeing... What about privacy, too? Absolutely. Right.

In our stack, everything's downstream only. So when we partner with a group (for example, we work with the Oura Ring, if you know that company; we're doing the AI for them), think about health data: you want that to be private.

So the dream there is: how can we facilitate personalized models for millions of users that never leave their device? React to this post from Tae Kim, author of The Nvidia Way. He says: here's what I would do if I was the CEO of Apple. Quadruple the RAM in iPhones to 32 gigs. Have the Max model at 64 gigs.

Memory is oxygen for local on-device AI. More equals smarter and more powerful. Take the margin hit. Memory isn't even that expensive. What do you think? I think he's right. I think memory is fundamental in these models.
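For a rough sense of the arithmetic behind that claim, here is illustrative math (not figures from the show) relating parameter count, quantization, and RAM; the overhead factor is a crude assumption:

```python
def weights_ram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Approximate RAM for the weights alone, with ~20% headroom
    assumed for KV cache and activations (a rough rule of thumb)."""
    return params_b * bits / 8 * overhead  # params in billions -> GB

for params_b in (8, 32, 70):
    for bits in (4, 8):
        print(f"{params_b}B @ {bits}-bit: ~{weights_ram_gb(params_b, bits):.0f} GB")
# 8B @ 4-bit (~5 GB) fits today's phones; 32B @ 4-bit (~19 GB) needs the
# 32 GB iPhone Kim proposes; 70B @ 4-bit (~42 GB) wants the 64 GB tier.
```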

I also think we need to tread lightly on this idea that we're retooling infrastructure and making all these big bets on hardware with, frankly, a pretty immature algorithm. Transformers are not necessarily the winning algorithm. So I think we need to be cautiously optimistic.

But we need to continue to work on what's next. You retool based on all these factors, and then the algorithm changes. And we don't know what the long tail of hardware is going to look like.

And Nvidia was really relevant because of pre-training and all this, but now pre-training isn't really happening at the same level it used to. I think generally more RAM is a safe decision.

But also, I don't know if I would jump in and totally rewrite how we're building chips until we know that this is the architecture we want to stick with. Would you recommend someone buying a new Mac max it out and get the most memory possible? Yeah, absolutely. Absolutely. Why not?

Why not? What about diffusion models? Do you think there's a chance they have a comeback? We saw that demo from Google where they were doing text generation, like token generation, through a diffusion model. It felt kind of like a wildcard scenario.

I don't know how it's actually performing on benchmarks, but it seemed like a path in the tech tree that was more or less forgotten, relegated to image generation, but then maybe making a comeback. I think there are lots of things that have been relatively unexplored. We spent so much time on transformers, but we haven't spent an equivalent amount of energy and dollars on other architectures that we know work. We know they work at specific things, but there's typically a broader application. I think it's really interesting. I mean, we're working on new architectures today with both the public sector and the private sector, and we're seeing a lot of breakthroughs that I think make the transformer look a little old.

Oh, interesting. How do you think about the business model here? Because you're not going to be selling hardware to an OEM in the supply chain, but you're also not an API, so you're not pricing on a consumption basis.

It feels like there's a world where companies are comping you against an open-source thing that they have to implement. What does a great relationship with a big device manufacturer or edge computing provider look like for you? Yeah, so for webAI, one, we have a license, right?

Because we have a proprietary tech stack. We're not a wrapper. We appreciate wrapper companies; we think they're doing cool things. But we own our stack pretty vertically. So we own our runtime, our AI library, and our tooling around that.

And so when we work with a partner, we typically structure a base license minimum. And when we have that license, we can inject forward-deployed engineers to work with these companies that honestly just don't have the AI talent quite yet. And they need help.

And I think that's something people aren't talking about: these products don't necessarily solve the problem out of the box. A lot of these enterprises, like the Fortune 100, need help. So we do that, and additionally there's a way to take part in the success of the deployment.

So there are usage fees we can get even though we're running on device, because our network is managing that. You can imagine with webAI you have two and a half million custom devices, or maybe it's an iPhone, and we're shipping across that. Our network manages all of that, so we collect fees on that.

So it sounds like it's somewhat case by case, but you could imagine charging a per-device license, but also a per-token license in the future. Per answer is typically how we structure it. It could be a book, it could be a one-word answer, as long as it's an output that's solving a problem.
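A minimal sketch of what per-answer metering on top of a base device license could look like; every name and rate below is hypothetical, not webAI's actual pricing:

```python
from collections import defaultdict

BASE_LICENSE = 10.00    # hypothetical flat fee per enrolled device
FEE_PER_ANSWER = 0.002  # hypothetical fee per completed output

answers = defaultdict(int)

def record_answer(device_id: str) -> None:
    # The network layer observes each completed answer, even though
    # the inference itself ran on the user's device.
    answers[device_id] += 1

def invoice(device_ids: list[str]) -> float:
    usage = sum(answers[d] for d in device_ids)
    return len(device_ids) * BASE_LICENSE + usage * FEE_PER_ANSWER
```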

We mostly work in mission-critical use cases: things like reassembling engines with multimodal AI, health diagnostics, public sector work. Yeah. How quickly are you going to kill Jordi's battery if you're doing test-time inference on device?

He's already been complaining about the iPhone not having enough battery life, but it feels like there was a glimmer of hope when we were saying, let's just distill the models and it'll be a pretty short inference chain.

But even if you distill the model, if you're inferencing for 10 minutes, that feels like a lot of heat in my pocket. Well, I don't know what Jordi's using; I would assume it's a pretty nice phone. It's an iPhone. Yeah, it's the latest and greatest iPhone. Yeah. So, I mean, you mentioned quantizing.

So I'm going to talk a little bit about that and what we're doing there. We released an open-source paper around a technique we were building early on that we've now expanded. It's more sophisticated now, but the principle is still there.

It's called EWQ. And tell me if I'm going way too technical here, but with traditional quantizing you have a fixed value.

So let's say you have a full-precision model. When you quantize something, you say, okay, I'm going to quantize it to 4-bit, or I'm going to go to 16-bit, and you're just drastically chopping the model down from the float values it can pass through. With EWQ, what we do is we have something called device profiling. When a webAI model hits your phone, it's running our webframe library, and it profiles your hardware. Then, on inference, we run EWQ, and what EWQ does is real-time quantization.

It's based on your question and the inference, and what it leads to is close to a 30 to 40% reduction in model size in RAM while retaining accuracy.

What that means is we get bigger models inferring, and instead of this one-size-fits-all quantization, we do it dynamically on inference. That leads to less energy consumption, higher accuracy, less usage on the device. Yeah.
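For readers who want the general shape of entropy-weighted quantization, here is a toy sketch: layers whose weight distributions carry more information keep more bits. This is not webAI's EWQ implementation (which, as described above, also profiles the device and quantizes at inference time); the function names, threshold, and bit-widths are all illustrative.

```python
import numpy as np

def weight_entropy(w: np.ndarray, bins: int = 256) -> float:
    # Shannon entropy (in bits) of a layer's weight histogram.
    hist, _ = np.histogram(w, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def fake_quantize(w: np.ndarray, n_bits: int) -> np.ndarray:
    # Uniform symmetric quantize-then-dequantize to n_bits.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

def ewq_plan(layers: dict, low: int = 4, high: int = 8,
             threshold: float = 6.5) -> dict:
    # High-entropy (information-dense) layers keep more precision;
    # low-entropy layers are compressed harder.
    return {name: (high if weight_entropy(w) > threshold else low)
            for name, w in layers.items()}

rng = np.random.default_rng(0)
layers = {
    "attn.q_proj": rng.normal(0.0, 1.0, (256, 256)),   # spread-out weights
    "mlp.up_proj": rng.normal(0.0, 1.0, (256, 256))
                   * (rng.random((256, 256)) < 0.02),  # mostly zeros -> low entropy
}
plan = ewq_plan(layers)  # e.g. {'attn.q_proj': 8, 'mlp.up_proj': 4}
quantized = {name: fake_quantize(layers[name], bits) for name, bits in plan.items()}
```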

So somewhat similar to the model routing that we're seeing in ChatGPT now. What was your overall reaction to GPT-5? It's just a router. I was kind of hoping it was a new foundational model.

And when you interact with it, it's really clear that it's just a way to dynamically control price based on a question. So you ask a question, they route you to a different model. If it's coding, it will route you to a different model. I can see where that's valuable.
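A toy illustration of that kind of prompt-based routing; the heuristics and model names are pure speculation, not OpenAI's actual system:

```python
def route(prompt: str) -> str:
    # Cheap heuristics pick a model tier per request, so expensive
    # models only run when the prompt seems to need them.
    p = prompt.lower()
    if any(k in p for k in ("def ", "traceback", "compile error", "stack trace")):
        return "coding-model"
    if len(prompt) > 2_000 or any(k in p for k in ("prove", "step by step")):
        return "large-reasoning-model"
    return "small-fast-model"  # cheapest default keeps average cost down

print(route("What's the capital of France?"))     # small-fast-model
print(route("Fix this traceback: KeyError ..."))  # coding-model
```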

I have a lot of people in my life who are non-technical, and I've watched them now switch off of GPT after the 5 release to things like Grok, which was kind of shocking to me.

But I think people were used to a certain standard of response, and now the lack of transparency in picking the model you're engaging with created some whiplash. But I'm sure there are areas where it's amazing. I haven't really gotten to tap into everything there.

I've been enjoying a lot of the Anthropic releases, and I typically tend to lean that way. Cool. Well, thank you so much. Congrats on all the progress, and hope you have a great weekend. We'll have you back on again soon. Sounds good. Absolutely. Yeah, we're excited. We'll talk to you soon. Yeah, absolutely.

Great to meet you, David. Thanks for joining. Let me tell you about Public.com: investing for those who take it seriously. They've got multi-asset investing, industry-leading yields. They're trusted by millions, folks. Should we go soon? What did Sam mean by this?

If we didn't pay for training, we'd be very profitable. We talked about this. Kristen Culver says: "Most successful coups in history: Napoleon Bonaparte's coup of 18 Brumaire; the October Revolution in Russia, 1917; the Nazi seizure of power, 1933; the Egyptian coup d'état, 1952;

the Chilean coup in 1973; and the Opendoor retail army at Opendoor in 2025." Kristen worked at Opendoor, correct? I believe she must have, and so she's having fun. It'll be interesting to see. In other news, the CEO stepped down, right? This morning. Yeah, this morning. So Carrie Wheeler posted on X.

Today I'm stepping down as CEO of Opendoor. When the board of directors asked me to take on this role at the end of 2022, the company was in crisis. The real estate market was punishing, the business needed a reset, and the path forward was uncertain. My mandate was clear:

stabilize the company and do what was necessary to survive. Of course, I said yes, because I believed in Opendoor. It wasn't easy and it wasn't about glamorous headlines, but we stopped the bleeding. We restructured the business, rebuilt an exceptional leadership team, got an NPS of 80.

And she says, "I'm pleased the leadership team will continue to execute on the vision and strategy. I'm closing this chapter with pride, clarity, and gratitude." So, good luck. It is wild. Opendoor is up 200% in the last 30 days, and the retail army said, "Nah, we want more." They want more.

They are crushing it. Well, I'm excited to see where Carrie Wheeler goes next. We should watch the new Jason Carman film, killer, coming on the stream in just a few minutes. Let's pull up the latest work from Jason Carman. I'll be right back. Please... [Music] ...some time ago. ...one, we have a liftoff. The machines roared.

The steel bent to our hands. We built for the stars, for our land, and for your future. [Music] We went fast. We went far. It made us strong. It united us. Then the sound faded. [Applause] The hunger to build drifted away. [Music] Those who knew grew tired. But now the fire returns. The steel is ready.

The country needs you. New boundaries beckon. Who will you be? Oh, that's where you take it somewhere slightly different at the last minute. The stars, the sound design. Our future is waiting. So, will you answer that? We need one of those. When we were discussing hard tech, we