Decart raises new funding and launches DOS 2.0, a 5-8x faster inference engine powering real-time video and agentic AI

May 18, 2026 · Full transcript · This transcript is auto-generated and may contain errors.

Speaker 2: And we'll see where it goes. Well, we have our next guest, Dean from Decart, in the waiting room. Let's bring in Dean. Will there be a crazy filter? Normal mode. Back to the show.

Speaker 1: Throw a filter on.

Speaker 2: I haven't seen anyone nail that as well as you have. Well, you've been nailing lots of things. Give us the news. Tell us what's going on.

Speaker 10: So fun to be back here on TBPN. Last time we did this, we had some crazy filters.

Speaker 2: It was very psychedelic. I loved it.

Speaker 10: It was very psychedelic. If you're interested in newer ones, you should go on our site and try them out. It's been mind-blowing.

Speaker 2: Amazing.

Speaker 10: But today, you know, we announced a round. We had a big funding round. We raised a million dollars.

Speaker 1: Woo hoo. There we go. Nice. It's great to have you back.

Speaker 2: See? It's great to have you back.

Speaker 10: We missed... It was worth doing the round just for that. We should do more and more rounds to get that going.

Speaker 1: Raise a dollar tomorrow, and we'll have you back. No, tell us what you've been up to since the last conversation.

Speaker 10: So, you know, today, the really exciting stuff is that we have announcements on all three of our product lines.

Speaker 1: Wow.

Speaker 10: So we have three product lines at Decart. The first two are world models. We have Lucy, a world model, a real-time video model, that is used for immersive experiences: gaming, live streaming, e-commerce, ads. And then we have the new version of Lucy coming out soon, which has been growing dramatically over the past three... yeah.

Speaker 1: That's generally what you were demoing the last time you were on, where you have this real-time video of you in these sort of exotic settings. Right.

Speaker 10: Exactly. We have Lucy. Lucy can take any video stream and edit it live. So we can do fun stuff, and we've seen huge usage for that on social platforms like Twitch, TikTok Live, and YouTube Live. At the same time, it can also be used for beneficial experiences, for example e-commerce and virtual try-on, trying on different clothes, or putting ads into live streams. We've seen that, for example, with Amazon, and we're using this across different e-commerce providers. So that's our Lucy product line, and it has a new version coming out. Then we have our Oasis product line, which is a real-time world model for physical AI: robotics, autonomous vehicles, drones, manufacturing. There, our real-time model lets AI interact with the real world. It stops being confined to the virtual world and text space; real-time pixels let the AI see the real world in real time and interact with it. And then we have our third product line, DOS, the Decart Optimized Stack. It's our inference engine. It's basically what powers both Lucy and Oasis, and it lets us run all types of models (LLMs, agentic models, video models, audio models, world models) dramatically more efficiently than anything on the market. And today, we're announcing DOS 2.0. That's already being used by some

Speaker 1: Hit it again.

Speaker 2: I think I caught Ben Lachin over there. Gave him a heart attack.

Speaker 1: When did you release DOS 1.0? You realized at some point, hey, we're cooking pretty hard over here, maybe we should let other people use it. It feels, you know, pretty aligned with the other products. But yeah, how did you get into it?

Speaker 10: So I think that's a great question. Actually, we don't talk about DOS too much, but DOS was actually the first product we commercialized.

Speaker 1: Mhmm.

Speaker 10: When the company was just three months old, we closed the first multimillion-dollar license deal for DOS.

Speaker 9: Overnight success.

Speaker 10: Literally. Literally three months in. It was less than a hundred days. Fantastic.

Speaker 1: Why did it take you so long? Days? Why did it take you so long?

Speaker 10: You know, that's the number one question I ask my team literally every single day. Okay? Number one rule for running an AI company: if you're an AI CEO, whenever your team comes to you with a deadline, ask, why not 10 times shorter?

Speaker 2: Mhmm.

Speaker 10: Okay? But yeah, to go back to your question, DOS 1.0 was the first product we ever had at Decart. We licensed it to the neoclouds back then and to some of the younger AI labs. Now DOS 2.0 is being used by all the players, including the tier-one players and the hyperscalers, to use compute much more efficiently. And for the models that we support, we really focus on very fast models, so either agentic models or live video models. For those models, we're anywhere from five to eight times more performant than anything on the market.

Speaker 2: Okay.

Speaker 1: Is focus overrated? No? You're doing a lot. You're competing. You're fighting on, you know, three different fronts, but clearly doing a great job at it. How do you make it work? I could imagine any one of these opportunities being big enough at some point to warrant going all in on it.

Speaker 10: Well, we're all in on them, on all three of them. I think focus is very, very important. And you have to build very independent leaders inside the company. We have a lot of very talented researchers who turned into very independent leaders inside the company. They're great on the technical side and very good on productization: taking this to market, talking to customers, building the product itself. Inside Decart, we really have three different teams, one for Lucy, one for Oasis, one for DOS, and they each operate completely independently, focused only on the thing they're doing. Now with DOS, the reason we accelerated DOS 2.0 (it was supposed to come out in August; we're launching it now instead) is the huge, huge supply constraint on the chip side. We're hearing from all our customers that there's basically no capacity left until 2028. And so getting more performance out of chips is the only way to actually grow your revenue and grow your AI adoption. If you're any AI company, you really have to be able to extract the most out of every possible chip to be able to grow your business. And right now, that is a bottleneck.

Speaker 2: Yeah. How tightly linked are the different products? Because when I think of Lucy, real-time interactive video world models, I think optimization there is, a, what you're good at, but also incredibly important, because even the demos that we've seen, they're not 4K. They're not 60 FPS. There's clearly room to run there. Whereas with many of the text generation models, for a lot of the queries that people are asking (how do I cook this, tell me the history of this company, or a story), it's basically superhuman already. But superhuman real-time world models, we're not there yet, and so optimization feels really important. How tightly linked are those two projects?

Speaker 10: Yes. They're very tightly linked through DOS. And today, DOS 2.0 can run real-time video models

Speaker 6: Sure.

Speaker 10: at full HD for the first time, at up to 100 frames per second.

Speaker 2: Wow.

Speaker 10: Okay? So those are huge breakthroughs there. And on the tech side, DOS runs on all three major chips: NVIDIA, Google TPU, and Amazon Trainium.

Speaker 2: Yeah.

Speaker 10: It's the only stack that really supports all three for all the different types of models. And on the agentic side

Speaker 2: What about AMD?

Speaker 10: The chip space is incredibly interesting. We will support the fourth eventually. We will support everyone.

Speaker 2: Yeah. Yeah.

Speaker 10: But to your question about fast text models: where you really need them is agentic workloads. You really need them if you want to be able to run, for example, coding models very, very quickly.

Speaker 2: Yeah. Yeah.

Speaker 10: And DOS 2.0 can, for the first time, run at above 500 tokens per second,

Speaker 6: Okay.

Speaker 10: Which is more than 10 times the industry.
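The throughput figures quoted above translate into concrete time budgets. This minimal Python sketch does that arithmetic under stated assumptions: 1920x1080 at 100 fps and 500 tokens per second come from the interview, while the 10,000-token response and the 50 tokens-per-second comparison point are hypothetical values chosen only to illustrate what "more than 10 times" implies. They are not benchmarks.

```python
# Back-of-envelope arithmetic for the throughput figures quoted in the interview.
# Quoted figures: full HD (1920x1080) at up to 100 fps, and above 500 tokens/s.
# The 10,000-token response and the 50 tokens/s baseline are hypothetical,
# chosen only to illustrate what "more than 10 times" implies.


def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to produce each frame at a given frame rate."""
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw output pixels the model must generate every second."""
    return width * height * fps


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to stream a response of num_tokens tokens."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    print(frame_budget_ms(100))                # 10.0 ms per frame
    print(pixels_per_second(1920, 1080, 100))  # 207360000 pixels per second
    print(generation_time_s(10_000, 500))      # 20.0 s at 500 tokens/s
    print(generation_time_s(10_000, 50))       # 200.0 s at a hypothetical 50 tokens/s
```

At 100 fps, each frame has to be generated in roughly 10 ms while the model emits over 200 million pixels per second, which gives a sense of why inference efficiency matters so much for real-time video.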

Speaker 2: Interesting. At a somewhat high technical level, what is different about the architecture of interactive video world models versus text-based LLMs? I think most people saw the fork in the road during the Midjourney era, the DALL-E era: diffusion, where you start with a bunch of noise, versus token-based next-token prediction. Have these converged? Have they diverged? Are there different requirements? We're seeing with agents that we need more CPUs now. We might need more context in cache. We might need RAG or vector databases. What would be different if you were to build out the ultimate data center for generative interactive world models? Are you looking for Cerebras-like chips? Are you going all in on NVL72s? Is there a difference in the shape of the architecture that lends itself to different hardware constraints?

Speaker 10: Yes. I think that's probably one of the best questions in this field right now, because AI is moving so quickly

Speaker 2: Yeah.

Speaker 10: that it's very hard to predict what the right infrastructure will be three months from now.

Speaker 2: Yeah.

Speaker 10: You brought up, you know, the CPU shortage that suddenly happened. No one was expecting AI to need CPUs, and when AI needed CPUs, it went from zero to "can we get all the CPUs in all the hyperscalers today?"

Speaker 2: Yeah.

Speaker 10: And that happened overnight.

Speaker 2: Mhmm.

Speaker 10: And now it's becoming very hard. What we hear from our customers is that it's becoming very hard for the people on the model side to understand what to do on the infrastructure side, and vice versa. So there's this gap: how do you map model requirements, which are changing every single week, to what's possible on the infrastructure side? That's why, for example, we support all three major hardware platforms. It allows us to choose where to route the different workloads, and each one has its own unique strengths and weaknesses. That's how we developed a very deep expertise in knowing how to map the model to the chip itself.

Speaker 2: Mhmm.

Speaker 10: I think it ties into something else that we're seeing. Usually when people draw out the stack, they say, okay, there's the model layer

Speaker 2: Mhmm.

Speaker 10: then there's software, for example CUDA, and then there's the hardware layer.

Speaker 1: I call it a five-layer cake, but

Speaker 10: I wonder if someone else will adopt your five-layer cake terminology.

Speaker 1: Is that Jensen? Yeah.

Speaker 10: You have the two layers above and below: the data center and the application layer.

Speaker 2: Lots of ingredients.

Speaker 10: Now, that's really where we sit. We integrate across all those layers on the software side to tie the AI model itself directly onto the chip. We literally write assembly for all three of these chips.

Speaker 2: Sure.

Speaker 10: We know how to write VLIW for TPUs. We know how to write assembly for Trainium. We know how to write SASS and PTX for NVIDIA chips. So we have all these different layers, and they enable us to very quickly move between these constantly changing workloads.

Speaker 2: Okay. Are you seeing glimpses of consumer product opportunities in video world models? When I see your technology, when I see Genie from Google and World Labs, I think, okay, a harness, a wrapper, some UI, a relational database storing my inventory, a couple other steps, and all of a sudden this is something that I want to play for more than a minute, more than a demo. Maybe the hardware is not there. But just as lots of folks who were interacting with LLMs during the GPT-3 era saw ahead and started thinking, oh, chat is a potential modality here, everyone's seeing that video games or something playable would be a potential modality. How far away are we from that? Is that interesting? What other dominoes need to fall for that to actually happen?

Speaker 10: So actually, over the past month, we've seen huge usage of Lucy in live streaming.

Speaker 2: Mhmm.

Speaker 10: You can go to delulu.ai.

Speaker 2: Sure. Yeah.

Speaker 10: Delulu.ai. Delulu. Come on. Of course. Yeah. It's good.

Speaker 1: It's good.

Speaker 10: Yeah. And it just plugs right into your OBS. It literally plugs into your OBS camera, and you can apply all these filters live. We've seen streamers go on it for eight hours nonstop.

Speaker 9: Mhmm.

Speaker 10: So we've seen that pop over the past month, month and a half. We have a new subscription service there that people subscribe to, and they can turn it on for as long as they want, and that's been growing exponentially.

Speaker 2: Okay. Well, thank you for coming on. We actually have some videos that we're gonna play, because we've been demoing it, or the team has.

Speaker 1: No way. While we've been talking?

Speaker 2: Can we play this while he's live so he can see it too? I think you'll see the program monitor if you want to hang out. Let's pull it up. This is Tyler

Speaker 10: You guys are doing the live demo instead of me this time? That is insane.

Speaker 1: Yeah.

Speaker 2: Yeah. So we have a video here. We recorded it of, I believe, Tyler as Albert Einstein. Is that correct? Let's see it. Pull this up. Pulling it up might be the harder part.

Speaker 1: Real-time video models, but pulling up a video on stream... It looks good.

Speaker 2: The shadow and the lighting

Speaker 1: Still a challenge.

Speaker 2: Did you prompt this?

Speaker 5: So it started as Einstein, and then I went through a couple different

Speaker 2: You wanted a pink tuxedo on as well? That's very funny. What a funny prompt. And the visual fidelity on Einstein's face, that is weird. Okay. Here we go. You got that. It's a very humanoid... Oh, that's a jacked horse. That is odd. That's very odd. But the horse had... oh, there you go. Okay. That's interesting. As you touch your face, the hand of the horse hits the correct part of the face, so it understands the physics well. That was impressive. It wasn't purely

Speaker 1: Last question before you jump. Is there a certain milestone that, if achieved, will make you cut your hair? Like, is it in

Speaker 2: Oh, yeah?

Speaker 10: Oh, yeah.

Speaker 2: Really?

Speaker 10: The milestone is that we need to hit $1 billion in ARR. It's a bet from early on in the company. Now, this is a year and a half long. Okay? This is just one and a half years. We have to get rid of it now, with DOS and the way that's scaling. At some point, I'm gonna get a haircut.

Speaker 1: Fantastic. Amazing.

Speaker 2: Well Amazing. We'll be here when you hit that milestone.

Speaker 1: Selfishly, I kind of want to cut your hair

Speaker 2: on the stream.

Speaker 1: Your waist. Right?

Speaker 10: Oh, we should do a haircut on stream. We're gonna do a haircut on stream. Love that.

Speaker 2: Come to the El Trio. We'll shave your head.

Speaker 1: Dean, you're the man.

Speaker 2: This is great.

Speaker 1: The chat loves you.

Speaker 10: Thanks so much, guys.

Speaker 2: Say hello to everyone at Radical. We're big fans. We'll talk to you soon.
