Cua: Computer use AI agents wrapped in a full OS sandbox — infrastructure play betting 20% of AI agent scenarios will rely on browser/computer tools
Jun 11, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Franchesco & Aleandro
And we are ready for our next guest coming into the Palace of Party Rounds. Welcome to the stream. The hum of YC Demo Day has died down as people move across the street to lunch. Good to meet you. How are you doing? Nice to meet you, I'm Josh. Pleasure. Nice to meet you. How are you doing?
What's happening? Big day. Keeping up? Yeah, it's a big day for us. How did the pitch go? Smooth? We're going to be in the afternoon today. Okay. How are the nerves? Honestly, we're used to it. Yeah, Alumni Day.
They do a good job of getting you so many reps that it just feels like anything else. Yeah, honestly, they work you up to it very easily. Introduce the company. What are you guys building? So, I'm Franchesco, the founder of Cua. Aleandro is the CEO. We are building computer-use AI agents. Okay.
Meaning AI agents that can solve any problem the way a human would, in terms of clicking, typing, and scrolling. Okay. At what layer of the AI stack are you working?
Are you sitting on top of just a Chromium instance, or on top of something like Browserbase, or...? Yeah, we wrap an entire operating system in an isolated environment, kind of like Docker for agents. Okay.
And that means we can use system-level events to inject commands like click and type, and really any Bash or PowerShell command. Yeah. And what's working in computer use right now? At one point there was a theory that you just read the HTML.
Now it's more multimodal: you actually take a screenshot, process it, and understand where to click. What's working? Yeah.
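As a rough illustration of the point about wrapping a whole OS and injecting system-level events (click, type, Bash or PowerShell), here is a minimal Python sketch of how an agent's proposed actions might be validated on the host before reaching the sandbox. Every name here (`Click`, `TypeText`, `Scroll`, `Shell`, `parse_action`, `shell_argv`) and the JSON action format are hypothetical, not Cua's actual API; the real event injection inside the guest OS would go through whatever input layer the sandbox exposes.

```python
import json
from dataclasses import dataclass
from typing import Union


# Hypothetical action types an agent might emit; not Cua's real schema.
@dataclass(frozen=True)
class Click:
    x: int
    y: int
    button: str = "left"


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class Scroll:
    dx: int
    dy: int


@dataclass(frozen=True)
class Shell:
    command: str


Action = Union[Click, TypeText, Scroll, Shell]

# Closed set of accepted action kinds (assumed JSON field "kind").
_KINDS = {"click": Click, "type": TypeText, "scroll": Scroll, "shell": Shell}


def parse_action(raw: str) -> Action:
    """Turn a model's JSON output into a typed action, rejecting unknown kinds."""
    data = json.loads(raw)
    kind = data.pop("kind", None)
    if kind not in _KINDS:
        raise ValueError(f"unsupported action kind: {kind!r}")
    # Missing or unexpected fields raise TypeError from the dataclass constructor.
    return _KINDS[kind](**data)


def shell_argv(action: Shell, guest_os: str) -> list[str]:
    """Build the argv the sandbox would run for a shell action on the given guest OS."""
    if guest_os == "windows":
        return ["powershell", "-NoProfile", "-Command", action.command]
    return ["bash", "-lc", action.command]
```

Parsing into a closed set of action types means malformed model output fails loudly on the host side instead of being injected into the guest; the sandbox isolation remains the boundary for whatever a valid shell action does.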
So screenshots work way better than the accessibility tree for operating systems in general. At the system level you don't really have an HTML DOM to parse, so we just use screenshots and pixel-based models for that, and research has also shown that this works way better than interacting through the accessibility tree. What categories of agents are you seeing the most excitement and traction around? I think anytime you're building agentic infrastructure,
you've got to have companies building great products on top of you. That can be a challenge, but I'm sure there are a lot of other companies in the batch that... Yeah.
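The screenshot-first approach described in the answer above boils down to a simple loop: capture pixels, ask a model for the next action, execute it, repeat. Below is a minimal Python sketch of that loop's shape. The `capture`, `policy`, and `execute` callables stand in for the sandbox's screen grab, a pixel-based computer-use model, and the event injector; they, along with `Step` and `run_episode`, are hypothetical placeholders rather than any real SDK.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    """One model decision: an action name plus optional screen coordinates or text."""
    action: str  # e.g. "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_episode(
    task: str,
    capture: Callable[[], bytes],
    policy: Callable[[str, bytes], Step],
    execute: Callable[[Step], None],
    max_steps: int = 20,
) -> List[Step]:
    """Observe pixels, ask the model for an action, act; stop on 'done' or when the budget runs out."""
    history: List[Step] = []
    for _ in range(max_steps):
        screenshot = capture()           # raw screenshot bytes, no DOM or accessibility tree
        step = policy(task, screenshot)  # model grounds the action in pixel coordinates
        history.append(step)
        if step.action == "done":
            break
        execute(step)                    # would be injected as OS-level input in the sandbox
    return history
```

The step budget matters in practice: a confused model can loop forever on a screen it does not understand, so capping steps turns that into a bounded failure.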
So we haven't chased any verticals. You haven't chased any verticals? Meaning that we figure out what people want. Yeah, exactly.
Because actually, the first month in, we were getting the most esoteric asks from users, like, "Hey, I have this bunch of contractors who simply trim videos in video editing software on macOS. Can I use Cua for that?" Sure.
And really, we couldn't converge on very common workflows, and that's why our customers are chasing those verticals for us. Okay. Do you see the market fragmenting? Do you think you'll find a vertical and niche down, or do you want it all? Honestly, we want it all. Okay.
And we're here to... Then what are the key deliverables you have to optimize against? Is it speed, reliability, price, some sort of combination? How are you thinking about that? Probably for computer-use agents, we are still about six months away from the ChatGPT moment.
Maybe for browser-use agents, the moment is already today.
So once models reach that level of intelligence, we need to come prepared with very good infrastructure to scale these isolated environments, because our leap of faith is that five years from now most AI agents and multi-agent systems will rely on APIs in maybe 80% of scenarios, and the other 20% are going to be based on browser and computer tools.
What were you guys doing before this? I was working at Microsoft for over five years. Oh, cool. Awesome. Myself, I was at Notion, and I also built a few startups in the past. Nice. Are you guys Italian? We are Italian. Nice. When did you come to the US? Three months ago now. Three months ago, for YC. Crazy.
Has it been living up to expectations? Yeah, definitely. People ask me... Ferrari or Lamborghini? Yeah. People ask me, "How is San Francisco now? Is it better?" And I don't have any comparison. How old were you when you knew you wanted to do YC?
Was it...? I think it was pretty recently, like three years ago now. Okay. Yeah. For me, maybe since I was 16, and I'm 26 now. So it's been a long dream for me, and I still don't know if it's reality or not, but I'm living the dream. Living the dream. What's the go-to-market been like?
So for go-to-market right now, we've been focusing mostly on startups and scale-ups, because we wanted to prove that we were on the right path, or else fail faster. Yeah. But we have an open-source framework with over 8K stars. 8K stars. Oh, you hit that. You buried the lede on us. Buried.
Please go and star it. Yeah. Head over to GitHub right now. Give it another star. Try it. That's amazing. So what's the monetization strategy around that open-source project?
So in our first month in YC, we had all this customer inbound basically asking us, "How do I even productionize a computer-use agent today?" So we're providing users a pathway from the open source: they bring the same workflows that are working locally for them and scale them in the cloud.
So today we're charging only based on compute. You simply have to input an API key on our platform.
But what's also going to happen is that we're going to become an LLM inference provider for these computer-use models, because maybe the public perception is that the only true computer-use models are the ones from OpenAI and Anthropic. Yeah. And that's only because they have a better PR office than other models. Sure. But honestly, even on Hugging Face you'll find models from ByteDance, version 1.5, that are also outperforming OpenAI and Anthropic on computer-use benchmarks. On computer benchmarks? Sure.
And they are so hard to set up and also hard to discover, and we're also going to be the go-to catalog and platform where... Yeah. How good are the evals right now? Are there benchmarks for computer use? It feels more abstract than just, you know, doing some math problems. Yeah.
So I was actually working with the Windows team when I was at Microsoft, doing evals for agents on a benchmark called Windows Agent Arena, which is derived from OSWorld. Really, with this benchmark,
the tasks are not very meaningful. For example, you'll find tasks like, "Hey, can you go and open VLC and add subtitles?" Who's using VLC anymore? That's the question. Or even using LibreOffice instead of Office, or even Google Docs. Yeah. So what's going to happen?
My leap of faith here is that the next generation of benchmarks will measure real-world tasks. Yeah. Do you think there'll be a sort of LMArena-style benchmark where a human is watching two computer-use models use computers, and there's almost a vibe check? Yeah.
Or even a wiki race. Yeah. No? Okay. So in a wiki race you start at, say, Y Combinator, and you have to end at Christopher Columbus: how many clicks does it take to get from one to the other? So you might say, "Y Combinator has San Francisco. San Francisco is in America. America has Columbus."
And so you try to race through, and yeah, it's an intelligence test, but it's also a great computer-use test. That's hilarious. Yeah.
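Underneath, the wiki race is a shortest-path problem: the fewest clicks from the start page to the target is a breadth-first search over the link graph, which gives a natural yardstick for scoring an agent's run. A minimal Python sketch follows; the tiny link graph is a made-up stand-in for real Wikipedia links, and `min_click_path` and `click_efficiency` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Optional


def min_click_path(links: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: shortest page path from start to goal, or None if unreachable."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            # Walk parent pointers back to the start, then reverse.
            path: List[str] = []
            node: Optional[str] = page
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in links.get(page, []):
            if nxt not in parents:
                parents[nxt] = page
                queue.append(nxt)
    return None


def click_efficiency(agent_clicks: int, optimal_clicks: int) -> float:
    """1.0 means the agent matched the shortest route; lower means it wandered."""
    if agent_clicks <= 0:
        raise ValueError("agent_clicks must be positive")
    return optimal_clicks / agent_clicks


# Toy graph loosely following the example in the conversation (not real Wikipedia data).
LINKS = {
    "Y Combinator": ["Paul Graham", "San Francisco"],
    "Paul Graham": ["Lisp"],
    "San Francisco": ["Golden Gate Bridge", "United States"],
    "United States": ["Christopher Columbus"],
}
```

On this toy graph the shortest route is Y Combinator, San Francisco, United States, Christopher Columbus, three clicks, so an agent that needed five clicks would score 0.6.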
But also, I imagine there could be a big model smell, like a vibe check on the computer use, because if it looks very jerky and confused, that's something that might not even come across in a quantitative benchmark, but a human doing a qualitative evaluation might judge it differently.
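One common way to turn pairwise human "vibe check" votes like these into a leaderboard is an Elo-style rating, similar in spirit to arena-style model rankings. The sketch below is a plain Elo update in Python; the K-factor of 32, the starting rating of 1000, and the agent names are illustrative assumptions, not parameters from any particular leaderboard.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update(r_a: float, r_b: float, outcome: float, k: float = 32.0) -> Tuple[float, float]:
    """outcome is 1.0 if the human preferred A, 0.0 if B, 0.5 for a tie."""
    delta = k * (outcome - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def rate(votes: Iterable[Tuple[str, str, float]], start: float = 1000.0) -> Dict[str, float]:
    """Replay (agent_a, agent_b, outcome) votes in order and return final ratings."""
    ratings: Dict[str, float] = defaultdict(lambda: start)
    for a, b, outcome in votes:
        ratings[a], ratings[b] = update(ratings[a], ratings[b], outcome)
    return dict(ratings)
```

Sequential Elo updates depend on vote order; an alternative is to fit a Bradley-Terry model over all votes at once, which removes that order dependence.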
Yeah. Interesting. And there's also this whole problem: okay, I have this workflow that now works maybe 80% of the time. How do I make sure I'm able to reproduce the same workflow over and over again? So we're also working on episodic memory that will basically let you use RPA 1.0, all the old-fashioned RPA and UI automation, for workflows that are very deterministic. Then, say you have a deviation from a trajectory due to noise or a change on a web page; that's when you use full computer use, when it really makes sense.
That makes a ton of sense. Any other Italian YC founders? We haven't met many. Are you guys hometown heroes back in Italy? You've got to go do the local press. Do the press now. He does this in Bulgarian.
He's like a celebrity over there, and over here... There's a YC ski WhatsApp group now; they're raging on the mountains this week. That alone is reason enough to apply.
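The episodic-memory idea described earlier, replaying a known-good trajectory the old-fashioned RPA way and handing control to a full computer-use model only when the screen deviates, can be sketched roughly as follows. This is a hypothetical illustration, not Cua's implementation: `RecordedStep`, `fingerprint`, and `replay_with_fallback` are invented names, and an exact SHA-256 hash of the screenshot is a deliberately crude stand-in for whatever similarity check a real system would use.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class RecordedStep:
    expected_screen: str  # fingerprint of the screen when this step was recorded
    action: str           # the action that worked at that point in the trajectory


def fingerprint(screenshot: bytes) -> str:
    """Crude screen identity (an assumption): any pixel change produces a different hash."""
    return hashlib.sha256(screenshot).hexdigest()


def replay_with_fallback(
    trajectory: Sequence[RecordedStep],
    capture: Callable[[], bytes],
    execute: Callable[[str], None],
    agent_step: Callable[[RecordedStep, bytes], str],
) -> List[str]:
    """Replay recorded actions deterministically; ask the computer-use model only on deviation."""
    log: List[str] = []
    for step in trajectory:
        screen = capture()
        if fingerprint(screen) == step.expected_screen:
            action = step.action               # cheap, deterministic RPA-style replay
            log.append(f"replay:{action}")
        else:
            action = agent_step(step, screen)  # screen drifted: let the model decide
            log.append(f"agent:{action}")
        execute(action)
    return log
```

In this toy version the model handles only the step where the screen drifted, and replay resumes at the next step only if that screen matches its recording exactly. Because exact hashing treats any rendering noise as a deviation, a perceptual or region-based comparison would trigger far fewer needless fallbacks.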