Yutori launches web agent product letting users autonomously execute tasks on the internet

Apr 1, 2026 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Abhishek Das

Do we have the right person, guys?

I believe so.

Are you sure? Sorry. We are figuring out who... Oh, yes. Okay, got it. Sorry. I'm, uh

You thought it was uh

Yeah, I thought we were jumping ahead to Garry Tan. I'm sorry. Uh, anyway, without further ado,

let's run it back.

Let's run it back.

Abhishek,

I messed up. Abhishek, great to meet you. Thank you so much for taking the time. I got a little mixed up on the schedule, but it's great to have you here with us today. Anyway, please introduce yourself and the company.

Hey, uh, thanks for having me. I'm Abhishek, Abhishek Das. I'm the co-founder and co-CEO of

Uh, and introduce the company. What are you building?

Yeah, so at Yutori we're building web agents, so agents that can take actions and complete tasks on the web. Um, so we think that the future of interacting with the web is going to look quite different than it does today, where we're not manually navigating web pages, clicking buttons, filling forms, etc. We're going to be operating at a slightly higher level of abstraction where we have AI agents who we delegate tasks to that carry out a lot of these tasks in the background on our behalf.

And does that mean like virtual machines loading the full web page and clicking on it, like a mouse and keyboard interaction, or are you interacting with the APIs and reverse engineering the routes to sort of just build CLIs on the fly? Like what are the different strategies that are working and not working right now?

Yeah. So it's uh it's everything you mentioned basically.

Okay. Both.

We should probably think of it in layers. Yeah.

So there's the core LLM that's looking at how web pages are laid out, how to click buttons and navigate websites to take a task to completion.

Then there's the agent harness, which surrounds that LLM. So wherever on the web we have APIs available, it makes use of APIs; where there are no APIs, it will use this kind of visual-linguistic model.

Yeah.

And the agent harness takes care of things like persistence. So if it makes a mistake and it has to backtrack and try something else, it knows how to do that. The agent harness would take care of memory, orchestration, breaking a task down into subtasks, all of that. Um, and then the third layer is putting all of this together: what does the product experience look like where this works for everyone?
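As a rough illustration of the harness layer described above (prefer an API where one exists, fall back to a visual model, backtrack on failure, keep a memory of attempts), here is a minimal Python sketch. Every name in it (`Harness`, `api_routes`, `visual_model`, `shop.example`) is hypothetical and chosen for illustration; nothing here is taken from Yutori's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical type: an action takes a subtask and returns a result, or None on failure.
Action = Callable[[str], Optional[str]]


@dataclass
class Harness:
    api_routes: dict[str, Action]  # site -> API-backed action, where one exists
    visual_model: Action           # fallback that "looks at" the page and clicks
    memory: list[str] = field(default_factory=list)  # log of every attempt

    def strategies(self, site: str) -> list[tuple[str, Action]]:
        """Prefer an API when the site has one; always keep the visual fallback."""
        options: list[tuple[str, Action]] = []
        if site in self.api_routes:
            options.append(("api", self.api_routes[site]))
        options.append(("visual", self.visual_model))
        return options

    def run_subtask(self, site: str, subtask: str) -> Optional[str]:
        """Try each strategy in order, backtracking to the next one on failure."""
        for name, action in self.strategies(site):
            result = action(subtask)
            status = "ok" if result is not None else "failed"
            self.memory.append(f"{name}:{subtask}:{status}")
            if result is not None:
                return result
        return None

    def run(self, site: str, subtasks: list[str]) -> list[Optional[str]]:
        """Run a task that has already been broken down into subtasks."""
        return [self.run_subtask(site, s) for s in subtasks]


def search_only_api(subtask: str) -> Optional[str]:
    # Stand-in API that only supports search.
    return f"api result for {subtask}" if subtask == "search" else None


def visual(subtask: str) -> Optional[str]:
    # Stand-in for the visual model; always succeeds in this toy example.
    return f"clicked through {subtask}"


harness = Harness(api_routes={"shop.example": search_only_api}, visual_model=visual)
results = harness.run("shop.example", ["search", "checkout"])
```

In this toy run the search subtask is served by the stand-in API, while checkout fails on the API path and is retried through the visual fallback; `harness.memory` records all three attempts.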

Makes sense. How are you thinking about the divide between, uh, consumer versus business-to-business, enterprise? How do you see the customer mix evolving?

Yeah, most of our users are prosumers. Um, so individuals who are using it for work. Um, so like small to medium business owners or individuals working at bigger companies. Yeah.

Uh, the model layer that I talked about, we do train our own model in-house. It is as accurate as Opus 4.6 and GPT-5.4 but is 2 to 3x faster and much cheaper. So that we make available as an API as well.

Sure. What was the process creating the model? Did you take an open source model and

tune it?

Yeah,

Exactly. Exactly. Um, so most of our focus is on mid-training and post-training with a mix of SFT and auto. Um, we started off of an open source Qwen-based model.

Um, and we collect our own data both in simulation and actual websites and we train on those.

Yeah. When you're talking about so

The big labs don't scare you. Uh, talk about how you see the competitive environment evolving. This is something that, uh, I think everyone should assume that all the different kind of consumer, prosumer, and enterprise products, uh, do, maybe not super well right now, but will do. How do you see the market evolving?

Yeah, I think the market for non-coding knowledge work, digital work, is massive. Um, there's a lot of work to be done there. It hasn't quite hit the kind of inflection point that coding agents have. So yeah, there's a huge opportunity there. Um, the area that we do peak on and we care about is tasks that happen on the web. So browser use capabilities specifically. Um, Anthropic, OpenAI, etc. have models for computer use, which is more general purpose. But for browser use, currently ours is the best model that's out there, both for accuracy and latency.

How do you think about, uh, like the textbook use cases? I'm sure you get asked about like booking a flight. That's more of like agentic, on the web, go do something for me. I can also imagine there's a huge value in just, uh, you know, monitoring websites, scraping data, putting things together. Uh, we were talking yesterday about how, uh, you know, will we see an explosion of token consumption among financial professionals like we've seen among programmers, and my bull case that we will is that, yes, you're maybe just building one financial model. You're not necessarily building a thousand financial models a day, but that one financial model might interact with thousands and thousands of web pages to collect every possible data point to create aggregated data sets that then can be compressed and compressed down into, you know, a 12-tab Excel sheet that eventually results in, you know, should you buy the stock or not, or whatever the financial analysis is.

Yeah. I mean, so there's a lot to unpack there. Yeah. Um, the first thing you mentioned was like, uh, logging into a website and booking a flight or like ordering food or something. Uh, we actually shipped a big upgrade today and it's basically possible today. Like you can connect your favorite websites and apps to our product, which is called Scouts.

Um, and you can just give it a task. Um, like a bunch of us internally have been using it to automate like our fills orders, Insta orders. We've had people externally try it out as well for LinkedIn, other websites as well. But a lot of that is possible today, and it makes use of, like, our core model, the agent harness, all the tech components coming together.

Mh.

Now in terms of, um, how I see the token consumption and usage of this going forward, um, I think it's still quite early in this space. Like a lot of non-coding digital work hasn't quite gotten that kind of attention and hit that inflection point yet. Um, so the cheaper and more reliable and more accurate it gets, I think we're just going to see an explosion of usage of this technology.

Yeah, it feels like there's a little bit of a capability overhang. Like you can do really deep research and you can do deeper research with a coding agent in many ways, but that workflow just hasn't broken through to everyday consumers, that they should even think about, you know, asking an LLM or an agent a question like, you know, build my financial profile from every data source all over the place and analyze everything. They come to it with like, you know, how much should I invest in the stock market, like a basic question that's basically just web search.

Yeah. So there's two or three, um, aspects here. One is that, uh, we actually had someone try this recently on our product where they gave it access to their email and they had a bunch of expense reports that had come in on their email and they asked the agent, um, to prepare sort of a nice, um, categorization and spreadsheet of all their expenses, and they were able to one-shot it.

Wow.

Um, so that's one part of it. The other part of it is that, um, the previous version of our product was primarily meant for agents that can monitor anything on the web. Um, so kind of like Google Alerts but on steroids, like an AI-native version of it. Uh, but with today's release, one of the things we released is the capability to, um, build live artifacts. So now these agents don't just monitor, they can prepare a single sort of spreadsheet or a website or a dashboard, um, that stays updated as new information comes in. So like if you want to track, uh, anytime a startup comes out of stealth or like a startup announces a fundraise, you can now use Scouts to maintain a single spreadsheet for it.

Yeah.

Uh, where's the company at? When did you start it? And what were you doing before this?

Yeah. Um, so we started the company in 2024. Um, I'm an AI researcher by background. Um, I grew up in India, moved to the US in 2016 for my PhD, that was at Georgia Tech. After that I spent some time at FAIR at Meta as an AI researcher. Uh, there's two other co-founders. All three of us are AI researchers by background. Um, we're about 15 people. Um, we've, uh, raised... we're a little beyond our seed. That's where we are.

Are you compute constrained at all?

Massively.

Massively. Well, I mean, what's the plan? Uh, is the best practice just like hunt around for cheap GPUs or slide around different services, like

or use your product to monitor when availability comes online on the neoclouds.

Yeah.

Neoclouds. Yeah.

Yeah. So we definitely do a lot of that, and we are, I think, quite compute efficient from that point of view. But compute is one part of it. The other is data, right? Like for a lot of tasks on the web, it's not as easy to collect and generate data as, let's say, computer use tasks, where you can just hire a bunch of annotators to, let's say, simulate certain tasks on their, like, Microsoft Office apps and so on, right? Like on the web, many of these are irreversible actions. Like if you actually buy something, yeah, then there's a real cost associated with it. So it's not as easy to collect data.

Um, we do a mix of, like, simulation and sort of using our product to visit web pages, especially websites where there's a few clicks and navigation steps involved, so you can't, like, index or crawl them in a naive manner.

So are you building environments for particular websites? Is that programmatic, or are they like handcrafted? Like what's the scale? I imagine that you need to build a lot of these for generalization. That's very cumbersome, and there's going to be like flaws with every single different system that you try and build.

Yeah. So we, um, vibe code a bunch of like simulated websites and use that to generate data and like evals. Um, there is a good amount of in-category generalization that we see. Like for example, if you imagine how Amazon is laid out or how any e-commerce website is laid out, there's not that much variance between them. If you compare that to, for example, how Zillow or Redfin is laid out, there's a huge difference in what the UI looks like.

Yeah. So just being intelligent about like how we're sourcing data.

And then are you like stack ranking those? Like is Amazon more valuable than Zillow because consumers will demand that, or prosumers will demand that over Zillow, or vice versa? And how do you evaluate that?

I think it's a mix. We do, um, keep a close eye on making sure that mix looks good and like as we want it to be. A lot of our users use it for more work-related things, um, as opposed to in their personal life. So we keep a focus on that. Yeah.

Um yeah.

So it's log into my ERP, my payroll system, pull stuff from all different dashboards. I can imagine us pulling analytics from all the different analytics providers, and they all have separate websites, but they all like have some similarities in the design philosophy and the best practices and what color the CTAs are and whatnot. Anyway, uh, yeah, this is fascinating. Congrats on the progress and thank you so much for taking the time to chat with us. Thanks for breaking it down.

We'll talk to you soon.

Cheers.

Have a good one. Goodbye.

Um, let me tell you about a very related company, Labelbox: RL environments, voice, robotics, evals, and expert human data. Labelbox is the data factory behind the world's leading AI teams. And let me also tell you about Graphite, code review for the age of AI. Graphite helps teams on GitHub ship higher quality software faster. And we should have Garry Tan joining us in just a minute, but we can go back to the timeline. Pull up this post from Marik Hazan. He says, "We just rebuilt every startup in YC's latest demo day batch.

Here's what our agentic founders pulled off and what it means for the future of startups.

Fully usable products at the bottom of the thread below."

He's like, "I one-shotted all of your companies.

I built everything." What a crazy.

Let's play this video.

Yeah, let's play this video.

Rebuilt every startup in YC's latest demo day batch. At Felt Sense, we build agentic founders that source ideas, build product, and build

ideas from the YC demo day. So, we page,

Could our killer agents compete with these cracked founders? On demo day, our agents swarmed the YC website, found every startup in the batch, and locked in to reverse engineer each build. They reconstructed their own PRDs based on public product specs, and started to rebuild each application. When they hit snags or needed input, they called in a human to help resolve every problem. Core technology that took founders months or even years to create was rebuilt, rebranded, and ready to go to market within 24 hours.

So, what does this mean for YC and the startup ecosystem more broadly? It means AI replication risk is becoming more and more of a threat to every business. So, what makes a startup more or less replicable? This is what we learned.

Several obvious protections exist for startups. Building things in the physical world and owning data that others simply can't get to are a few examples. But there's a third one that caught us off guard and it's more interesting. Most people think the best protection in an AI world is human creativity, meaning or ingenuity. The positive parts of humanity. But that's not what we're really seeing. The real protection seems to come from the messy stuff. Industries filled with politics, lack of trust, turf wars, and bureaucracy. Markets that are painful to work in due to complex or even failing social dynamics are exactly the ones that are hardest to deploy into and therefore the most protected. A company that has learned to pick apart that mess and embed their solution has something few competitors can copy overnight. Difficult markets aren't bad markets. In an AI era, they may be the safest ones to spend your rare time in.
