Hey Clicky founder Farza Majeed built a voice-controlled AI desktop agent in 8 weeks, now using Fable 5 by default
Jun 10, 2026 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Farza Majeed
Speaker 2: Anyway.
Speaker 1: I think it's time.
Speaker 2: Let me tell you about Shopify. Shopify is the commerce platform that lets you grow with your business. It grows with your business and lets you sell in seconds online, in store, on mobile, social, on marketplaces, and now with AI agents. And speaking of AI agents, we have a founder
Speaker 5: of the company. I don't know if we should call it
Speaker 2: an AI agent. What are we calling it, Farza? Is this are you creating a new category? Introduce yourself. Tell us what you're building.
Speaker 7: What's going on, guys? Farza here. What's happening on? No. Of course. Farza here working on Hey Clicky. Yeah. Man, what is it? It started as an AI teacher
Speaker 2: Yeah.
Speaker 7: And I guess now it's an AI where you can essentially talk to your computer and it does whatever you want it to do, from like
Speaker 2: It's personal super intelligence and you got there before the big man. I love it.
Speaker 7: Somehow. I feel like the big man doesn't even know what they got sometimes, you know? It takes the little guy to figure it out.
Speaker 2: I love it. I love it. I'm always rooting for the little guy. So, like, reintroduce the product, because you went viral last week but it feels like you've been working on this for a while. What's the nature of the team, the structure, the funding? Like, who's actually building this? What's adoption been like? Like, where is the product today?
Speaker 7: Yeah. I started eight weeks ago. I just wanted to build an AI that could teach me DaVinci, basically, while I'm actually using DaVinci. And I thought it was
Speaker 2: And this is DaVinci Resolve, the color and video editing suite that's free from Yes. Blackmagic Design? Got it.
Speaker 7: Exactly. Yes. So, like, DaVinci Resolve is a complicated program
Speaker 2: Yep.
Speaker 7: One-hour YouTube videos weren't cutting it, so let me just build an AI that can help teach it to me while I could talk to my screen and I could learn. I've been there. And I put it out and I thought it was a really bad idea. Mhmm. One of my friends said I just had to post it. So I posted it and it goes really big Mhmm. About eight weeks ago. Mhmm. After that, I didn't keep working on it. But then the cool thing was I saw all these people, you know how when you put something new out there, you have all this emergent behavior kinda start happening? Yeah. So when you let a human being talk to their screen, what do they start doing? Turns out they start doing a lot of crazy things, all the way from watching anime with Clicky, all the way down to, like, learning Blender with Clicky.
Speaker 6: That's so
Speaker 7: easy to just talk to your screen. And that's kind of where it started. Yeah. About eight weeks ago.
Speaker 2: Okay.
Speaker 7: So it started it was yeah. Go ahead. Go ahead.
Speaker 2: I just want to hear about, like, the unhobbling path, because I imagine that there's demand. You know, these chat interfaces have existed for years where you open an app and you, you know, do the voice mode and then you have it read it back to you. And there's some that are more polished than others. But when I think about using a computer, I think rearranging windows: I want anime over here and DaVinci Resolve over here and the YouTube video over here. And I want to be able to puppeteer the mouse and the keyboard and have full context, so I need to be taking screenshots every minute, every second, 30 times a second, 144 hertz. Like, what are we doing there? How are you getting the data in? And then what's actually under the hood? Because you need to process the voice and actually produce an output. And it's surprising to see a harness, like, break out of this so quickly.
Speaker 7: Yeah. Yeah. It's actually so simple. We use GPT Realtime upfront Mhmm. As, like, the first layer, to give you the really quick answer.
Speaker 2: Okay.
Speaker 7: But then if you want like a deeper kind of like a thought process over the image, you know, if you're in DaVinci Resolve, in some complicated software, we now use Fable five actually Really? By default. Yeah. So now Fable five is like absolutely mind blowing in terms of how precise it is on screen understanding.
Speaker 2: Under the
Speaker 7: So we use that now by default when you wanna, like, understand something on your screen. But yeah, we only actually screenshot when you press a button. Okay.
Speaker 2: I don't
Speaker 7: wanna press the button.
Speaker 10: You won't
Speaker 2: even screenshot.
Speaker 7: But it's Yeah. But it's really crazy because we can still detect the program you're on and stuff, and that's already really helpful. Sure. Like, if I know you're on Notion, for example Yeah. And you're there for, like, ten minutes, I can just ask you and be like, hey, what are you doing? Like, can I help you? And I think it's that sort of new modality that's been really exciting, because that's something that's just not possible today with any chat interface.
Speaker 1: Yeah. It's more like a coworker walking up, like a coworker just being present, and, like, sometimes you don't need help until you're asked. Right? Or you don't describe it. Yeah.
Speaker 7: Yeah. I kinda describe it as like having this 23-year-old intern, like a decent new grad, that's always watching over my shoulder and just pretty much seeing patterns in what I'm doing and then just tapping me and saying, can I do that for you? If I had that, of course, that would be awesome. But it'd be cool if everybody had that.
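The "you're on Notion for ten minutes, so I ask if I can help" idea can be sketched as a small dwell timer. This is only an illustration and not Clicky's implementation: the class name `DwellTracker`, the 600-second threshold, and the prompt wording are all assumptions, and how the foreground app name is obtained is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DwellTracker:
    """Hypothetical helper: tracks how long one app has stayed in the foreground."""
    threshold_seconds: float = 600.0  # "ten minutes", taken from the conversation
    current_app: Optional[str] = None
    since: float = 0.0
    prompted: bool = False

    def observe(self, app_name: str, now: float) -> Optional[str]:
        """Record the foreground app at time `now` (seconds).

        Returns an offer of help at most once per uninterrupted stay in an app.
        """
        if app_name != self.current_app:
            # The app changed: restart the clock and allow a new prompt later.
            self.current_app = app_name
            self.since = now
            self.prompted = False
            return None
        if not self.prompted and now - self.since >= self.threshold_seconds:
            self.prompted = True
            return f"You've been in {app_name} for a while. Can I help?"
        return None
```

Only the app name and a clock are needed here, which matches the point made above that no screenshot is taken until the user presses a button.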
Speaker 2: A lot of these frontier models are expensive. How are you thinking about the business model? Because it it seems like you've built something useful. People should just pay 10% on top of whatever the metered rate is. But at the same time, in consumer, prosumer subscriptions, flat pricing is very popular. But that introduces financial risk to you. You have to understand your costs and how they change. What do you think the business model will be as you grow the business?
Speaker 7: Yeah. I mean, believe it or not, like people are very open to paying large amounts of money like normal consumers, like large amounts of money
Speaker 2: Yeah.
Speaker 7: For for like more for having better access to these models
Speaker 6: Sure.
Speaker 7: Through different interfaces. So right now, we just charge a straight up $20 a month. Okay. But even then, there's a limit. You know, for $20 a month, you get 150 agents on Clicky.
Speaker 11: Sure.
Speaker 7: But even that's very specific because once you pass that number, that's when we start losing money. Yeah. So everything's done in such a way where it's pretty cost effective. Also, like, just so you guys know, like, just calling, like, Sonnet or Opus or Fable, it's kinda cheap overall. Mhmm. The expensive part is, like, agentic work. That's really expensive. Yep.
Speaker 2: Like, just
Speaker 7: to give you an idea, on GPT 5.5, if you tell Codex right now to, like, go and click add to cart in Amazon, that costs 25¢ just on the API. Yeah. And I know because I'm seeing the cost myself. Yeah. And so there's a lot of tricks to kind of reduce that. But no, overall, there's so many tricks to making agents cheaper, more efficient, changing the model based on their quest
Speaker 1: catch it. Have you raised for this, or you haven't?
Speaker 7: In the process.
Speaker 1: In the process. There we go.
Speaker 3: There we go.
Speaker 7: If you're out there.
Speaker 1: Yeah. But but if you're in the process, that means like you didn't launch this and be like, oh, I'm down to be losing like $5,000 a day Sure. You know, on you know, like, you're you've been focused on the economics probably earlier than you would have been if you were
Speaker 2: That's really
Speaker 1: had raised money out the gates.
Speaker 7: I mean, yeah. I mean, I'm never the type of guy that wants to be losing thousands of dollars a day. So we can stop it early. But no, I mean, that's the thing about AI. Like, you know, I built a lot of social apps in my life, and for those, costs technically don't matter based on the number of users coming in. But in my case, for every, like, 10 new people I add, I'm kinda losing money or making money depending on what they're doing. So Yeah. I kinda gotta think about it. So, yeah, AI is kinda accidentally making everybody better at business, because it's kind of required.
Speaker 2: So I'm thinking about the hilarious trade-off in having an agent that applies a coupon code but winds up spending more money applying the coupon code than you save. Because it's like, oh, yeah, we put in this, it saved you 24¢, but the API call is 25¢. But how important to you will, like, creating an ensemble of models that sort of maximizes the Pareto frontier be as you build this business? Because I'm imagining that any day now we're going to get somewhat useful on-device models on Mac from the new Siri models, the on-device models. Those will probably be very limited in what they can do but useful for some things, and that's free for you. Then you'll have open source legacy models, GPT-4 class stuff, and then you might only call out to a Fable five on demand. So do you need to build your own router or is there a tool that you'll be using? Like, how important will that be from the interactions that you're seeing?
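The unit economics described above can be worked through with the three figures quoted in the conversation: $20 a month, a limit of 150 agents, and 25¢ for a single add to cart action. The sketch below is a rough illustration only. Treating every agent run as costing the same amount is an assumption, the function names are hypothetical, and all non-model costs are ignored.

```python
MONTHLY_PRICE_USD = 20.00      # flat subscription price quoted in the conversation
AGENT_RUNS_INCLUDED = 150      # monthly agent limit quoted in the conversation
COST_PER_SIMPLE_ACTION = 0.25  # quoted API cost of one "add to cart" action


def break_even_cost_per_run(price: float, runs: int) -> float:
    """Highest average model cost per run at which a user who uses every run breaks even."""
    return price / runs


def monthly_margin(price: float, runs_used: int, avg_cost_per_run: float) -> float:
    """Subscription revenue minus model spend for one user (other costs ignored)."""
    return price - runs_used * avg_cost_per_run


if __name__ == "__main__":
    # 20 / 150: the average run has to cost roughly 13 cents or less.
    print(round(break_even_cost_per_run(MONTHLY_PRICE_USD, AGENT_RUNS_INCLUDED), 4))
    # Assumption: if every run cost the quoted 25 cents, a full-usage user loses money.
    print(monthly_margin(MONTHLY_PRICE_USD, AGENT_RUNS_INCLUDED, COST_PER_SIMPLE_ACTION))
```

Under these assumptions the average run has to come in at about 13¢ for the 150 limit to be the break-even point, which is well under the unoptimized 25¢ figure.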
Speaker 7: That's a great question. You're pretty much asking, like, what's the harness? And we just built our own. Okay. Because technically, like, this is all so early where there's nothing to use off the shelf. Yeah. You have to think deeply about it yourself then build it yourself. So for example, let's just say, you know, you're on Clicky right now and you're saying, hey, Clicky. I want you to actually research the latest in the Iran war
Speaker 12: and put
Speaker 7: it in my in a Notion doc for me when you're done.
Speaker 2: Sure.
Speaker 7: What now? Like, should it hit Opus? Should it hit Codex? It actually depends. And so, like, we can take all those requests today and just route them ourselves. So, yeah, we use, like, four models right now underneath the hood.
Speaker 2: And and do you have a model that's doing the routing?
Speaker 7: So GPT Realtime actually does the routing.
Speaker 2: Does the routing? I think that How interesting.
Speaker 7: Yeah. That's something that we figured out then. I think not even the OpenAI team really knew, and they hit us up about it, which is That's a great router. Yeah. It's so good at, like, because it's so good at tool calling.
Speaker 3: Yeah.
Speaker 7: That means it's actually really good at routing requests. So for example, we have this tool call called Call Fable five. Mhmm. So if the user asks something that, like, is a heavier pixel task, we call Fable five. And if it's, like, more of, like, agentic work, we call GPT 5.5. And so, it's kinda all being done in the background. But that's the kind of magic of the product right now where it's like, most people who use our product have never used an agent before. They don't even know what Codex is.
Speaker 2: Yeah. And
Speaker 7: so, to them it's magical where it's like, oh, I'm just talking to my computer and it's doing the thing. That's awesome. That's what they want.
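The routing described above, where the fast voice model picks a tool and each tool maps to a heavier model, can be sketched as a plain dispatch table. This is a minimal illustration, not Clicky's code. The handler functions are stand-ins that return labelled strings instead of calling any real API, and every name apart from the "Call Fable five" tool mentioned in the conversation is hypothetical.

```python
from typing import Callable, Dict


def call_fable_five(request: str) -> str:
    """Stand-in for the screen-understanding ("heavier pixel task") model call."""
    return f"[fable-five] {request}"


def call_agent(request: str) -> str:
    """Stand-in for handing agentic work to a Codex-style agent."""
    return f"[agent] {request}"


def answer_directly(request: str) -> str:
    """Stand-in for the fast first-layer answer from the realtime model itself."""
    return f"[realtime] {request}"


# Tool names the routing model is allowed to emit (assumed, for illustration).
TOOLS: Dict[str, Callable[[str], str]] = {
    "call_fable_five": call_fable_five,
    "call_agent": call_agent,
}


def dispatch(tool_name: str, request: str) -> str:
    """Run the handler for the tool the router chose; fall back to a direct answer."""
    handler = TOOLS.get(tool_name, answer_directly)
    return handler(request)
```

In this sketch the decision itself (which `tool_name` to emit) is made by the voice model's tool calling, so the application code only needs the lookup and a fallback.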
Speaker 1: How do you how do you think about the tension between, I think, you being a generational media talent Mhmm. And then, like, startups? Because I actually feel like this is our first time meeting, but I've seen your videos over many years at this point and like, you're just really really really really good at it. So much so that as like somebody who's in media, I'm like, well, I kinda want you to just do that. Podcaster. Normally normally it's the exact opposite where it's like, hey, like, you shouldn't do this podcast. You should just work on your company, you know, or whatever whatever the thing is. Yeah. And and they obviously feed into each other really well. Mhmm. Like you have Yeah. An edge. You can get attention for free, you know, just investing a few hours, making a video, whatever. It's very, very powerful skill set. But how do you think about that tension?
Speaker 7: You know, it's funny like I've been making videos for like fifteen plus years and I Overnight success.
Speaker 2: Second time. Yeah.
Speaker 10: I always just
Speaker 7: see this as, like, a thing I get to do for fun on the side while I get to do the main thing, which is, like, actually build stuff for people. At the end of the day, like, I am painfully familiar with how bad of a business making videos is, and making movies is, and making music is. Like, I know how bad of a business it is, like, firsthand. But I know more than anybody the power of getting in front of a billion people every single month. Mhmm. When there's a good engine underneath, a good business engine underneath it, that's powering something else. And so, I fully intend to do that. So if I have this media ability, I'm glad I got it. I think Yeah.
Speaker 1: But it's like a
Speaker 2: way to do
Speaker 1: it's like sales.
Speaker 2: Yeah. Sales.
Speaker 1: It's like Really sales. Sales.
Speaker 2: You gotta be good at sales. WWDC, some movement on interoperability in iOS. It still feels like the Clicky iOS app is probably on the longer term and will be stuck in the walled garden for a while. But what I'm interested in is, do you see just a really, really solid market? Because there's no DaVinci Resolve iOS app that I'm aware of, and certainly there are, like, prosumer, sort of professional tools that really only exist on a desktop or a laptop. And so are you interested in, like, going and hacking on crazy workarounds to build something that, like, maybe makes people a little upset in Cupertino but gets you into that more casual mobile space? Or do you think that you just want to focus on desktop, prosumer, you know, desktop apps?
Speaker 7: No. I want it all. Want when you talk to your computer, whether this is your computer Yeah. Or your computer is this computer.
Speaker 2: Let's go.
Speaker 7: I wanna be an interaction layer on top of it. Yeah. That being said though, I have no interest in necessarily being, like, an OS-level controller. For example, like, kinda like what Apple is doing right now. There's no real interest. For us, most of our customers and most of our users are essentially connecting, like, 15 integrations to Clicky, from G Suite to Notrek and to Dropbox Sure. And doing work with it. So I'm a lot more interested in that reality where, what if I talked to my phone in the morning and just said, hey, like, look through all my emails and send me a brief, you know, when you're done. Yeah. And I can just talk to my phone and do that. That stuff is stuff I don't think Apple is gonna do, personally. They're not gonna make, connect these fifteen places to my Apple account and do work with it. I just don't
Speaker 13: think they're gonna do
Speaker 2: How do you think the health of like open ecosystems in desktop software is broadly? Because there's two ways that you could like update an Excel file or there's actually probably a bunch. But you can you can, you know, puppeteer the mouse, move over, click, type the cell, save, you know, or you can just go edit the underlying CSV file in like raw text and then just refresh the front end. And I imagine one's way cheaper for you so you want to like lean into getting the MCP servers, the APIs, the file writing like the CLI interaction down. But do you think that there will be companies that lean into that or companies that fight that because they want you to stay, you know, mouse and keyboard at all times?
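The second path in the question above, editing the underlying CSV file as raw text instead of puppeteering the mouse, can be sketched with Python's standard `csv` module. This is only an illustration of the idea raised in the question: the function name `set_cell` is hypothetical, and a real spreadsheet format such as .xlsx would need a different library.

```python
import csv
from pathlib import Path


def set_cell(path: Path, row: int, col: int, value: str) -> None:
    """Overwrite one cell (0-indexed row and column) in a CSV file on disk."""
    # Read the whole sheet into memory as a list of rows.
    with path.open(newline="") as f:
        rows = list(csv.reader(f))
    # Change the single cell, then write every row back out.
    rows[row][col] = value
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
```

A call like `set_cell(Path("budget.csv"), 2, 1, "450")` replaces a move, click, type, and save sequence with one file write and no model calls, which is the cost difference the question is pointing at.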
Speaker 7: That's a good question. I don't think so, because at the end of the day, what Clicky is doing in the background, just so you guys know how it works, we literally took the Rust binary that's in Codex and we package it with our app.
Speaker 2: Okay.
Speaker 7: So that when you actually call Clicky with the agent, you're just calling a sub-spawn of Codex. Yeah. And so this is on purpose. Like, I just wanna use the best model and the best kind of thing possible. Yeah. And so no. I don't think that's gonna end up happening. In fact, it's better. For example, if you ask Clicky right now
Speaker 2: Yeah.
Speaker 7: Hey hey, like, how do I actually, like, add a formula to this Google Sheet here?
Speaker 2: Sure.
Speaker 7: There's two answers there. One answer is let me show you. The second answer is I can show you, but do you want me to do that for you?
Speaker 2: Sure.
Speaker 7: And I think that's where we're gonna start going with computers. Yeah. Where your computer is just gonna say, can I just do this for you? Like, I see you doing it. Like, I can just do this. So that's what we're kinda gonna look.
Speaker 2: Well, good luck with the fundraise. I'm sure you'll be back on the show soon. Let us know when it closes.
Speaker 1: Yeah. It's exciting for this. Yeah. I love watching you win and have fun in the process.