Dan Shipper of Every on using ChatGPT Agent to autonomously analyze feedback for his email AI app
Jul 17, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Dan Shipper
That's really disappointing. You're not going to be able to read that on the way home. Lots of papers that I can... Do we have Dan in the waiting room? Let's bring him in. [Music] There he is. How's it going, guys? Great to see you.
Every time you're on, you're in a different... You finally caught me when I'm not traveling. Yeah. This is the first time. I like that. I like that light-up logo in the background. Very nice. Subtle kind of in-home out-of-home advertisement.
That's what we're going for, exactly. What's happening? What's on your mind today? There's a lot going on. We're here to do a vibe check of ChatGPT agent. So I was lucky enough to get to hang out with it and work with it for the last couple of days before it got launched.
And I have a bunch of things to tell you about how it works. Incredible. So as your previous guests, who are amazing, told you, it's sort of like deep research and Operator had a baby. And it does some really cool things.
So, one of the first things I had to do is I had to go through all of our support emails and all of our feedback forum posts for the last, like, two months.
So, it's about 1,500 support emails and maybe 500 posts on our forum, to gather for Cora, which is our email management AI app, all of the customer archetypes: okay, who's posting, who's a promoter, and then going and looking on their LinkedIn to be like, what's their job,
you know, where did they go to school, all that kind of stuff, and put together a long research report of who our promoters are, what the archetypes are, and who our detractors are, and why they don't like us.
So, that's the kind of task that obviously I could have done or someone on the team could have done, but so long. So long. Yeah, it takes a long time. And it's the kind of thing that you almost want on a recurring schedule.
Like you just kind of want to see it once a month, but no one wants to do that once a month. And you can schedule with ChatGPT agent. You can schedule it to run. So I can just say, every month, I want you to just send me this on the first of the month, which is really freaking cool. That's wild.
That's amazing.
I wonder how compute-intensive that's going to be, because if I know anything about building dashboards and building these reports, there's always an intense amount of, like, "Oh, we've got to have this dashboard," and then you check the analytics and it's like, oh, it turns out the team just used that for a week and then stopped watching it, and if it's just running, you're, like, burning.
Dan is every foundation lab's worst nightmare because he gets on the most expensive plan and then uses it 100 times more than anyone else. You're single-handedly going to bankrupt a lab. There's other users that are probably higher margin.
But yeah, talk to me about that actual experience. Did you have to OAuth with any different services? Did you have to share any API keys? Did you have to export any data, or was it really as simple as just a prompt? It's basically a prompt. What happens is you type in a prompt.
You say, I want you to check out Cora. I want you to check out our emails. I want you to check out our support forum. So, ChatGPT has connectors. So, I had previously already connected my Gmail. So, you just log in on the OAuth.
And then what it will do is it spins up its own computer on the cloud, its own virtual machine. It goes in the browser and starts browsing the web. It also then connects to Gmail. If it hits a login, so for example, when it hit LinkedIn, it couldn't log in.
And you can take over the browser in the virtual machine and type your password in, which is a little janky. Yeah, it works. But it works pretty well.
I think the interesting thing about this, though, is there seem to be two main approaches to agents, and OpenAI and Anthropic are taking very different paths and they have very different trade-offs.
So the really cool thing about agent is they're essentially abstracting away the browser and the computer. So all you're doing is you're interacting with ChatGPT, and on the back end all this other stuff is happening.
So it doesn't matter if you're on your phone, if you're on a crappy computer, whatever, they have this whole virtual environment set up. It spins up, it does the task, and it spins down. So it's a very good consumer experience.
Claude Code, for example, from Anthropic, I think Claude Code is way more for developers. ChatGPT agent is way more, I think, for consumers. Claude Code is all on your computer.
It's all in the terminal, and it has access to all of your files, and you have the ability to use it wherever and whenever and however you want. So it's much more customizable and much more composable.
So I find that Claude Code is much more powerful, but it's much more intimidating, and it's just not something that a consumer can use. And I think that they're trying... Yeah. The crazy thing there is, doesn't Claude Code have more downloads right now than the actual Claude mobile app?
Like the regular... I saw it's something crazy like that. I honestly think people are sleeping on Claude Code. I use it all the time for non-programming tasks, and I think most people think they can't use it because it's in the terminal and the terminal is really intimidating, but it's an incredible product.
So yeah, how would you solve this problem if you were to do the same eval of, like, generate your net detractors, net promoters using Claude Code? You'd just open up the terminal on your laptop.
You wouldn't be able to do it on your phone, but you'd just engineer a prompt that told it to do that, and it would just write all the code that it needed to do exactly the same thing. Do you think it could hit that? Do you think it could do it? Yeah, it could do that for sure.
And I think the nice thing about Claude Code is you get many bites at the apple. So for example, with Claude Code, what you can do is you can have it make a full plan, so it can output a full Markdown document with, you know, a 300- or 500- or thousand-word plan. You can modify it and go back and forth with it and then have it execute it.
I think it would be more complicated. Like, yeah, it would probably write some code to hit the Gmail API, and I'd have to think about that as opposed to just clicking the connectors button. It does have a web research tool, so it would be able to go to our feedback forum and do all that stuff, and it would be able to save all the data so I could kind of watch what it was doing as it was doing it.
So I think you would get basically the same experience. I think Claude Code is a little bit more controllable and therefore a little bit more powerful, but ChatGPT is just much easier to use. That makes sense. What use cases do you expect ChatGPT agent to have the most PMF around?
I was imagining the student use case, which is just, like, monitor the homework that I have due across... you know, I remember in high school even teachers would host their homework on websites.
You could basically run something that was like, monitor the homework assignments that I receive, and then take a preliminary pass at doing the assignment, and then give me a draft that I can review and sign off on or tweak. And then write college applications, attend college, get a job for me, deposit the money that you make as an engineer at multiple companies into my bank account, and then also plan a trip to Europe because I'm retiring.
Watch out, Cluely. ChatGPT agent's coming for you. Um, guy. Boom. Breaking news. Breaking news. No, but giving this powerful of a tool to everybody immediately... not everybody's going to realize it, adopt it right away, but you can imagine a few use cases just spreading like wildfire.
Totally. Yeah. I mean, I think, what are the things that you would immediately do if you had an assistant? Mhm. Like if someone just dropped an assistant into anyone's lap, what was the first thing that they would do? I don't know.
Help me book a vacation, help me figure out how to order groceries. Another thing I use it for is, like, research the web about all the topics that I care about and every day give me a report on all the things that happened in the last 24 hours.
And it just does that incredibly well. It can go behind login walls and paywalls and all that kind of stuff. So, I think those kinds of use cases are going to be the most interesting ones.
But I honestly think right now for most of my consumer use cases, 4o or really o3 is the best. It's much faster. I mostly don't need it to spin up and use a full computer. So, I see ChatGPT agent as being something that you use every once in a while rather than something that you're using every day. Sure. Yeah.
So we're increasing the level of complexity. Like, 4o is kind of a Google search replacement for me now. I just kind of hit it with, like, when was this person born? How old's this? You know, what's the state of this? What's the capital of this state or something?
Expect a really quick answer. Then go o3 if I'm willing to wait a couple minutes, want something that's a little bit more thoughtful, maybe some search results from the web.
Then deep research if I'm actually trying to understand the full story, read a whole report. Agent if I think it's going to need to use a computer, actually take some actions, pull some things together. How was the actual interaction, like the back and forth?
This was something we talked about with the OpenAI folks: if it gets stuck, it pings you. I like that with deep research, yeah, it takes 15 minutes, but I've trained myself to just be like, forget about that until tomorrow.
And then when I have time to sit down and read the full deep research report, which is going to take me a couple minutes, like then I'll come back to it. I know it'll cook and it'll be done.
It would be kind of annoying if deep research came back after two minutes and said, "Hey, I'm going to pause all that while I ask you for an update." Like, it feels like there's a little bit more... I've got to be on, answering questions there, but push notifications maybe solve that.
Walk me through how involved you were, how active of a process it is. I mostly was not involved. Every once in a while... like, it does have a stop. You can tell it to stop and change what it's doing, which is nice, because if it goes off the rails, that's helpful.
But I think... and it has a push notification thing, but I think this is an interesting problem with agents, where I can't stop watching them. And so I spend a lot of my day just watching the agent doing something.
And ChatGPT agent has its own cool UI where you can kind of see interesting animations of what it's researching or which websites it's using and stuff like that. So I find myself actually glued to it a little bit, and I just don't think that's a very good way to spend time.
I think it's mostly solved by having push notifications, but there's a sort of emotional process of training yourself to be like, it'll let me know when it's done and I don't have to watch over it. Like, we design a lot of assets here at TBPN, and even if I trust the person creating it to do a great job, there's still this tendency to want to hover and be like, "Okay, tweak this, tweak that.
Oh, let's do it this way," in real time versus, like, waiting. But it's the same thing with 4o. Like, you know, early on I would kind of be in this loop of, like, okay, I got a result, I still got to go fact-check this and check the underlying links because hallucinations are a big problem.
Well, now that they've beefed up search so much and they're referencing direct quotes, I feel like the anxiety level around hallucinations is a lot lower, just in general queries. Totally. I think these things are tools.
Anytime you're delegating to something, whether it's a human or an AI, there's a learning process you have to go through. Human managers go through this with employees all the time.
Like if you're a new manager, you have to decide, okay, am I going to delegate this or am I going to micromanage? If I delegate, I get more leverage, but it might not come back the way I want it to.
And good managers know how to split up a task or communicate it to their employees or figure out who's good at doing what, and know when to get into the details and when not to. And I think we're going through the same curve with models. So we're becoming model managers.
And everyone is learning how to solve the same problems that human managers have solved. And so the more experience you have with the tool, the more you know, like, okay, I don't have to check this answer, or this looks a little fishy. Same thing with ChatGPT agent.
I think we'll be much better at using it in three or six months than we are today. Cool. Makes sense. Dan, always great to chat. I do want to have you back on very soon to talk about LLM-induced psychosis. I think it's important to talk about. Let's talk about it. But we'll need a lot more time.
Thank you for the vibe check, and everybody listening, go subscribe to Every right now. Do it. Awesome. We'll talk soon. Talk soon. Bye. Cheers. Up next, we have Chris Best from