OpenAI engineers demo ChatGPT Agent handling multi-step tasks like scheduling, planning, and presentations

Jul 17, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Yash Kumar & Isa Fulford

workflows, you might like fine watches. And up next, we have OpenAI. Big launch today. We are gonna break it all down. Welcome to the stream. How are you? I think we got caught up on guests. We have two people. Well, it's happening. Great to have you guys on the show.

I will start by saying sorry — sorry that Coldplay had to have a concert last night. It was hard to predict. You can't launch anything on the internet. I know, we were just talking about this. Yeah. Well, we're here. I have no idea.

He has a show at Demand. Well, normally you don't have to plan your launch schedule based on Coldplay. But maybe it's something to keep in mind. Moving forward — break down the launch. What's in it? What should we be focused on now? So, I used to work on deep research. Yash used to work on operator.

We both had our launches earlier this year, and after our launches we realized that our products are very complementary, so we've basically combined the best of both into this new product, ChatGPT agent. So, ChatGPT agent has access to a virtual computer with a bunch of different tools installed.

So, it has a text browser, a visual browser, and a terminal, and it's able to do a lot of the different things that you would do on a computer. So it's a very flexible and pretty powerful model. We trained it using end-to-end reinforcement learning, like our past reasoning models.
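The setup described here — a model choosing among virtual-computer tools like a text browser and a terminal — can be pictured as a simple dispatch loop. This is an illustrative toy, not OpenAI's implementation: the tool stubs, the `TOOLS` registry, and the `run_agent` helper are all assumptions made up for this sketch.

```python
def text_browser(url: str) -> str:
    """Stub standing in for the agent's text-browser tool."""
    return f"text content of {url}"

def terminal(cmd: str) -> str:
    """Stub standing in for the agent's sandboxed terminal tool."""
    return f"ran: {cmd}"

# Registry the dispatch loop uses to look up model-chosen tools by name.
TOOLS = {"text_browser": text_browser, "terminal": terminal}

def run_agent(plan):
    """Execute a list of (tool_name, argument) actions and collect results."""
    transcript = []
    for tool_name, arg in plan:
        transcript.append((tool_name, TOOLS[tool_name](arg)))
    return transcript

# Usage: a two-step plan — browse a page, then run a shell command.
steps = run_agent([("text_browser", "https://example.com"),
                   ("terminal", "ls")])
```

The point of the shape is that adding a capability means registering one more callable, which is why the guests describe the browser-plus-terminal combination as so general.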

And yeah, it can make slides, make spreadsheets. That's— Yeah. What initial use cases are you guys most excited about? What have you been using internally? What kind of companies, and individuals, should be taking advantage of it as of today?

Yeah, I think — so, as you were saying, we combined deep research and operator to build this.

Deep research was really good at research — I think the best product out there, or at least that's what I thought and still think, in terms of researching — and operator was there to take actions. Combining these two opens up a lot of possibilities.

You can do research, you can do actions, you can do research and then actions. And on top of that, we added APIs. So, for example, with connectors, which we launched a couple of months ago, you can connect Gmail, Google Drive, Linear, and whatnot. All sorts of products.

And combined with that, it becomes an extremely powerful research and action tool.

So, for example, personally speaking, I've been using it a lot internally just to talk with a codebase. If I'm solving a particular problem, I'll connect it to GitHub and ask it, "Hey, can you go figure out what's happening in this code?" — say, a codebase that's new to me. And the model is also very natively multi-turn, which means I can just have conversations back and forth with it, which wasn't necessarily true for, say, the deep research model we released earlier this year. So it's really, really useful, specifically at work, when I'm able to connect all these amazing tools and understand what's happening in all parts of the system.

So, secondly — oh sorry, I was going to quickly say: personally, I also use it a lot. There are a lot of small things, or big things, that I have to do, and now I don't have to do them all myself. I'll give an example. My wife and I have a date night every Thursday.

I forget to book it most of the time, and then I get in trouble. Now, with agent, you can schedule tasks, etc.

You can just say, "Look, every Thursday, go ahead and give me five recommendations in the morning that have availability," and I can just show up Thursday morning, click one, and it's done.
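The recurring Thursday task described above boils down to "compute the next occurrence and attach a prompt." Here is a minimal sketch in Python's standard library; the `task` record and its prompt string are hypothetical — ChatGPT agent's actual scheduled tasks are configured in the product, not via code like this.

```python
from datetime import date, timedelta

THURSDAY = 3  # datetime convention: Monday is 0, so Thursday is 3

def next_thursday(today: date) -> date:
    """Return the next Thursday strictly after `today`."""
    days_ahead = (THURSDAY - today.weekday()) % 7
    return today + timedelta(days=days_ahead or 7)

# Hypothetical task record: run the prompt on the next Thursday.
task = {
    "run_on": next_thursday(date(2025, 7, 17)),  # July 17, 2025 is a Thursday
    "prompt": "Find five restaurants with availability tonight "
              "and list them with booking links.",
}
```

The `days_ahead or 7` trick makes the schedule strict: if today is already Thursday, the task is queued for the following week rather than firing immediately.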

So, things like that. Once deep research came out, there was a little bit of a meme about "we have 15-minute AGI," and I'm trying to understand — I could imagine this new product stretching that out, letting it run if it's building a spreadsheet, scraping data from different sources, and putting a whole bunch of different things together.

I could imagine letting it run for like an hour and coming back. But you said it's multi-turn. So, what does the typical interaction look like? Is there a wide variance?

I noticed in the latest revision of the ChatGPT app, there's now a little 15-minute UI element next to deep research, to kind of hint that, hey, you're getting yourself into a 15-minute cycle. Obviously, there are lots of efforts to speed all of those processes up, and that'll come.

But what does the typical interaction time look like? Is this more asynchronous or synchronous, or can you do either? How do you think about those trade-offs? Yeah, I think our team in particular is really focused on solving harder and harder tasks that take people more and more time.

So a lot of the agent tasks can take anywhere from around 5 minutes to — I've seen it take over an hour. Wow. But a lot of times these are tasks that would have taken humans many, many hours. Yeah.

So I think as our agents get better, the length of time they take to solve tasks will probably also get longer, because the tasks will become so much more complex — imagine a task that takes a human many days. Yeah.

So a question I have is: right now, the agent can browse the web for me, do research, take action. I imagine the next step would be something like a voice extension to it, where, as an example, I might say, "Hey, I use Geico right now. I want to potentially switch.

Can you go out and do research? Here are the cards that I have. Try to find a cheaper price." And then maybe even call Geico and negotiate a lower rate. Or, you know, you're traveling, let's say, and you need to cancel an old internet line on Spectrum, and they only allow you to call during weekdays.

Yeah. The other example: you're on a vacation and you want to change your flight, and you know you're maybe going to have to call and sit on hold.

So is that the direction you think we're going, where it can not just browse the web but then actually start to— It's funny to think about, because the voice agent would then just call, and maybe it's talking to another agent on the other side. But yeah, where are we going?

Yeah, voice is definitely an interesting form factor. I think the way to think about where we're going is twofold. The first is what Isa just mentioned: I think we want to continue to solve harder and harder, longer and longer tasks. Today we can solve, let's say, an hour or so of tasks.

Hopefully in the future we can solve multi-day tasks — it might take longer, it might take shorter, depending on how we're doing — but we'll continue to improve on the reliability and complexity of the tasks we can solve. So that's a core part of what we want to keep improving on.

The second part — the Geico example you gave, you can technically do it today, not with voice, obviously. You can just type it into agent, and it'll be able to essentially answer the query. You can log in with the virtual browser that we have in agent, with whatever internet provider you use, and I think it'll be able to tell you what's happening there, what's not, what other alternatives there might be, and all of those things. But at the same time, you bring up a really good point, which is that voice might be a very natural way to do this in the future, as that form factor evolves. Other companies introduce voice as an intentional point of friction — like the whole "call to cancel" thing. You would never do that to me.

Yeah, they would never do that to you. And so, to me — you know, if I can go into a web app and just click cancel or do something like that... My question is about the user experience of having something that could take five minutes or an hour. How predictable is that?

I've gotten into a great pattern where I expect deep research to take 15 minutes, so I know when to go to deep research, and I'm going to come back later, and it's great. But if there's some variability there, is it going to send me a push notification when it's done? Is that how that works? How do you train the user to get the best experience?

Yeah, it will send you a notification. I think, actually, the fact that deep research always takes the same length of time is probably — I don't want to say a bug, but it's not the end state.

The way I think of it, it should think for as long as it needs to think, but deep research always just thinks for a really long time, even if you ask what the weather is. Yeah, that's true.

So I think there's a better middle state, and this model is a step towards that, but it still thinks for too long on really simple queries. Yeah, it takes two minutes. Yeah, you're totally selling deep research short.

Sometimes I ask it what the weather is, and I want the history of weather from start to finish, and what it is tomorrow and yesterday, and I want the history of meteorology and how the Doppler 3000 works. I want everything, and I love that about deep research.
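The asynchronous pattern discussed in this exchange — kick off a long-running agent task, then get notified when it finishes — reduces to submit, poll, notify. The sketch below is a toy; the `Job` class and the notification string are stand-ins invented for illustration, not a real OpenAI API.

```python
import time

class Job:
    """Fake long-running agent task that reports 'done' after a few polls."""
    def __init__(self, polls_needed: int = 3):
        self._remaining = polls_needed

    def status(self) -> str:
        self._remaining -= 1
        return "done" if self._remaining <= 0 else "running"

def wait_for(job: Job, poll_interval: float = 0.01) -> str:
    """Poll the job until it finishes, then produce a notification message."""
    while job.status() != "done":
        time.sleep(poll_interval)
    return "notification: your task is complete"

message = wait_for(Job())
```

In the product, the polling and notification happen server-side and arrive as a push; the design point is that a variable five-minutes-to-an-hour runtime only works if the user doesn't have to sit and watch.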

The use case that I'm sure will immediately start happening, which is pretty hilarious to think about, is a student who just says, "Hey, these are the three websites that host the homework I have to do. Do it proactively: go to the website, figure out the homework, create it, fill it out."

The teacher is going to be like, "Check this homework." If somebody wants to say that's not AGI, I don't know what to tell them. Yeah, I don't know. What about other tool use? Jordy mentioned phone usage.

You mentioned spreadsheet integration.

What's kind of further down the stack of integrations that you've already announced that might be underexplored or underappreciated at this point in time? To me, the tool that we've given the agent is very general and powerful — you can do almost anything that you need to do on a computer with it, because it's a browser and a terminal, so you can do most things. It might not be the most readable to a human. So I think that now it's about pushing the capabilities.

You can ask it to do anything in theory; it's just that the agent won't be good enough to do everything you ask it to do. So I think we just need to make it better and better using the tools it has. Yeah, I think the frontier continues. As Isa said, the tool is extremely general.

It has access to a browser, it obviously has access to a terminal, and we can give it access to as many APIs as possible. Sure. That should allow you to build whatever you want to do, generally speaking.

For example, you can totally imagine that in the future there's access to a voice API or whatnot, whether it's internal or external, depending on how things go, right? You can have access to everything. You can build everything. But we still need to push — it's still early; we've not solved everything.

It's still early. We still want to make sure that we can solve use cases with really, really high reliability, and that continues to be a pretty large focus. Yeah. Well, I'm excited. I'm excited.

I mean, think about a world where you can give it access to your password manager, things like that, so it just immediately can — or just API integrations, right? Then the passwords don't even need to pass back and forth. That makes a ton of sense.

Yeah, I'm excited — I feel like deep research maybe doesn't have access to images in ChatGPT yet, but I could imagine the reports being way richer if it can include them.

And then sometimes, when I'm just generating a general chart, I actually want to use a visualization library in Python and kind of go back and forth. So it's very cool to see it all come together, and I'm very excited for where this is going. What's the rollout strategy?

When can people actually start using this stuff? Everyone on the Pro plan should be able to use it by end of day today. Let's go. And we'll get it to Plus users over the coming days, and then Enterprise over the coming weeks. Very exciting. All right. Well, congratulations on the launch. Super exciting.

We're going to turn this day around. It's now just about OpenAI agents. Ignore all Coldplay memes. Ignore. Well, thank you guys for joining. Thank you so much for having us. We'll talk to you soon. Cheers. Bye.

Up next, we have Dan Shipper, friend of the show over at Every, who got early access to agent, and we're going to get some feedback from him. John just spilled his base juice. All over the table, all over the FT. That's really disappointing.