OpenAI ships GPT-5 Codex model as coding agent usage grows 10x in one month

Sep 16, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Alexander Embiricos

All right. I will kick it off with Alex from OpenAI. We are bringing in Alex. How are you doing? Good to meet you. Hey, how's it going? Nice to meet you too. Uh, fantastic. Thanks so much for joining. Would love to get a brief introduction on yourself, your role within OpenAI, and then the update today. For sure.

So, hey, I'm Alex. I'm the product lead for Codex. And, uh, yeah, it's actually been a pretty fun morning. We've spent the morning looking for GPUs because we shipped a new model yesterday. Um, and demand was a little bit higher than we forecasted. Okay.

Uh, and so we were actually running the model a lot slower than we were hoping, and luckily we just fixed that. Um, so the update: we just shipped a new model for Codex. Codex is our coding agent. Yep. Um, it's an agent you can work with everywhere you code.

So like in your terminal, your IDE, GitHub, even your phone, which is actually a crowd favorite. That's great. Uh, the goal for Codex is to build it into an AI teammate. So, you know, just like a human teammate, you start out working together with it.

Eventually, you start delegating tasks to it, and then eventually you just give it a laptop, some permissions, and a job, and you just say, "Please prompt me with tasks that you think are worth doing." Yeah. Um, so go ahead. Uh, yeah, I have so many questions.

Um, I mean, first I'm interested: running the model slower actually helps you if you're in a GPU crunch? Is that just because you're spreading the inference across a broader user base, essentially? Yeah.

Just think, like, on your laptop, if you open way too many programs, they all just kind of run a little slower. Okay. Uh, so the solution, you know, you could close programs on your computer, but we don't want to do that.

No, of course, because you have other clients and other users and businesses and stuff. Um, take me through some of the use cases.

I mean, we were talking to Doug O'Laughlin over at SemiAnalysis and Fabricated Knowledge, and he was saying that he'll use what is traditionally a coding agent to actually go and do something that looks like a deep research report.

I had a sort of magical result where I used a coding agent to do a deep research report but then instantiate it in HTML, and then I was able to just open an HTML page locally, and it was this amazing thing because it has a different set of UI it can pull from that's not native to just the ChatGPT app.
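The workflow described here, having an agent emit the report as a local HTML page instead of chat text, comes down to writing one self-contained file. A minimal sketch in Python; the function name, styling, and sample sections are all hypothetical, not anything Codex itself produces:

```python
# Illustrative sketch of the pattern above: instead of returning a report as
# chat text, instantiate it as a standalone HTML page with its own styling.
from pathlib import Path

def render_report(title: str, sections: dict[str, str]) -> str:
    """Render a minimal self-contained HTML report with basic styling."""
    body = "".join(
        f"<section><h2>{heading}</h2><p>{text}</p></section>"
        for heading, text in sections.items()
    )
    return (
        "<!DOCTYPE html><html><head>"
        f"<title>{title}</title>"
        "<style>body{font-family:sans-serif;max-width:48rem;margin:auto}</style>"
        "</head><body>"
        f"<h1>{title}</h1>{body}"
        "</body></html>"
    )

if __name__ == "__main__":
    html = render_report(
        "GPU Supply Deep Dive",  # hypothetical report title
        {"Summary": "Demand outpaced forecasts.",
         "Detail": "Throughput was restored after rebalancing."},
    )
    Path("report.html").write_text(html)  # open report.html in any browser
```

Opening `report.html` locally gives the report its own layout, typography, and structure, which is the "different set of UI" being described.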

Uh, and so that's one world, but then I'm sure there's people that are using this in the business context, deploying it into like large code bases. Where are people having the most success these days? Yeah, totally. I mean, coding agents are really flexible.

Uh, the primary use cases for coding agents, and, you know, in Codex as well, are to write code, answer questions, uh, and review code as well. Um, and that's really what's been sticking. So, we've seen well over 10x growth in usage just over the past month. Wow.

And most of that is improvements in how people use it to write code and answer questions. Um, but we're now... So, just to be clear, you said 10x growth in the last month? Yeah, over that. It's growing like crazy. And most of this is just because we've been improving a ton. Yeah.

Uh, so like, you know, if you rewind a little over a month ago, we had the Codex CLI, which is a coding agent in your terminal. We've landed a ton of improvements to that. Yep.

Um, you know, some people like buttons, and they like using the coding agent next to the code that they're editing themselves. So, we shipped a VS Code extension. VS Code is a really popular IDE, of course. Um, and, you know, that's quickly becoming a super popular surface as well.

And, um, you know, one of the special things about Codex is that it doesn't only run locally on your computer like a teammate, it can also run on its own computer and we call that Codex cloud tasks. And so, we've landed a bunch of improvements there as well to make it like way faster, you know, and other things.

So that's a huge unlock for mobile, probably, right? Yeah. Yeah. I mean, you basically can't really code on mobile unless you have that, and so a lot of the work we've been doing is just making those fundamental surfaces where you use Codex better. Yep.

Um, and you know, that kind of started that growth curve, uh, which we're really excited about. And then yesterday, um, sort of building on the success of GPT-5 and listening to a lot of the feedback we've been seeing around how people use GPT-5 specifically for coding in Codex,

uh, we shipped a new model, which is GPT-5-Codex, uh, a version of GPT-5 that's basically further optimized specifically for the Codex-type use cases. Um, and that's been really cool. Uh, you know, for example, one of the big areas of feedback people had was that sometimes GPT-5 would think a little longer than they were hoping, specifically when they asked a quick coding question in Codex. But sometimes they wanted it to work longer. And so one of the things we did, uh, is we made it so that with GPT-5-Codex, we can actually more dynamically change how much time we're spending solving the user's question.

And so, for instance, if you ask a quick question, you just get an instant answer. Uh, but also we're seeing people, you know, using this thing just like letting it run for over 7 hours. Um, and that's an arbitrary number. I'm sure you can get it to go longer depending on the problem.
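To make the idea of dynamic effort concrete, here is a toy sketch of scaling a thinking-token budget by estimated difficulty. This is purely illustrative: OpenAI has not published how GPT-5-Codex allocates thinking time, and the heuristic, keywords, and thresholds below are invented for the example.

```python
# Toy illustration of dynamic effort allocation: spend few "thinking" tokens
# on easy queries and many on hard ones. NOT how GPT-5-Codex actually works
# internally; every number and keyword here is an assumption for illustration.

def estimate_difficulty(prompt: str) -> float:
    """Crude heuristic: long prompts or multi-step verbs score as harder."""
    hard_words = {"refactor", "migrate", "implement", "debug"}
    score = min(len(prompt) / 500, 1.0)          # longer prompt -> harder
    if any(w in prompt.lower() for w in hard_words):
        score = max(score, 0.8)                  # multi-step work -> harder
    return score

def thinking_budget(prompt: str, min_tokens: int = 64,
                    max_tokens: int = 8192) -> int:
    """Interpolate the reasoning-token budget between the two extremes."""
    d = estimate_difficulty(prompt)
    return int(min_tokens + d * (max_tokens - min_tokens))

# A quick question gets a small budget; a big refactor gets a large one.
quick = thinking_budget("What does this regex do?")
long_task = thinking_budget("Refactor the billing service off the legacy queue")
```

The real system presumably makes this decision inside the model rather than with a wrapper heuristic, but the input-to-budget shape is the behavior described: instant answers on quick questions, hours of work on hard ones.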

And yeah, so we're seeing a lot of feedback like that that people are really excited about. Um, the model is also, you know, because we had the opportunity to really go deep on software engineering and train for those use cases,

uh, the model is also a little better at things like code quality, uh, front end, uh, steerability via AGENTS.md, um, and some other things like that. Yeah, I saw a post earlier from Theo. He said GPT-5-Codex is, as far as I know, the first time a lab has bragged about using fewer tokens.

Hope this becomes a trend. Do you think that's going to become a trend? I mean, I guess, like, tokens are not tokens; really the thing is speed. Speed is really important, and I think we're going to brag a lot about speed. Um, it's kind of fun. We want to brag about both sides of it, right?

We want to brag about being really fast for the interactive use cases. Uh, and we also want to brag about being able to do a ton of work independently. Yeah. Um, so yeah, that uh that graph got mixed reactions on Twitter.

I don't know if you guys want to pull it up, but it's a little hard to read. I think it's a really fun graph where you basically see the skew. Um, well, I don't know how it's mirrored here, but basically on easy queries, we're using like 90% fewer tokens to answer your query.

And then on hard tasks, complex tasks, we're basically using like double. Are you using the term model router? Is there something that you can share on the architecture there that is actually driving that? Because that feels like a huge unlock.
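The quoted skew turns into simple back-of-envelope arithmetic. Assuming a hypothetical 1,000-token baseline per query and a 70/30 easy-to-hard mix (both numbers are made up for illustration; only the 90%-fewer and roughly-2x figures come from the conversation):

```python
# Back-of-envelope math on the token-usage skew: ~90% fewer tokens on easy
# queries, ~2x on hard ones. The baseline and the 70/30 mix are assumptions.
baseline_tokens = 1_000            # tokens per query before the change (assumed)
easy_share, hard_share = 0.7, 0.3  # assumed mix of easy vs. hard queries

easy_tokens = baseline_tokens * 0.1  # 90% fewer on easy queries
hard_tokens = baseline_tokens * 2.0  # roughly double on hard tasks

avg = easy_share * easy_tokens + hard_share * hard_tokens
print(avg)  # 670.0
```

Under these assumptions the blended average drops about a third even though hard tasks get twice the spend, which is why spending more where it matters and less where it doesn't can still read as "fewer tokens" overall.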

It was of course magical the first time you went to ChatGPT and got it to dump out, you know, pages and pages of coherent text, and you get to the bottom and it's all coherent still, and if you're coming from the GPT-3 world, you're like, wow, this is a huge breakthrough.

But then quickly you realize, like, I don't necessarily always need 20 pages of response. I can have a smaller output. Uh, and so is there anything that you can share there?

Um, yeah, I mean, the team really cooked here, and it's part of the exciting stuff that we're doing with Codex: these very bespoke interventions. Sure. Um, but no, it doesn't have a router, but there's a little bit of secret sauce there that we're not going to go too much into. Yeah.

Yeah, of course. Um, how are you thinking about model naming conventions and model numbering conventions? There was a time when model number correlated to, like, a big circle, and then GPT-5 was a much more almost holistic set of improvements. You've brought those to Codex with GPT-5-Codex.

Um, but is there a world where Codex diverges? Are you always building on top of the same release cadence now? Is there anything about how you think about messaging?

What actually changed, um, for the actual user to know, hey, it's time to dip back in, if you tested this or you want to go deeper in a certain area? Totally. I mean, so our goals with Codex are just to build an amazing coding agent.

And if you think about what an agent is, it's a combination of like uh the model and what we call the harness, which is kind of like the set of tools that the model has and then an interface for the user around that.
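That model-plus-harness split can be sketched as a minimal agent loop: the model proposes an action, and the harness owns the tools, executes them, and feeds results back. Everything below (the action format, the tool names, the scripted "model") is a hypothetical illustration, not the actual Codex harness.

```python
# Minimal sketch of the "agent = model + harness" split described above.
# The model chooses actions; the harness supplies tools and executes them.
from typing import Callable

Tool = Callable[[str], str]

def run_agent(model: Callable[[str], str], tools: dict[str, Tool],
              task: str, max_steps: int = 5) -> str:
    """Loop: show the model the transcript, run the tool it names, repeat."""
    transcript = f"TASK: {task}"
    for _ in range(max_steps):
        action = model(transcript)        # e.g. "echo: hello" or "done: answer"
        name, _, arg = action.partition(": ")
        if name == "done":
            return arg                    # model decided it is finished
        result = tools.get(name, lambda a: f"unknown tool {name}")(arg)
        transcript += f"\n> {action}\n{result}"
    return transcript

# Toy usage with a scripted stand-in "model" and a single echo tool:
script = iter(["echo: hello", "done: finished"])
out = run_agent(lambda t: next(script), {"echo": lambda a: a.upper()}, "demo")
```

The interface layer he mentions (terminal, IDE, GitHub, phone) would sit around this loop, surfacing the transcript and the final answer to the user; a real harness also handles sandboxing, permissions, and file edits.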

And so the reason we put Codex in the name is because, basically, we're really investing in Codex and in software engineering generally.

And so we wanted to have the opportunity to go like much deeper and push harder on things that like might not benefit like sort of general um users of a model. And in fact, we wanted even the ability to like make certain things worse.

So, not that I've gone and checked, but GPT-5-Codex, if you're not using it for Codex, is probably going to do worse at a bunch of general tasks than GPT-5 will. Sure. Uh, react to this roon post. "Right now is the time where the takeoff looks the most rapid to insiders. We don't program anymore.

We just yell at Codex agents," says roon on X, "but may look slow to everyone else as the general chatbot medium saturates." Uh, people have been saying, oh, this is the 90% of code being written by AI agents. Uh, and there's this debate: are we just writing ten times as much code? Are we doing ten times less work?

Give me a little bit more color on how you're using Codex internally to build the actual product. Yeah.

Um, I mean, we use Codex internally a ton, and, you know, I think we're at the point now where, for writing code, these coding agents are massively adopted, and it's kind of a stylistic choice how much of the code it's writing, but it's the vast majority.

Um, you know, I will say that we don't actually have a goal of, like, let's make sure the maximum amount of code is written by agents. What really matters is your velocity as a team, right? Yeah.

Um, but, uh, my maybe slightly spicy take here is that actually one of the greatest limiting factors on the utility of coding agents right now is not the model's ability to write code, but the human ability to prompt it and make use of the agent.

So, for example, um, we shipped a code review feature a few weeks ago that's, you know, doing really well, and, uh, the model is capable of doing code review. We actually did a ton of work to make it even better at code review.

So, um, when Codex does a code review for you, it'll actually get a whole copy of the codebase. It can go read everything, even outside of your PR. It can even execute code to validate, which a human is probably going to be a little too lazy to do normally.

But the big unlock isn't even necessarily that GPT-5 is a great code reviewer. Uh, the limiting factor is: are you going to go prompt the model to code review every single PR that you get? And the answer is no.

So what we did is we just automated that, built it in a nice way, integrated with GitHub, and now, uh, you know, it's just reviewing the vast majority of PRs in the OpenAI repos and catching serious bugs.

It's actually kind of become a meme. So we ask people internally, hey, react with a thumbs-up if it's good, just so we can kind of see, um, and reply with what's wrong with this review if it catches an issue that the engineer disagrees with. And, you know, often it'll be the case that an engineer will see an issue, reply and say, yeah, I disagree, and then, like, two days later it'll be, actually, this was a mistake and we need to follow the model's catch.

What does it take to make it onto the OpenAI Codex team these days? I mean, it feels like the skill set of just someone who's a programmer is changing. Now you're prompting, designing. Give me some advice for young people or people entering the workforce.

I think, uh, the thing I would say for the OpenAI Codex team, and by the way, we are hiring, uh, both folks externally, and also, you know, people can transfer internally. Yeah. Uh, probably the two most important things are kind of a you-can-just-do-things attitude and speed.

Um, and I think those are pretty general things that I would recommend. Um, you know, you mentioned advice for young folks; I think it's a sort of crazy time now to be early career, because so much is just possible to do and possible to learn quickly.

Um but at the same time because of that everyone else is like just doing a lot of stuff and learning a lot of stuff quickly, right?

So I think to really make the most of that, the key is to just really do stuff. Especially if you're in a technical field, start doing stuff beyond your coursework; actually start building things. Yeah. Increasing returns to high agency.

It's funny we're in the era of agents but then also high agency people seem to be doing better than ever. It's kind of a paradox but uh uh it's a good time to be able to have a positive attitude and actually go and uh just uh just try and build things. It's a good time. Great time to be alive. True. Go ahead.

No, go for it. Yeah. No, I was just going to say I think this is true as well at work, right? Um, like, maybe before you would have a number of very specialized roles, like PM, designer, front-end engineer, back-end engineer, and so on.

Uh, and now, um, because of all these coding agents, everyone is able to just do a lot more, and so you can have people owning entire problems end to end.

Um, you know, one fun example, actually, leading up to yesterday's launch: um, our designer built this really cool ASCII animation of a coin that's rotating. That's cool. Um, and you know, that's kind of a hard thing to get right. Oh yeah.

And so, um, over the weekend he just, like, vibe coded an ASCII animation editor that lets him edit it. Wild. Yeah. And that's, like, such stack compression. Totally. And it's awesome. Yeah.

I mean, there used to be a time when it was like: you're a front-end engineer, you know JavaScript, but you don't know Ruby on Rails or Python on the back end, and the back-end guy or gal doesn't know JavaScript or whatever, and they've got to talk to each other, and there's lost-in-translation stuff.

So, uh, yeah, it's an entirely new era. Very exciting. Yeah. And it comes with problems too, right? Like, if you have all these folks, uh, like the ASCII animation editor I mentioned, that's throwaway code. So, I guess there's not much tech debt. It's just pure acceleration.

We actually write a lot of throwaway code: different kinds of dashboards, editors, mini tools. Um, but then, you know, sometimes you have folks who are specialized in one thing, now able to write code for a different thing.

Uh and you know that actually creates a different kind of problem which is like code review and expertise.

And so that's why, with Codex now, we're investing in making the, you know, fundamental codegen experience great, just writing code, but we're also starting to reach out now.

So we're really investing in code review, because that's actually becoming the bottleneck: having qualified people be confident in code. We're really investing in validation. So, before you review the code, can you assert functional correctness?

One thing Codex can do now is take screenshots. It'll actually write a Playwright script to manipulate the website and then take the right screenshot of it, so you can actually look before you read the code.

Um, and then we're we're thinking about, you know, where else can we make it really easy to to engage the agent uh before it actually like gets to you on your computer doing work. It's amazing. Congratulations. Thank you for the update. Thank you so much for hopping on the stream. We'll talk to you soon. Thanks, guys.

Good luck out there. Have a good rest of your day. CPU is on fire. Good luck. Well, uh, that concludes our guests for the day. Basically, we got to jump. We got to catch a flight. I got to listen to "I wore," which is in my Audible now, by Tim Higgins. You can go find it on Audible, find it on Amazon.

Uh, we will see you tomorrow live from Meta Connect. Later show tomorrow, 4:30 is when I think we're starting. Uh, we'll be posting updates on X, keeping you updated, so stay tuned. Thank you so much for tuning in. Have a wonderful day. Thanks to Bobby Cosmic for holding it down in the Twitch chat. The legend.

Thank you to John Xley in the YouTube chat. We'll see you guys.