Ben Hylak on GPT-5's one-shot reasoning, Nano's cost/performance sweet spot, and what's coming next

Aug 7, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Ben Hylak

snuck up

Just in: Sheetz, the popular convenience store chain with 750 locations, is now offering 50% off purchases paid with Bitcoin and crypto daily from 3 to 7 p.m. What a wild move by Sheetz. Well,

well, Ben is in the waiting room. Let's bring him in.

Let's bring in Ben Hylak. How are you doing? Good to see you.

Doing well. How are you guys doing?

We're doing well. I'm just going to say hello; I've got to take off and talk with Taipei. I'm going to let John take it from here. Absolutely. I'll close out the show. Let's have a fantastic conversation.

Give me the update. How's the day been for you? What were your expectations? Did this meet or exceed them? Did it underwhelm you? How are you doing?

Well, I've actually had access for a couple of weeks. We actually did a video. I'm not sure if you've seen it, but OpenAI brought a few folks from the Twitter sphere to their office a couple of weeks ago to try it, and some other

Yep. I think it pretty much exactly meets my expectations as far as how it's been received. I've tweeted about this as well, but I think

it's really, really good at one-shotting things. I think it's better than other models we've seen, but I think that's actually sort of a distraction in a lot of ways. The things it's a lot better at are (a) a lot harder to describe, and (b) I don't think the harnesses for it really exist yet. I think

Harnesses? So,

The way I've been describing it: web search existed in ChatGPT for a really long time, right? It was able to call a tool and search the web.

Yeah.

Obviously, Deep Research was very different from that, right? What we saw was that it was actually searching the web, reasoning about those results, and changing its course, course-correcting in the middle. Intermediate reasoning is the term for it.

And they really trained it to search the web well. I think GPT-5 does that for a whole plethora of tools.

The interesting thing is that a lot of the agent products that exist today were kind of built wrong. They didn't build their tools the right way. And we've seen this before: if you look at the first infrastructure for agents, LangChain, way back when, like two or three years ago, it was early, but it was wrong, right? They've iterated since; they have LangGraph, which is a better implementation. But the first implementation of LangChain was, again, early but wrong, and if you built your product on LangChain, you had to significantly change it.

I think we will see a similar thing happen with GPT-5. It's not just changing the string in git from 4o to 5, pushing, and you're done. Yeah.

Yeah. You know that meme about how Sam Altman stood on stage and killed 75 startups, Google just killed 100 startups, Apple just killed Partle with their new thing, or whatever? Did any of that happen today? It feels like this is more like LangChain needing to change their strategy, and that happened a while ago. I haven't identified anything. Scott Wu hopped on and said, you know, great day to be an application-layer company. The foundation models got better.

It's more tools in my tool chest. I'm extremely happy, and I'm more confident than ever. And I believe him. I don't think he sees today as fundamentally needing to change his business model.

I think that's true, actually. There are a lot of people building agents right now, and I think a lot of those agents haven't been feasible for some of the reasons that GPT-5 starts to address. So what it means is that the entire architecture behind agents will get a lot simpler. It feels like a good day for people building applications.

Yeah, it's not immediately obvious that some company got killed today. Yeah. I mean, in general, Dwarkesh updated his timelines. There's been a general idea that we've maxed out pre-training, we've kind of maxed out post-training, and we're now in the reap-the-rewards phase. And we've seen it in the incredible financial performance and the incredible usage numbers: hundreds of millions of people are using ChatGPT 30 minutes a day. I love the product, and yet it feels like the "what have you done for me lately" meme. It's like, okay, we went from the iPhone 4 to the iPhone 5 today.

Yes.

Still a really important technology, great company, but I want another iPhone 1.

Yes. Yeah. No, I totally get what you're saying. I wrote a piece about this with swyx, but

yeah,

it really changed the way I see the path to AGI. Before using it a lot, I kind of thought, "Okay, we need bigger models. They're going to get smarter or something."

Um,

I had this realization. I had this really weird dependency conflict with Yarn. We have a monorepo. Part of the problem with this discourse is that the sort of problems it's getting good at solving are just not sexy things to talk about. They're not things you'd even understand. It's like, we have this issue with the way we structured things, and

a couple of weeks ago I had this problem. No other model would solve it.

And I watched it sort of poke around. It started running this `yarn why` command in a bunch of different directories. In between, it was reasoning, and correctly reasoning, about what it was learning, taking little actions and seeing what happened.

I think what I realized is: imagine humans without tools. If we never had any tools, if we were never even able to write things down, would you be able to tell that we're intelligent? Would we have learned to speak, etc.? I just don't

know. If we had never invented fire, where would we be right now? There's a similar thing here. I actually think a lot of the next year is just going to be about how you get these models to do things better. I think that's next year.

In your Yarn example, you said you were having it, I assume GPT-5, work on the problem. Was that wrapped in a coding tool? Did you just go to chat.com and give it your GitHub repo? Talk to me: what was the actual user experience on your side?

Yeah, so this was in Cursor.

I think the new version of the Codex CLI, which they just released today, is also really, really good.

I think you will only really see a significant difference in places where it can sort of explore its environment, is the way I would put it. When I was watching it bounce around my repo, I felt almost like I was watching something navigate a little video game, like Pokémon or something. It's like: I'm going to go over here. I'm going to see this. Okay, wait a minute, that conflicts with what I just saw over here. Where should I go next? Do you know what I mean? It felt very novel, is what I would say. Yeah.

Yeah. So how are you using it? Where do you see it going? Is it just a little tailwind bump today, or what's your read on how you'll be using GPT-5 going forward?

I mean, yeah, there are two huge things. One thing that really got missed today is that they also released GPT-5 nano, which is actually an incredibly good model. We're not talking about it, but I think it's half the cost for input tokens of Flash-Lite, and it's a really good model, like 4o level for a lot of writing and stuff like that. So we'll be using that, probably in the short term. It'll be interesting to see how other providers react. I'm sure Google will cut their prices as a result, but it is the cheapest hosted model. I don't think anyone's serving any other model at those prices, for that matter.

Yeah,

That makes sense. What else are you looking for for the rest of the year? Probably no GPT-6 on the horizon, but what are you looking out for? It seems like Google is expected to respond with Gemini 3 soon. What else are you tracking in the world of AI these days?

It's a great question. Yeah, that's going to be wildly interesting. I think what Google does will tell us a lot.

You've probably seen it, but they released this world model yesterday, and we're kind of not talking about it anymore. I haven't tried it myself, but if those videos are real, that's one of the most mind-blowing things I've seen in the last decade or so. If that's real, that's extremely interesting, and I think all the stuff that's going on with world models right now has huge implications for everything from robotics to so many other fields. So

I'm super, super interested in that. And the other thing is, I'm actually really bullish on GPT-5. I think the way it was received today is just about how I expected. And the reason is, when I say harness again: I think Canvas in ChatGPT is pretty bad, would be my take. It's a tough product to make, but it does really poorly with long files, crashes sometimes, that sort of thing. I think that we don't have

the product layer around GPT-5 yet. So I think we're going to see some really, really interesting products built around it. Yeah, it's always hard when you go from a binary, qualitative, in-your-face improvement like ChatGPT, where we passed the Turing test, and now the next test is

superintelligence that self-replicates, is smarter than every single person, and knows everything. We really moved the goalposts, you know.

100%. I think there was a lot of discourse around the model leading up to it, which I think didn't help. But the way I would think about it is that we've made it some percentage of the way through automating software engineering, let's say 70% or 75%.

The tough part is that the last 25% is (a) the hardest, and (b) the least decipherable to explain to people. It's the least universal. One of the examples I did: I made a personal website that's all Mac OS 9 themed in about 20 minutes with GPT-5. It's really fun, right? You get it. My mom gets it. I can show it, I can share it. But I can't explain to my mom any of the very specific ways that GPT-5 helps in our specific codebase with our specific problems. So I think these launches will probably get less and less interesting, in terms of what they do for software engineering, as that gap gets closed. What's the last 5% of software engineering? It's probably not going to be that interesting to me.

Um

Do you think they'll be on an annual release cadence now? Apple updated all of their operating system nomenclature to be 26 because it's the year. It's like a car model.

I don't think you can plan it. I don't think you can plan ahead. That's the interesting thing: there are people who say that GPT-4.5 was supposed to be GPT-5. Yep. And I think it sort of came out and they were like, "Eh." I actually love 4.5. I think it's a really fun model, but

Well, it's clear that improvements come in many places. It's like the latest iPhone: you don't buy it just because it has the new screen; it has a slightly better camera, it's slightly lighter, it has longer battery life. It's an ensemble of improvements that add up. And that feels like what we're getting today, and what we'll get in the future: we did a little extra RL over here, this tool is now sharper, it has new capabilities, we added multimodal, the video generation got better, this feature got better, etc. And I think what a model is, and how we value it, is still going to change a lot. Just to give an example, 4o was this big thing where they talked about it being natively multimodal, eventually taking in even video: video in, video out, audio in, audio out. And you haven't heard that from GPT-5 yet. You can't talk to it in Advanced Voice Mode, and it doesn't generate images; there's no native image generation, at least yet. How that works under these model capabilities, I don't know. It seems quite possible that the best model for writing natural language, or for writing creatively, might not be the same model that writes really good Rust code. These might be different models. So I don't know, we'll see.

Yeah, "Create image" is now tucked next to Deep Research, Agent, etc. But I would hope that you can call it from the actual chat interface.

You can call it from the GPT-5 chat. It's using gpt-image-1, I think, which is actually the name of the model. So it's a dedicated image generation model, which I think is maybe 4o. I don't totally know.

Yeah. I don't particularly care. I'm not looking for one model to rule them all. I'm fine with models calling different tools. It seems fine.

yes.

Anyway, fun day. Thanks for hopping on. We'll talk to you soon.

Of course. Anytime. Talk soon.

Have a good one. Bye. And that's our show today, folks. Leave us five stars on Apple Podcasts and Spotify. And thank you for tuning in to the GPT-5 giga stream. We're on hour four and a half. We've enjoyed hanging out with you. Tyler, anything else from the timeline? Close it out for me. Timeline's still in turmoil.

Show the little game I made.

Okay. Yeah, let's show Tyler's game. Can we do that? Is that

You got it. The Tyler Tower Defense. Okay.

This was one-shot. Okay. I didn't

Wait, what do you mean, one-shot? One prompt? You said you were working on it.

I was, but then it's like, it wasn't

The definition? Oh, so you went back to a single prompt. Got you. This

I made a change, but then I realized, okay, this is not as good. So I just went back to the first one.

Okay. So my question is, this actually seems like it's a game engine. I don't know what it's using under the hood. Do you know? Did it write WebGL code, or did it write

I think it's just JS.

Okay. And it's just HTML canvas.

That's pretty crazy.

Yeah.

You'd think it would use some 2D engine off the shelf or something. But my question is: that won't go viral, because it's less impressive than just a tower defense app I can get in the App Store.

For sure. Yeah,

but maybe, you know how those ControlNet images went viral, where people would take their corporate logo and run it through ControlNet, and it would be like the TBPN logo overlaid on a forest, with the trees shaped like the logo?

Yeah. So maybe it's tower defense, but it's my logo or something like that, and the enemies are moving through it. I don't know. There's got to be a way to personalize it so every single game is a unique snowflake that you want to go and experience. You want to look at it, you want to spend some time in it. I don't know.

Yeah,

It's hard, because

it's still predicting the next token. The 4o image generation wasn't novel, I guess, because image generation already existed, but it was such a massive improvement. This is

Yeah.

Like, there's not any clear, massive step change here. It's a little bit better in a lot of ways.

Yeah. So,

Oh well. We'll have to play with it more. Let us know what you think about GPT-5, and we will see you tomorrow. Have a great day. Thank you so much. Bye.