Zach Lloyd of Warp on model competition, cost sensitivity, and GPT-5's performance gains

Aug 7, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Zach Lloyd

positioning your brand for AI search? That's certainly an interesting question to dig into. Anyway, we have Zach Lloyd from Warp coming into the studio. Welcome to the stream for the second time. Welcome back, Zach.

Good to see you.

Happy to be back.

How you doing?

I'm doing pretty well.

Yeah. So, a lot of what stuck out to me, and I'm mostly a consumer of consumer AI apps, is that I'm very excited about not needing to mess around with a model picker anymore. But take us through the biggest improvements from the software development side.

Yeah, it's a major step up from the prior OpenAI models. It's doing agentic workflows in Warp for much longer periods. It's just a smarter general model: we eval it against all of our benchmarks and it's up there at state of the art, which from our perspective is awesome. It's great to have multiple competitive models that our users can benefit from. So definitely a huge improvement over GPT-4.1.

Yeah. So it seems like not the Claude Code killer, but certainly in the same conversation, in the same football stadium, if we're using a sports metaphor.

One thing that stood out is the cost reduction. How much do you think developers will care about that, versus just what it can do from an output standpoint?

I think developers do care about value, sort of the quality-to-cost ratio. The more you get into the individual developer and the small team, the more that matters, whereas at the enterprise level it's a little bit less price sensitive. You can see, as different apps change their pricing, what the reaction of the developers is. You've probably seen this with Cursor, and you've seen this with Claude Code: developers really are looking for something that's cost effective. So the fact that the cost is a little bit lower is actually a big deal.
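The quality-to-cost ratio Zach describes can be made concrete with a toy ranking. This is a minimal sketch; the model names, quality scores, and prices below are all invented placeholders, not real benchmark results or pricing.

```python
# Rank models by a naive quality-to-cost ratio, illustrating the "value"
# comparison described above. All figures are made-up placeholders.

models = [
    # (name, eval quality score 0-100, $ per million output tokens)
    ("model-x", 90, 10.0),
    ("model-y", 85, 2.0),
    ("model-z", 60, 0.5),
]

def value(quality: float, cost: float) -> float:
    """Quality points per dollar: a crude proxy for value."""
    return quality / cost

# Sort best value first.
ranked = sorted(models, key=lambda m: value(m[1], m[2]), reverse=True)
print([name for name, _, _ in ranked])  # ['model-z', 'model-y', 'model-x']
```

The point of the toy numbers: a frontier model can lose on raw value while still winning on absolute quality, which is why the answer says enterprises (quality-driven) are less price sensitive than individual developers (value-driven).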

Do you think we're in the Lyft/Uber 2015 arc, where the prices are subsidized and the prices will go up? Do you think there's a price war on the horizon now that the frontier models seem to have similar capabilities? Do you think someone will try to raise a bunch of money, cut prices a bunch, and steal a bunch of users? How do you think that plays out?

It's an awesome question. My hope is that we get to a world where there is price competition at the model layer. Warp is very much at the app layer, right? Our value prop is that we can give our users, who are mostly developers, the best model access. To the extent that it's not one model provider running away with that and having pricing power, it's better for us, just candidly. So my hope would be that the model world ends up a little bit like Google Cloud, AWS, and Azure. That's our best end state, where all of these models are similarly powerful and a little bit more commoditized. I don't think it's been like that, but it's getting a little bit more like that. The more there's more than one game in town, the better that is for Warp, and it's actually good for developers too, because the competition will put pressure on prices to come down. But I also think that people will definitely pay for quality. So if there is a meaningful delta in quality on the frontier models, then whoever has the quality delta will have a lead temporarily, but I'm not sure that lead will be sustainable. We'll see.

How do you think the developer community should plan around model deprecation over the next one to two years? I don't know that I've gotten a reaction yet, or whether there's general frustration yet from people. On the consumer side, Tyler on our team here loves 4.5, so he was a little bit disappointed to hear that. But what are you seeing on the developer side?

Yeah, I think it's a little bit different for people who are building apps on LLMs versus people who are using LLMs as an accelerator for coding. At Warp we actually do both: we're an application-level stack, and it's very easy for us to move to the latest model, so it doesn't really bother me. I don't know what type of app you would be building where it's really important that it's GPT-3.5 or GPT-4 or something like that. Generally we want the most intelligent tokens at the best cost, so I don't see it being too big of an issue, honestly.

What about open source? Does that feel like something that will be in the playbook? Is the markup on closed-source models high enough that there will be a significant price delta, or is the Pareto frontier kind of indifferent to closed source versus open source?

If there were a comparable open-source option, that would be awesome. But the economics don't seem like a perfect analogy to me between open-source software and open-source models. With open-source software, you have a big community of people who, for the love of coding, are building a really awesome product. For open-source models, you need a crazy amount of capital to train something that's on the frontier, and I don't know how that happens. What we've seen is that the open-source models are competitive at the quality level they're at, but that quality level is not the same as the frontier models, and I don't really see why that would change. In Warp we were serving some open-source models, but they're just not as good, so I think there's a more limited use case for them right now, and I don't really see economically why that would change. In fact, I would be surprised if anyone was spending billions of dollars to train a model and just put out the open weights. I don't get the business strategy there, but maybe that will happen. That would be awesome.

Is there a world where smarter models orchestrate dumber, cheaper models, or where you distill models into narrower formulations that can be run more efficiently? We've talked to a few companies that do this for businesses. Say you just want a model that filters for profanity: you can run it on a gaming graphics card, so it's super cheap or super fast. I'm wondering about that in the coding-agent world. Where are the opportunities to fan out and use an ensemble of models instead of just hitting everything with the smartest, best one? It feels like, because of the funding environment, everyone can kind of justify a high cloud bill, and most people don't admit that it's hurting the bottom line, but it feels like at some point it has to eventually.

I think that's a very real thing, in the sense that even in Warp we don't use the biggest, most powerful model for every task. For Warp, deciding whether or not we should summarize a conversation is a good example. You hit the context window and ask: is this a good spot to summarize? Is this a good spot to encourage the user to start a new conversation? For that we use a much more inexpensive and also low-latency model. The other trend is that these very powerful models tend to have much higher latency. So we do a mixture of models, and that's totally a real thing. But I think the predominant use case as a developer is going to be: I want to tell an agent to do something, I want the tasks to be harder and harder, and I want it to run for longer and longer. And to do that, in general, you want the most intelligent model.

Yeah. Until the models have a sort of S-curve type shape, I think it's going to be more of a quality game than a cost game for most of these things.
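The mixture-of-models idea described above (a cheap, low-latency model for housekeeping decisions like "should we summarize?", and a frontier model for long agentic work) can be sketched as a simple router. This is a hypothetical illustration: the model names, task labels, and token threshold are all assumptions, not Warp's actual configuration.

```python
# Hypothetical task-based model router. Cheap, low-latency model for
# housekeeping tasks; frontier model for long-running agentic work.
# All names and thresholds are illustrative placeholders.

FAST_MODEL = "small-fast-model"      # placeholder for a cheap, low-latency model
FRONTIER_MODEL = "frontier-model"    # placeholder for the most capable model

CONTEXT_LIMIT_TOKENS = 100_000       # assumed threshold, not a real limit

def pick_model(task: str, context_tokens: int = 0) -> str:
    """Route a task to a model tier based on difficulty and latency needs."""
    if task == "summarize_decision":
        # Deciding *whether* to summarize is cheap and latency-sensitive.
        return FAST_MODEL
    if task == "agentic_coding":
        # Long-running agent workflows want the most capable model.
        return FRONTIER_MODEL
    # Default: escalate to the frontier model only when context gets large.
    return FRONTIER_MODEL if context_tokens > CONTEXT_LIMIT_TOKENS else FAST_MODEL

print(pick_model("summarize_decision"))  # small-fast-model
print(pick_model("agentic_coding"))      # frontier-model
```

The design choice the interview hints at: routing happens per task type, not per request, so the latency-sensitive paths never wait on a slow frontier model.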

doesn't it feel like they have an S-curve shape right now?

Certainly does from a consumer perspective.

That's interesting. From a coding perspective, I feel like we're still accelerating. The difference between the last version of GPT and this version is probably bigger than the difference between 4.1 and 4, or between 4 and 3.5. It's a big deal, and the same thing is true with the Anthropic models, and I'm sure we'll see something from Google where it's an acceleration. I think there's maybe an underappreciation of how much is left to solve here. When you're doing a real coding task as a pro, despite all the demos you see on Twitter where someone asks an agent to build an app, that's a lower level of difficulty than what a pro developer does with one of these models. The models still don't produce great code a lot of the time; there's a lot of handholding that has to go into it. And I think we're still seeing an acceleration in terms of the models becoming not just okay, competent engineers, but really, really good engineers.

Yeah. Do you care about benchmarks?

We care a ton about benchmarks.

But your own internal benchmarks, or...?

We do both. So, plug for Warp: we're number one on Terminal-Bench, which is the public terminal benchmark, and we're top five on SWE-bench, which is the coding benchmark. And the only way, in my opinion, that an app at our layer in the stack can really improve is by measuring progress. So we have our own internal set of evals that we run across all these models as well, which come from real use cases. That, again, is an advantage of being a product that's in the wild with a lot of users: we can see where the models are failing and where they're working. So we're very big on that, actually. Yeah.
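An internal eval suite of the kind described above can be sketched as a tiny harness: run the same real-world cases against several models and score the pass rate. Everything here is an invented placeholder; `run_model` stands in for a real API call, and the cases and model names are not Warp's.

```python
# Minimal sketch of an internal eval harness: score multiple models on a
# shared set of test cases drawn from real use. All data is a placeholder.

def run_model(model: str, prompt: str) -> str:
    # Stand-in for a real model API call; returns canned answers for the demo.
    canned = {"model-a": "42", "model-b": "forty-two"}
    return canned.get(model, "")

EVAL_CASES = [
    {"prompt": "What is 6 * 7?", "expected": "42"},
]

def score(model: str, cases: list) -> float:
    """Fraction of cases where the model's output matches the expected answer."""
    passed = sum(run_model(model, c["prompt"]) == c["expected"] for c in cases)
    return passed / len(cases)

results = {m: score(m, EVAL_CASES) for m in ["model-a", "model-b"]}
print(results)  # {'model-a': 1.0, 'model-b': 0.0}
```

Real harnesses use fuzzier scoring (test execution, LLM judges) rather than exact string match, but the shape is the same: one case set, many models, comparable numbers.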

Awesome. Well thank you so much for stopping by. We will talk to you soon.

I'm sure you'll have a busy afternoon.

Shout out, by the way, to the OpenAI team. Very, very helpful in working with us to get GPT-5 to be awesome in Warp. And one more shameless plug: we have a discount code for people who want to try GPT-5 in Warp. It's $5: GPT5.

Okay.

Thank you for having me guys.

Yeah, we'll talk to you soon. Thanks. Cheers. Tyler, any updates from the timeline while you're thinking about what the latest vibe check is in the war between OpenAI's...

Linear is a purpose-built tool for planning and building products. Meet the system for modern software development: streamline issues, projects, and product roadmaps. Go to linear.app to get started. Linear: the tool of choice for OpenAI.

You have something?

From Reggie James, friend of the show: "Half of my timeline says this is the closest we've been to AGI. The other half of my timeline says we officially just hit AI stagnation. I love tech."

Well, uh, we will be going deeper deciding whether or not this is