OpenRouter co-founder Alex Atallah explains how AI model routing works and which models power users actually choose
Jun 25, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Alex Atallah
And next up, we have Alex Atallah from OpenRouter on the show, talking about foundation models and who's using what. How you doing, Alex? Great to meet you in person. You might not know this. We went through the same YC batch, YC Winter '16. Which company? Lucy. Lucy Nicotine. Yeah.
Similar to what you were building back then. It makes sense. But I remember seeing your pitch, and I remember everyone kind of thinking, like, that'll never be a thing, like most YC things. Is that the same company or a different company? That was the previous company. Okay, cool.
That was OpenSea, which became the dominant NFT marketplace and absolutely took over the world. How did I miss that? Yeah. And so you can tell how much research we do on guest appearances. It's great to have you on, Alex. It's great to be here. Thanks for hosting me. Yeah.
Why don't you kick us off with an introduction to OpenRouter, but specifically the business model, because it's a fascinating place to sit in the AI infrastructure stack, and I can imagine a bunch of ways value will accrue. But how are you either currently making money or planning to make money?
Yeah. So you can kind of think of us as a control plane for language models, like a Stripe meets Cloudflare for language models.
So like both of those companies, we orchestrate and route traffic to the best GPU host for you based on the type of model that you want, your price preferences, your performance needs, where you are in the world.
Um, and so it's kind of like a one-stop shop for all the models where you get like the most models, the best prices, best performance, highest uptime. A lot of these models are down a lot of the time, so load balancing is just needed.
Intelligence goes through, like, intelligence brownouts, as Karpathy's been saying, and so you need some way of keeping the utility lit up. A lot of apps will just rely on 100%-available intelligence. Sure. So users pay us directly.
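The load balancing described here can be sketched as a simple failover loop: try each host serving a model in preference order and fall through on errors. This is an illustrative sketch, not OpenRouter's implementation; the provider records and call functions below are made up for the example.

```python
def always_down(prompt):
    """Stand-in for a provider in a brownout: every call fails."""
    raise RuntimeError("brownout")

def route_completion(prompt, providers):
    """Return the first successful response, trying providers in order.

    Providers are assumed pre-sorted by preference (price, latency, uptime).
    Raises if every provider fails.
    """
    errors = []
    for provider in providers:
        try:
            return provider["call"](prompt)
        except RuntimeError as exc:  # stand-in for timeouts / 5xx responses
            errors.append((provider["name"], str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

# Example: the first host is down, the second answers.
providers = [
    {"name": "host-a", "call": always_down},
    {"name": "host-b", "call": lambda p: f"host-b answered: {p}"},
]
print(route_completion("hello", providers))  # host-b answered: hello
```

The point is only that an app pointed at a single host inherits that host's downtime, while a loop like this keeps the "utility" lit as long as any host is up.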
We sometimes do volume deals with different providers so that we can get good volume economics on the supply side, and we make money mostly there. Talk to me about what Microsoft is doing in the category.
Satya Nadella at Build was talking about something around model routing, and they also have this unique deal with OpenAI where it seems like they will be able to hold on, potentially, to that as an API on the cloud infrastructure side, and we might not see OpenAI's GPT-4 or GPT-4.5 vended through, say, AWS or GCP.
How is that relationship? Do you sit on top of all of that? Are you kind of indifferent to it, or how is it playing out there? Yeah. So, unlike most other resources and software, AI is pretty unique. It's a battleground, with lines being drawn around the models.
And the models are just not available in all clouds. Like, Gemini is only available in Google Cloud, obviously. And then you have Anthropic, which is available in AWS and Google Cloud but not Azure.
And then you have Grok, which is available in Azure and directly from xAI but not Google Cloud, and nobody can keep these things on the top of their head, and no one can track them. We track them, and we sort of allow you to orchestrate across all of the clouds. Sure.
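The cloud lineup just described can be thought of as an availability table you can query from either direction. A minimal sketch, using only the pairings mentioned in the conversation; the real lineup shifts constantly, so treat this as a snapshot, not ground truth.

```python
# Which clouds host which model families, per the conversation above.
# Illustrative snapshot only; availability changes over time.
AVAILABILITY = {
    "gemini":    {"google-cloud"},
    "anthropic": {"aws", "google-cloud"},
    "grok":      {"azure", "x-ai"},
}

def clouds_for(model_family):
    """Clouds a model family is served from (empty set if unknown)."""
    return AVAILABILITY.get(model_family, set())

def models_on(cloud):
    """Model families a given cloud can serve."""
    return {m for m, clouds in AVAILABILITY.items() if cloud in clouds}

print(sorted(models_on("google-cloud")))  # ['anthropic', 'gemini']
print("azure" in clouds_for("anthropic"))  # False
```

The two lookups illustrate the point being made: no single cloud's `models_on` set covers everything, which is the gap an aggregator fills.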
So you can get all the models in one place. I do see this continuing, where basically the war draws these lines around the models, and no one cloud will have them all. Yeah. What companies should I understand in order to understand this business?
I feel like if I was throwing something out, I would say Snowflake, but I don't know if that's at all relevant. So feel free to steer me in a different direction.
But I'm interested in this idea of a company that sits across the hyperscalers and does not necessarily just do load balancing. You mentioned Cloudflare, but it feels like Cloudflare has a lot of their own infrastructure, their own data centers.
I'm interested in a company that has built a large, durable business on top of GCP, Azure, AWS, etc. Yeah, good question. I don't know of an exact analogy that fits; we're a bit of a unique player. But Cloudflare, I mean, started as only a layer on top of all the clouds. Okay.
It was like a plug-and-play safety net to prevent, like, denial-of-service attacks and to turn on security for any web app. And OpenRouter started in a different way, as a way of just amplifying supply and choice for people.
But in a similar fashion, we add, like, more complexity to the compute layer that we provide before it hits the clouds themselves.
I want to know about the different models and capabilities, what the different labs are working on, and I'm wondering what the important design criteria are, or what you need to think through if you are doing this type of load balancing. I'm imagining, if I'm serving an app that's built primarily on, let's call it, Claude, and then all of a sudden it fails, or there's a brownout, and it rolls over to Grok, all of a sudden I'm now in, like, the anti-woke world. The flavor of the model has changed.
I could probably do some prompt engineering to kind of get them onto a similar standard. Or, if I'm using a million-token context window with Gemini and then I'm falling over to something with a 100,000-token context window, I might need to do some chunking then.
And so, on the engineering side, what are you seeing people doing, either with prompt engineering or just throwing a for loop around something, to actually make the different products identical from an end-user perspective? So, we very rarely see people fall back to a completely different model. Okay.
Usually the load balancing that we do is for a specific model. Got it. We kind of orchestrate between, like, two and sometimes 20 different providers, different vendors of that same model. That makes sense, right?
But the second half of the question that you asked is much more interesting to us, because even for one model, there are providers that offer all different kinds of context lengths, max output constraints, and features. Some support tool calling, some don't.
Some support structured outputs with a JSON schema allowed. Some only allow you to output a JSON object, but no schema is allowed. It's like a really crazy Wild West that we're just trying to tame in one spot.
So we have, I would say, the fundamentals live, with a lot more to come, which basically just means that if you send a really big prompt, we only send you to the providers that support big prompts.
If you want a ton of output, we only send you to a provider that can actually fit that much output in your request. And if you want both of those things, and JSON, and some crazy features that nobody supports, yeah, we give you a heads-up that it's not going to go anywhere. Got it.
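The capability-aware routing described here can be sketched as a filter over provider metadata: keep only the hosts whose context window, output budget, and feature flags satisfy the request, and an empty result is the "heads-up that it's not going to go anywhere." The provider records and field names are hypothetical, not OpenRouter's actual schema.

```python
# Hypothetical per-provider capability records for one model.
PROVIDERS = [
    {"name": "p1", "max_context": 1_000_000, "max_output": 65_536,
     "tools": True,  "json_schema": True},
    {"name": "p2", "max_context": 128_000,   "max_output": 4_096,
     "tools": True,  "json_schema": False},
    {"name": "p3", "max_context": 32_000,    "max_output": 8_192,
     "tools": False, "json_schema": False},
]

def eligible(providers, prompt_tokens, output_tokens,
             needs_tools=False, needs_json_schema=False):
    """Providers that fit the prompt, the output budget, and the features."""
    return [
        p for p in providers
        if p["max_context"] >= prompt_tokens + output_tokens
        and p["max_output"] >= output_tokens
        and (p["tools"] or not needs_tools)
        and (p["json_schema"] or not needs_json_schema)
    ]

# A big prompt with schema-constrained output leaves only one candidate.
names = [p["name"] for p in eligible(PROVIDERS, 200_000, 8_000,
                                     needs_json_schema=True)]
print(names)  # ['p1']
```

A small request (say, 10,000 prompt tokens, 1,000 output tokens, no special features) would pass all three providers, at which point price and latency preferences can break the tie.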
So, more to come to help with that, but that is like our bread and butter. That's what we're good for. Can you give me kind of your state of the union on how the different labs are positioned?
You know, some people think of Anthropic as the safety-focused one, and it seems like Zuck is kind of in a rebuilding phase with the Llama project. Grok's been showing a lot of promise. Maybe not on the cutting edge, but a very interesting stalking horse for the other labs.
OpenAI seems to be running away with it on the consumer side, but B2B is a little bit more of a bidding war. So, how do you see the market right now? What are you recommending to different entrepreneurs that are building services on these models?
You guys have probably one of the most interesting data sets in AI, and you publish a lot of it yourself. Some companies, you know, opt into sharing their data.
So I'm interested to hear about, you know, disconnects between the attention that certain companies are getting, and maybe valuations, and then the realities of actual usage, right?
So, like, one of the most popular companies on OpenRouter right now is Cline, which is a coding agent, and I'd never actually heard of them. We haven't had them on the show yet. And so that was kind of a surprise. But anyways, kind of interested to get your take on all that at a high level.
Yeah. So Cline, for those who don't know, is similar to Cursor, but allows you to bring all OpenRouter models to it, and they invest a lot in making all models work really well.
Our approach to helping people choose models is kind of a do-it-yourself research approach for the most part. We have a router that you can use that will pick the model we think your prompt is best suited for.
But right now, what we see most people doing is going to our rankings page or going to the homepage. You mentioned that Cline is one of the top apps. Those are all apps that have opted into being shown publicly. You can just click on one of those apps, like Cline.
Click on it right now and you can see Sonnet 3.7 is the top model this month, followed by Sonnet 4, followed by Gemini 2.5 Flash. And so you can just see what users are doing, and in practice what this means is you're seeing what power users are doing, because power users consume exponentially more tokens than everybody else.
So the principle here is to let people learn from power users, because power users are just investing the most time and money into this. This is why I think tokens are a good way of measuring now, because they're both a measure of time and money that users are investing into models.
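The difference between counting requests and counting tokens is easy to see in a toy aggregation. The request log below is fabricated for illustration; the point is that summing tokens lets a single heavy power-user workload outweigh many small requests, which is exactly the property the rankings rely on.

```python
from collections import Counter

# Fabricated request log: one power user's long agent run plus light usage.
requests = [
    {"model": "sonnet-3.7",       "tokens": 900_000},  # power-user agent run
    {"model": "gemini-2.5-flash", "tokens": 5_000},
    {"model": "gemini-2.5-flash", "tokens": 4_000},
    {"model": "sonnet-4",         "tokens": 120_000},
]

# Token-weighted usage: sum tokens per model rather than counting requests.
usage = Counter()
for r in requests:
    usage[r["model"]] += r["tokens"]

ranking = [model for model, _ in usage.most_common()]
print(ranking)  # ['sonnet-3.7', 'sonnet-4', 'gemini-2.5-flash']
```

Ranked by request count, `gemini-2.5-flash` would lead with two requests; ranked by tokens, the models the power user leaned on rise to the top.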
So, to answer your first question, we generally don't directly recommend, but we'll work with some customers, understand their ideas, and recommend models that way.
And it's usually very specific, based on the kind of workload, you know, the amount they want to spend, their tolerance for performance and speed. But what a lot of people do is they just check our rankings page. They look for apps that are similar.
And then they pick the ones that are trending or popular. Makes a ton of sense. I would love to have you back on when there's another big model release. I'm sure you are an endless source. I mean, I imagine even hedge funds would reach out to you, like, hey, is this model getting adopted quickly?
I'm sure a lot of the data is open source, so you can grab it. But this has been fantastic. Thanks so much for stopping by. We'll have to have you back to chat more about different models. Have a good one. We'll chat with you later. Really quickly, let me tell you about adquick.com.
Out-of-home advertising made easy and measurable. Say goodbye to the headaches of out-of-home advertising. Only AdQuick combines technology, out-of-home expertise, and data to enable efficient, seamless ad buying across the globe. And breaking news, TVPN will have a few