OpenRouter raises $113M as AI inference becomes the biggest software market — and multi-model routing its backbone

May 26, 2026 · Full transcript · This transcript is auto-generated and may contain errors.

with you guys. What else? What else you got TV to get the word out about?

Congratulations for coming to the show.

Super exciting to you, too.

Hopefully we can talk soon. We'll see you.

Goodbye.

Alex Atallah has some news: "We're seeing a Cambrian explosion of AI models, and it's happening on OpenRouter. The future of AI is neurodiversity: agents choosing the most cost-effective model, provider, and tool for the task."

He's back.

And he's back. Welcome to the show. How are you doing?

Doing well. Thanks for having me back.

You've been very busy.

Been busy. Give us the news.

What happened?

So, we actually raised this round back in February, with CapitalG as the lead, our existing investors participating, and a lot of strategics from different corporate VCs participating too, like Nvidia, ServiceNow, Databricks, and a bunch more.

And we're growing the team. Not only do we believe in neurodiversity as a good thing that every company will need to leverage, but we also want neurodiversity within our team. So if you're smart, you get things done, and you think in a unique way, we want you.

How much did you raise?

We raised 113 million.

Now, let's talk about ROI. You're obviously pushing tons of tokens, and people need to measure the ROI on these tokens. How have you been thinking about the next KPI? There's this back and forth with Uber, I'm sure you've seen it, where they started token maxing. They're pushing tons of tokens through, and now they have to ask: what did we get done? How are you actually talking to people about cost optimization and ROI? What are you hearing about the value that people get? Where are the best use cases? And where are you telling people, hey, you might have gone too far on the token maxing?

Yeah, there's a really important trend that we're observing among all of our customers. You don't need Uber Black all the time, but sometimes you just don't know what else to do. There was a time when everybody was in the pre-market-fit phase of leveraging AI for their needs, and now many companies are past that phase.

And they're realizing that most of their opex is going to inference, and that's pretty insane. It means that if you make cost-cutting decisions and optimize your model usage, that flows directly to your margin, and the business is directly more efficient. This is not like optimizing your Datadog bill.

Sure.

So I think the future is going to be multi-model for many reasons, and that's a big one today and a big one this coming year. And there are a lot of companies who are also just doing one task that should be broken up into multiple tasks,

Yes.

served by specific lower-cost models, and they're getting massive cost savings out of it in addition to improving their recall and accuracy.

Yeah. So it's really important to do and and then down the road you know another use case will be getting better than state-of-the-art performance by using multiple models

and that's something we're also helping our customers with

Like a mixture of models? Ask the same question to multiple models and synthesize the results?

orchestrating multiple models that were trained by completely different companies

to do something for you, and then using a judge or some other set of heuristics to select the best result or combine them together.

Mhm. Can you talk about opportunities for entrepreneurs or small operations that are bringing compute to bear on OpenRouter? I saw George Ho talking about how he found some building that had a bunch of power. He was going to rack a bunch of Nvidia GPUs in there and sell the tokens on OpenRouter. I think everyone's familiar with what the hyperscalers are doing, and people are familiar with what the neoclouds are doing, but how diverse is the compute supply on OpenRouter these days? What does a small shop look like in the modern era?
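The "ask several models, then use a judge or heuristics to pick the best result" pattern described just before this question can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the model callables are local stubs rather than real API calls, and the names `ask_all`, `majority_vote`, and `judge_select` are hypothetical, not part of any OpenRouter API.

```python
from collections import Counter
from typing import Callable, Dict

# A "model" is any callable mapping a prompt to an answer string.
# In practice each would wrap a network call; these are hypothetical stubs.
Model = Callable[[str], str]
Judge = Callable[[str, str], float]  # (prompt, answer) -> score


def ask_all(models: Dict[str, Model], prompt: str) -> Dict[str, str]:
    """Send the same prompt to every model and collect answers by model name."""
    return {name: model(prompt) for name, model in models.items()}


def majority_vote(answers: Dict[str, str]) -> str:
    """Heuristic selector: the most common answer after light normalization.
    Ties go to the answer that was seen first."""
    counts = Counter(answer.strip().lower() for answer in answers.values())
    return counts.most_common(1)[0][0]


def judge_select(prompt: str, answers: Dict[str, str], judge: Judge) -> str:
    """Judge-based selector: return the name of the model whose answer scores
    highest. Ties go to the model that appears first in the dict."""
    return max(answers, key=lambda name: judge(prompt, answers[name]))


# Stub models standing in for systems trained by different companies.
models = {
    "model_a": lambda prompt: "Paris",
    "model_b": lambda prompt: "paris ",
    "model_c": lambda prompt: "Lyon",
}

# Toy judge: rewards answers that mention the expected city.
def toy_judge(prompt: str, answer: str) -> float:
    return 1.0 if "paris" in answer.lower() else 0.0


prompt = "What is the capital of France?"
answers = ask_all(models, prompt)
consensus = majority_vote(answers)
winner = judge_select(prompt, answers, toy_judge)
```

In a real system the judge would typically be another model or a task-specific check, and the calls would run concurrently. The sketch only shows the shape of the fan-out and selection steps.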

Good question. Yeah, a lot of people think that basically everybody's just using one model and that's completely untrue.

Yeah.

Um we have about 350 models that are being used by hundreds of active users per day.

Yeah.

And the token diversity is growing over time. If you go to our rankings page, we have a chart that just shows the graph getting more and more diverse.

We're doing about 12 trillion tokens per month now.

So there's diversity in model selection, but there's also a ton of diversity in providers for specific models. It used to be the case that being a provider was just really hard, because how do you get distribution?

Even if you decide to specialize in very, very low-cost but slow inference, stuff that's optimized for batch agentic workloads,

Sure.

how are people going to discover it? So we're building all these SKUs in the marketplace so that providers can find a market really quickly. And so we become the go-to-market strategy for the long tail of providers. We've built an enormous number of tools to do quality checks and rigorously test these providers on the SKUs that they aim to support, and then we send them back all this reporting so they can easily optimize their inference and get better and better over time.

Predictions for American open source?

I think American open source will become a mixture of taking existing models, some of which might be foreign, might be Chinese, and improving them, and then creating new models from a new foundation. It's hard to make a prediction on it. It remains to be seen, because we're just seeing so much adoption right now from foreign open-source models and domestic closed-source models. I think the best thing the American open-source models have to leverage is some enterprise demand that really wants only American open-source models. They're American companies, and they're aiming at it directly. So we provide easy ways of exploring the model providers on OpenRouter by their family type, so you can easily look at which models follow a specific family and filter down to only those. And we do want to help model labs that are aiming for some kind of niche use case or niche demand in the market find that market. It's really hard to discover them otherwise.

Mhm. How are you thinking about personal AI routing? Obviously with companies, you have this very economic calculation around opex, and there's an agentic workflow that's running for every customer, millions of times, so it's worth squeezing every cent out of the equation. But we are also seeing growth in OpenClaw, Hermes, these personal agents. There will be some closed-source providers there. But memory feels like an important piece of the personal agent puzzle. A lot of that can be offloaded to the context window, to MD files. Do you expect the shape or the cost constraints to be any different in personal AI or consumer AI versus enterprise?

Yeah, this is not going to be that big of a surprise, but local context is the number one difference that we've seen over the last year. If you build an agent that can really leverage the full computer, it works very well for personal AI use cases, and that's because people have a personal computer. What's interesting is that there isn't something that works on the phone very well, even though a lot of context is stuck on your phone. So we may see a cool personal agent develop in that direction. What's also interesting is that we don't have something very social yet. There isn't something that pulls your social media data and really leans into that part of a person's life.

Yeah.

So I'm curious if we'll see something there. And we've also seen shockingly few games, I think. There may be a really good opportunity for something that looks like a game but turns out to be much more.

How big can OpenRouter become?

We really believe that everybody in the future will want to use multiple models. It's the same way that, even if you could hire a 250 IQ chief of staff, if you had the option of hiring five instead for the same if not lower cost, you would go for five, because the five people are just going to be significantly more likely to flag issues that one person would have missed. And not only that, but it helps you cost optimize significantly. You don't need Uber Black all the time.

So inference, I think, will be the largest software market, potentially the largest market in the economy. All knowledge work will need to leverage it, otherwise you're just handicapping yourself dramatically, and OpenRouter aims to be a very large chunk of that. We do help the model labs find customers as well, and we work symbiotically with them, and the same with our providers. A lot of OpenRouter is about building things that enterprises need. When you bundle a bunch of models together, you immediately realize that there are all these boundaries between models where you have to secure, observe, and manage costs. And so OpenRouter is a really good way of managing and securing the boundaries between models and between server tools. We're building a bunch of new agentic tools that you'll see in the coming months that help do this, both for enterprises and for individual devs.

What are your scaling challenges? Are you CPU constrained?

We are memory constrained first, I think, on server memory.

Yeah.

And I don't think we're CPU constrained quite yet, but memory is the constraint.

But uptime is particularly important for you, I imagine, and there are probably other considerations around multi-cloud availability and interactions with different geolocations. You have a different set of optimization parameters from other companies.

Yeah. So that was the first problem we aimed to solve: given a large set of providers for a given model, how do we really perfect the router to send you to the model that will be up as quickly as possible, and send you to the provider that can best serve the parameters requested? We've been doing pretty well there, but we can do even better, and it will get better soon. We do a lot of internal benchmarking against the rest of the market and against going direct to providers, and so we can track progress and hill climb internally.
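The routing problem described above (many providers for one model, pick one that is up and can serve the requested parameters) can be sketched as a filter followed by a ranking. This is a minimal illustration under stated assumptions, not OpenRouter's actual router: the `Provider` fields, the `rank_providers` function, and the choice to rank by price and then latency are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Provider:
    """Hypothetical snapshot of one provider serving a given model."""
    name: str
    is_up: bool
    price_per_mtok: float            # assumed unit: USD per million tokens
    p50_latency_ms: float            # assumed median latency measurement
    supported_params: FrozenSet[str] = frozenset()


def rank_providers(providers: Iterable[Provider],
                   requested_params: Set[str]) -> List[Provider]:
    """Keep providers that are up and support every requested parameter,
    then order them cheapest first, breaking price ties by lower latency.
    The full ordered list doubles as a fallback chain."""
    eligible = [
        p for p in providers
        if p.is_up and requested_params <= p.supported_params
    ]
    return sorted(eligible, key=lambda p: (p.price_per_mtok, p.p50_latency_ms))


providers = [
    Provider("alpha", True, 0.60, 400.0, frozenset({"temperature", "tools"})),
    Provider("beta", True, 0.45, 900.0, frozenset({"temperature"})),
    Provider("gamma", False, 0.30, 300.0, frozenset({"temperature", "tools"})),
    Provider("delta", True, 0.60, 250.0, frozenset({"temperature", "tools"})),
]

# A request that needs tool calling excludes "beta" (no tools) and "gamma" (down).
tool_route = [p.name for p in rank_providers(providers, {"tools"})]

# A plain request can use any provider that is up.
plain_route = [p.name for p in rank_providers(providers, {"temperature"})]
```

A production router would also weigh live error rates, throughput, and capacity. Returning the whole ranked list rather than a single winner makes it easy to retry on the next provider if the first one fails.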

I think what becomes a real constraint for us is brand new models that don't have much capacity because only one provider is serving them. There isn't a ton. Sometimes we host the model ourselves or we work with a provider to do it. And sometimes we just tell other providers all the market signals that we're seeing and say, guys, you should host this model, like look there you know there's in this
