Sail Research raises to bring maximum inference efficiency to open-source models, starting with GLM-5.2

Jun 29, 2026 · Full transcript · This transcript is auto-generated and may contain errors.

up more. Have a great rest of your day.

Yeah, it's great meeting you guys. Thanks for having me. Great to meet you, Jack.

We'll talk to you soon.

Cheers.

Let me tell you about the New York Stock Exchange. Want to change the world? Raise capital at the New York Stock Exchange. Our next guest is Neil from Sail Research. He's the co-founder. Let's bring in Neil Moa. Mova,

how do I say your last name? I don't want to get it wrong.

Moving.

Hey guys, great to be here.

Thank you so much for taking the time.

Congratulations on the round. But first, please introduce yourself and the company.

Yeah. Hey guys, I'm Neil, co-founder and CEO of Sail Research. We are a company building the most efficient inference in the world. We love GPUs. We dig deep into the stack to find efficiency everywhere, and we make tokens super abundant.

All open source? Do you work with other labs? How deep do you go into the relative organizations?

Yeah. So today it's all open-source models. You can imagine GLM-5.2 is a big moment for us. We're very excited about that.

In terms of how deep we go in the stack, we basically do everything from the chips up. We don't make chips. We buy chips.

And we go all the way up from there to the API.

Tell us about GLM-5.2. What makes it different in a binary sense? Is it a particular benchmark? Is it a vibe? Is it an application? Have we unlocked a new capability in open-source AI?

Yeah, it seems like Z.ai really figured out post-training with this release. That was something that held back the previous releases from DeepSeek and Kimi, let's say, and they've just really done it. The style of the model is excellent for coding. It's the first one I would actually, with a straight face, recommend my colleagues try for coding.

For coding specifically, uh what about

before you would put on clown makeup and then you say, "Ah, yeah, give it give it a spin."

What about for other agentic workloads? I mean, we're looking at OpenRouter, a lot of the top models, DeepSeek V4 light. It seems like it's a lot of heavy token generation, lots of value being created, but smaller tasks. What is that like from your business perspective? Are you still focused on optimizing those types of workloads?

Yeah, for sure. You know, DeepSeek has always been the economics king. We want to bring that to every model. Of course, we could talk about that a bit more, but I think you're going to find that some of these more background tasks that are not coding per se will always go to the strongest intelligence per dollar, and I take a pretty broad view of what that intelligence could look like. And I think DeepSeek is still quite up there. DeepSeek Flash is quite high up there.

Yeah. Do you have any intuitive sense for the ratio of token spend on background tasks versus a human prompting an agent? Because we hear about token maxing, and it feels like a lot of it is a developer who went and fired off something, and it cooked for a day and spun up a bunch of tokens. But when I think of the really high-volume token future, I think maybe it's an agent, but maybe it's just that every single person who checks out on an e-commerce website goes through a fraud detection check that is now token powered and is not just a bunch of Python code. It's actually inferencing something. Or every time you book a flight, it runs some LLM check. I imagine that will be a huge driver of token consumption, and I'm wondering how you see those two buckets balancing out.

You know, 100%. To give you a top-line number, I'd estimate it's like 80% of stuff is human-in-the-loop today and 20% is background. But that number is going to shift, and I actually expect the crossover to happen this year, where background dominates. The reason is, as you pointed out, you want to use these agents in workflows, deterministic workflows, and we just weren't there yet with our agents from six months ago. We've crossed a few barriers in the last few months. So yes, I think we have the unlocks required for agents to run a lot longer, reliably, on every action that a human puts into a system.

Yeah. And then that's very good for your business, because if I have something that's running on a Sunday when none of my employees are in, but it's still firing up $1,000 of cost, I want to come to you and get it to be $500. What type of pitch do you have in terms of savings?

You know, I don't really want to save my customers money. Okay. I actually want them to spend a lot more money with me, because I've made the ROI so good that they're coming to me for way more tokens.

And one of the ways I like to say it, too, is that I like to work on unbounded problems. Before, when we built human-in-the-loop agents, those were very bounded problems: you have a limited amount of patience to read agent output every day.

But if an agent can run in the background for a long time, well, we've decoupled the two, and there's no limit. Trillions of tokens per task is within reach.

What were you and the team doing before this, and how long have you been at it?

Yeah, so I've been working on GPUs for about 10 years now. I love this stuff. It's my whole life.

I was at 10 years ago.

Is this a possible story where you're like, I was working on GPUs and you were just playing Counter-Strike or something?

No. Well, you know, I was at Nvidia, which was a gaming business, right? I remember being a little skeptical 10 years ago. Jensen's talking this big talk about moving to AI, but realistically, you guys, we do five billion in revenue from gaming. Surely that's going to be the biggest business for Nvidia for a long time.

I imagine, well, I could see that now. And then I was previously at Apple as well. Apple had a pretty competent ML program, or ML silicon program. I won't say anything about their ML software program.

Sure.

Um and uh and then most recently I was at

Very cool. Very cool.

kind of a perfect background for this business.

What is Lip-Bu Tan like in person? I'm such a fan. He's an angel investor. How'd you meet him? What's the story?

Yeah, I met him through our friends at Sequoia. They build great relationships like this one. Konstantine in particular knows Lip-Bu very well. Lip-Bu is great. I mean, I've never met someone with that combination of warmth and business acumen, but he also deeply understands the chips we're building on. He can just go from talking about foundry to talking about the nuances of how to scale an inference business in this very wild time. So I love working with Lip-Bu. He's exceptional.

Yeah, what a wild run from him in such a short amount of time. One of the greatest story arcs in technology.

And then who did the round?

Yeah, so Sequoia did the seed. Konstantine and Lauren Reeder.

And then for the Series A, we went with Kleiner Perkins for the lead. That's Adith.

Yeah.

Amazing.

Fantastic. Well, congratulations.

Fantastic progress soon. And thank you for everything you're doing. We appreciate

great to meet you.

Have a good rest of your day.

Cheers.

Let me tell you about Railway. Railway
