Crusoe acquires stealth AI inference startup Atapar to dominate GPU memory optimization

Aug 21, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Featuring Chase Lochmiller & Alon Yariv

24/7 concierge service. It's a vacation home, but better, folks. And our first guest of the show, Chase Lochmiller, is here. Welcome to the stream, Chase. How are you doing? You're calling in from a log cabin set. That's a beautiful background. Where are you? Great to be here. I'm in Jackson Hole right now.

I have an event that I'm attending out here. Amazing. So I found a corner of the log cabin to call in from and hang out with you guys. I love it. Give us the update. What's the news? Break it down for us. You know, we are excited and thrilled to share that Crusoe made an acquisition today.

Let's go. Let's go. Get the gong ready. What'd you buy? What'd you buy? Crusoe acquired a business called Atapar. Sorry, sorry to keep interrupting. I get a little excited. So Crusoe acquired a business called Atapar, and we're really thrilled to welcome the entire Atapar team to Crusoe.

You know, Atapar is a company that's mostly been operating in stealth, but they have been focused on optimizing the very low-level pieces of the high-performance computing infrastructure stack, particularly around memory optimization.

There's a lot of investment being made today in these very large clusters, these large-scale AI factories, that are bringing intelligence to the masses and embedding intelligence in the economy.

But a lot of that infrastructure is not used nearly as efficiently as it could be, and a lot of that has to do with getting the data into the right place at the right time so that your GPU utilization rate can go way, way up.

And our commitment in this acquisition is to build both very large-scale AI factories and the operating systems, the software systems, to operate those AI factories with an incredible degree of efficiency and scale.

Amazing. I want to go into trends around inference cost and what's going on in the broader market. There was this amazing Google paper today from Jeff Dean about saving water and energy. It seems like there's a lot of progress.

But first, let's hear from Alon about the actual journey. When did you start the company? What made sense about teaming up with Crusoe here? Okay. We started the company last July.

So about a year ago now, with the understanding that inference is going to be the next huge frontier in this space. Training was all the rage for the last few years; then people all of a sudden realized that they need to make money off of AI, and inference is the only place where you actually make any money from a model.

Yep. What we realized in addition, well, one of the members of our founding team, Beness, came out of OpenAI. He led infrastructure at OpenAI for about five years. This guy knew what it means to run inference at scale. He knew what the challenges are. He knew the challenges of scaling.

He knew the challenges of orchestrating different workloads on GPU clusters. And our realization was that it is not enough to run models at really high efficiency when you're running a single model on a single GPU.

The challenge comes when you actually have a real-world workload that breathes and changes size: it changes the amount of resources it consumes, changes what models are being served, and changes what types of prompts it is asked to handle.

And what we realized is that memory is the main bottleneck of this infrastructure for inference. First, you need to shift the model into a GPU. Models are huge; they're about a thousand times larger than standard containerized applications. Just think about how much time it takes to get one into a GPU. Sure.
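For a rough sense of that load time, here is a back-of-envelope sketch in Python; the model size and bandwidth figures are illustrative assumptions, not numbers from the interview:

```python
# Back-of-envelope: time to move model weights into GPU memory.
# All figures below are assumptions for illustration only.
model_size_gb = 140            # e.g., a ~70B-parameter model in FP16
pcie_gen4_gbps = 32            # approx. PCIe Gen4 x16 host-to-device bandwidth
nvme_gbps = 7                  # a single NVMe drive, sequential read

print(f"From host RAM over PCIe: {model_size_gb / pcie_gen4_gbps:.1f} s")
print(f"From one NVMe drive:     {model_size_gb / nvme_gbps:.1f} s")
```

Compare that against a typical container image of a few hundred megabytes, which is roughly where the "thousand times larger" comparison comes from.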

Secondly, user assets are also huge. Typically, when you have a user session in a standard consumer application, you have a few photos and a few text assets; those weigh a few kilobytes or megabytes. KV-cache volumes, which are the equivalent of user sessions in inference, weigh gigabytes.
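To see where gigabytes per session comes from, here is a sketch using the standard KV-cache size formula; the model shape and context length below are assumed for illustration:

```python
# KV-cache size per request:
#   2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes_per_value
# The model shape here is an assumption (roughly Llama-2-70B-like).
layers, kv_heads, head_dim = 80, 8, 128
seq_len = 32_000               # a long-context session
bytes_fp16 = 2

kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_fp16
print(f"KV cache for one session: {kv_bytes / 1e9:.1f} GB")   # ~10.5 GB
```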

So that's another increase of three orders of magnitude in how much memory you need to get into a process. That's what we have solved with our technology. We have the ATOM technology, the unified memory layer.

It allows you to utilize the GPU cluster to its fullest and to shift memory assets fluidly, at the highest possible speed, throughout the cluster. And that gives you enormous performance improvements in everything related to inference. What layer of abstraction are you operating at?

There's obviously a host of different large language models, some open source, some closed source. There's a variety of different chips, and you have stuff from Amazon and Google. Are you sitting above CUDA in the NVIDIA ecosystem? Are you agnostic to any particular piece of the stack?

Just help me understand when an AI company would pull this particular product off the shelf.

So over the last year we have built a few different ways of structuring this offering, and this also relates to your question. The way I like to look at it, I have two analogies in mind when I think about our tech. You can think about us as an in-memory, high-throughput cache. Mhm.

So it's kind of like a database, a Redis for your AI application, that gets the assets into the process as fast as possible. I like to think about it as something that sits orthogonal to, or in parallel with, the CUDA stack and vLLM.

We integrate on the side and let you shift assets in and out of the existing stack very rapidly. That actually allows us to integrate with everything very easily: we can integrate with vLLM, but we can integrate with any inference runtime.
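As a sketch of what that "cache on the side" pattern could look like, here is a minimal Python illustration; the class and method names are invented for this example and are not Atapar's or vLLM's actual API:

```python
from typing import Optional

class UnifiedMemoryLayer:
    """Hypothetical in-memory, high-throughput cache for inference assets."""

    def __init__(self) -> None:
        # Stand-in for a cluster-wide, tiered memory store.
        self._store: dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        self._store[key] = blob              # e.g., serialized KV blocks

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

def serve_request(session_id: str, cache: UnifiedMemoryLayer) -> bytes:
    # Before running prefill, check whether this session's KV cache
    # already lives somewhere in the cluster.
    kv = cache.get(session_id)
    if kv is None:
        kv = b"...recomputed KV blocks..."   # the expensive prefill path
        cache.put(session_id, kv)
    return kv                                # hand back to the runtime
```

The point of the pattern is that the inference runtime stays unchanged; the memory layer simply intercepts the expensive recompute path from the side.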

And while we currently only support NVIDIA hardware officially, we are in no way, shape, or form constrained to NVIDIA hardware. Yeah, that makes sense. So that's one way to look at it.

The other way, which I find more inspirational and which says something about this partnership with Crusoe, is that I like to think of it as a sort of native virtualization for this new world. It's virtualizing the memory, which is the main asset. So I think about it as the equivalent of VMware for the AI wave. Mhm.

That makes sense. Chase, when I think of Crusoe's business, I think of the big projects that get cinematically filmed in Bloomberg documentaries, and then I also think about Crusoe Cloud and the reports from Dylan Patel over at SemiAnalysis around ClusterMAX.

What does this allow you to say to customers that you couldn't before? Is this purely going to help with pricing in some sort of commodity offering? Is this a differentiator? Is this something that will increase reliability and uptime?

I think a lot of people are maybe underpricing what it actually takes. It's not just "I have a chip and I'm selling it at a cheap price"; if it's offline all the time, people aren't going to be happy.

So what's the shape of how you're sharing this with customers, and where do you hope this goes from a sales position? Yep.

Well, you know, we feel like a lot of the technology that Alon and his team have built at Atapar is incredibly complementary to our infrastructure software stack at Crusoe Cloud.

And a lot of this is really about accelerating and enhancing performance across everything, starting from: how do you actually get the most out of the GPUs that you're paying for?

And how do we do that through things like investing in the storage layer, so that you're able to use multiple different tiers of storage cohesively and have your data in the right place at the right time to utilize your GPUs more effectively? So from an efficiency standpoint it's a major investment, and from a reliability standpoint it's a major investment.

And then leveraging the scale that Crusoe has, and being able to operate that infrastructure at gigawatt scale, is a very important aspect of this investment we're making.

You know, when I think about benchmarks and metrics, I think one of the things that people are frankly looking at is dollars per token, right?

And if you double the utilization rate of your GPU, simply by optimizing how you manage your KV cache and how the model itself gets managed, then you can actually drive down the cost per token by half. So when we think about it, it's: what is the cost of intelligence running on our cloud, of the intelligent result that folks are actually coming to us for at the end of the day? And how do we drive massive efficiencies for all sorts of different high-performance workloads?
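The utilization-to-cost relationship is simple to make concrete; the rates in this sketch are illustrative assumptions, not Crusoe's pricing or throughput numbers:

```python
# Cost per token = (GPU $/hour) / (tokens served per hour).
# Both rates below are assumptions for illustration only.
gpu_cost_per_hour = 3.00               # $/GPU-hour
peak_tokens_per_hour = 4_000_000       # throughput at 100% utilization

for utilization in (0.35, 0.70):       # before vs. after doubling
    tokens = peak_tokens_per_hour * utilization
    cost_per_m = gpu_cost_per_hour / (tokens / 1e6)
    print(f"{utilization:.0%} utilization -> ${cost_per_m:.2f} per 1M tokens")
# Doubling utilization halves the dollars per token, as described above.
```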

Yeah. For the end customer it goes from cost per token to, in the end, profit per token, which would be the exact metric. Yeah. I want to hear about that. Yeah.

I mean, people get very focused on the dollar per GPU-hour, but if your GPU isn't stable, if it's not reliable, if you can't move data into it effectively, and you don't actually know how to utilize it effectively for your own hosted inference workload, you're not going to be able to create that much value for yourself.

So our goal here is to invest in the platform and to drive massive performance gains that will ultimately enable our customers to create more value for themselves by using Crusoe Cloud. There was a post we covered earlier on the show from an account called Bubble.

They said AI inference is the new high-frequency trading, and the companies that serve models and don't realize that will struggle. What's your reaction to that? Is that what it kind of feels like right now internally, in terms of speed, performance, and cost?

Yeah, you hear all these crazy stories about high-frequency traders, like how Jane Street maintains OCaml, a special language, just for high-frequency trading. They'll build their own backbones if they need to, just to move data more quickly.

It feels like there's potential that we're going in that direction. Yeah, you know, speaking as someone who spent 10 years working in the high-frequency trading sector, I don't know if you guys knew that. That's great.

You know, I do think that there are elements of this that rhyme with what's happening in high-frequency trading.

But I view it very much as a massively expanding pie, as opposed to the kind of fixed pie that the high-frequency traders are competing for.

And as performance gains increase, the utility of this infrastructure will go up, and demand will go up with it.

So it has kind of this reflexivity: if we, as an industry, can actually help drive better performance, we can grow the pie and make the impact of what we're doing way, way larger.

So I think it has a positive force in that regard that high-frequency trading doesn't. That being said, I do see the hosting of more widely used open-source models as something that will commoditize over the course of time.

And things like speed, cost, and a bunch of different tricks to optimize performance are definitely going to be instrumental in providing a great product to the marketplace. Yeah.

I mean, on that note of tricks: do you feel like we're on a Moore's-law-type curve for cost per million tokens? Not necessarily that it mirrors Moore's law in halving the price every 18 months, but that it's somewhat predictable. Whereas the progress on the usefulness of LLMs has felt extremely spiky: you get these crazy branches in the tech tree and research paths that all of a sudden produce a ton of value, and it goes from zero to 100.

But on the cost side, do you think we're going to be chipping away at this, making solid gains on a predictable cadence, or do you think there might be one weird trick that every AI lab will love? Look, I think the answer is both.

You know, we're definitely riding the NVIDIA roadmap curve toward better, more performant chips, and we're able to drive down our dollar per FLOP just by continuing to build on a lot of the innovations taking place at the silicon companies. But certainly there's also a lot of low-level software optimization, and as Alon frankly put it, I think there's this massive opportunity to have this VMware for memory, where that ultimately is the critical piece of the stack for producing valuable results from these chips.

Amazing. Well, congrats on the announcement. Thanks so much for hopping on. Enjoy the rest of your trip, and safe travels. We'll talk to you soon. Appreciate it, guys. Great chatting. Goodbye. Cheers. Let me tell you about Graphite.dev, the AI developer productivity platform.

Graphite helps teams on GitHub ship higher quality software faster. Get started for free. I'm very much enjoying the soundboard today. We missed it dearly on Monday. It's back in action. The thing about a new sound effect is that it's like discovering a new favorite song.

You play it like a hundred times, and then you've got to find a new favorite song because it doesn't hit quite the same as it once did. But for now, we're enjoying it. We have an alert noise. I'm pulling for that to come out during this next interview with Phil from Draq. He