Taalas raises $169M to embed AI models directly into silicon for ultra-fast, low-cost inference
Feb 19, 2026 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Ljubisa Bajic
Let me tell you about Phantom Cash. Fund your wallet without exchanges or middlemen, and spend with the Phantom Card. Up next, we have Ljubisa Bajic from Taalas. Welcome to the show. How are you doing?
What's going on?
I'm doing well. How are you?
Did I get even close to the name?
First name, second name close.
Okay, give it to us.
Ljubisa Bajic. Okay.
Well, powerful.
First time on the show. Please introduce yourself and the company.
Uh, hey. So you introduced my name. I'm
kind of a longtime, I guess, chip designer more than anything else.
Yeah.
My latest endeavor is Taalas. What we do at Taalas is,
you know how, I'm sure, both of you have used ChatGPT, right? And
when you ask a question,
you see that the words are kind of rendering on the screen. It takes a while.
Yeah.
And if you use it a lot, it costs a fair amount of money, right? For people that run code through it, or something like that. So what we do is we basically make those responses show up in milliseconds. Yeah.
So you can ask whatever you want, and even if you get 100,000 words, it pops up in a split second, immediately.
The web UI is slower than, you know, our chips for the most part.
And it's also kind of close to zero cost. So the idea was
That's crazy.
Super fast, super cheap. You're like, "This HTML, this JavaScript just doesn't load as fast as the tokens." What a world to be in.
That's happening. That's happening today.
Yeah,
that's happening today. And I guess the way we do this is kind of the opposite of what your previous guest was saying. So,
you know, everybody says you should be flexible. We say you should be really inflexible.
Okay.
But you should be hardcore.
Okay.
Hardcore.
So, yeah, take us through the technical story here. Is this only possible for a particular model type, like transformers? How much are you calcifying, and what technical decisions did you have to lock in before you went to tape-out?
So we're not so much focused on a given model type. Whatever model is popular today, it's transformers; if it were something else, most likely we would have no problem consuming it. Really, the main step is that we take that model and we basically cast it straight into silicon.
Oh,
It's not a piece of software. It doesn't run on a processor. You know, at least as a kid, I never imagined AI as an executable like Excel
or Word, right? The general mindset was like HAL from 2001: A Space Odyssey, right? So we're kind of going back to that idea and basically making models straight into hardware. Once we make the chip, it is a model; that's basically all it is. If you want a new model, you have to chuck this one out.
Again, it's kind of a throwback to the past, like video game consoles.
Yeah.
You know, there's a receptacle: you unplug a cartridge, you plug in another cartridge, and you have a new model.
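The cartridge analogy can be sketched in code: the difference between a model as software and a model cast into silicon is roughly whether the weights arrive as runtime data or are frozen into the artifact itself. A minimal sketch, with made-up weights purely for illustration (nothing here reflects Taalas's actual design):

```python
# Sketch of the "model as software" vs. "model as hardware" distinction.
# The weights here are hypothetical; a real model has billions of them.

def runtime_model(weights, x):
    # Conventional inference: weights are data, loaded at runtime
    # (e.g. streamed from DRAM/HBM into a general-purpose processor).
    return sum(w * xi for w, xi in zip(weights, x))

# "Cast into silicon": the weights are constants fixed at build time,
# like a game cartridge. Swapping the model means swapping the chip.
FROZEN_WEIGHTS = (0.5, -1.25, 2.0)  # fixed at "tape-out"

def hardwired_model(x):
    return sum(w * xi for w, xi in zip(FROZEN_WEIGHTS, x))

# Both compute the same function; only where the weights live differs.
assert runtime_model(FROZEN_WEIGHTS, (1, 1, 1)) == hardwired_model((1, 1, 1))
```

The trade is the one described in the conversation: the hardwired version gives up flexibility (no new weights without new hardware) in exchange for dropping all the machinery needed to move weights around.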
What are the different tradeoffs? When you go to the fab, are you thinking wafer scale, large die? How do you think about all the tradeoffs and what is available at modern fabs?
So our approach is that we basically made this one trade, which is commitment, or flexibility, for kind of everything else. Mhm.
Given that, we tried super hard to make everything else dead simple, which means no interposers, no 3D stacking, no super-high-speed I/O, no HBM.
Uh the chips are pretty big, but they're really really simple.
Mhm.
They plug into a normal server. They don't consume a lot of power.
You can cool them by air. You don't really need anything but old-school data centers.
Yeah. So essentially, we made it such that there's this one, you know, sort of compromise that we make you take.
Yeah.
Having taken that, it's better at everything and it's way simpler than anything else.
Yeah.
What does the sales process look like for this? And what are your customer relationships actually going to look like in practice? I imagine you see massive progress on the model side; every day there's a new model. How are you going to scale up physical production to match that progress, or is that the wrong way to think about it?
You know, it's hard to be sure, so I'll answer all your questions briefly. On the first one: our default plan is that we're bringing out an inference service, and we're going to let users basically get to tokens directly. Obviously, we aspire to having large customers like the big AI labs or the hyperscalers. It's just that, you know, if your default plan involves closing, what have you, Microsoft, it's a bad default plan, right? It's not an easy step. So it's not that we're not doing it, but we're doing something that's going to bring us to customers and allow iteration,
and it's fully within our hands, right? It doesn't involve a big, uncontrollable step. To your second question: our approach starts winning at about three months of holding a model. Even if you swap models every three months, we still win.
At a year, we win by a lot. So at this level it's really hard to call, but I feel strongly that there are applications that can certainly tolerate having the same model for half a year at a time.
Yeah, exactly.
Yeah, someone was saying yesterday that GPT-3.5 still makes a shocking amount of
so much money. Yeah,
it's crazy. So the other day GPT-4o was retired. Yeah.
There were a bunch of sort of negative remarks on the internet. People were sad and angry that it's gone, and so on. So, I mean, we definitely want to be at the leading edge. We think that it's actually quite doable. But in the end, where the line is, where people are comfortable with the commitment we need, time is going to tell, right?
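The break-even claim can be illustrated with back-of-the-envelope arithmetic. All numbers below are hypothetical placeholders (the conversation gives no actual figures); the point is only the shape of the trade: a fixed-function chip front-loads its cost, so it wins once the same model is kept long enough.

```python
# Hypothetical cost model for the "wins at about three months" claim.
# Every figure below is invented for illustration only.
CHIP_COST = 20_000.0          # one-time cost of the fixed-function chip ($)
CHIP_OPEX_PER_MONTH = 200.0   # power/hosting for the simple, air-cooled chip ($)
GPU_COST_PER_MONTH = 8_000.0  # flexible GPU capacity at the same throughput ($)

def fixed_chip_total(months):
    # Big upfront cost, tiny ongoing cost.
    return CHIP_COST + CHIP_OPEX_PER_MONTH * months

def gpu_total(months):
    # No upfront cost, large ongoing cost.
    return GPU_COST_PER_MONTH * months

for months in (1, 3, 6, 12):
    cheaper = fixed_chip_total(months) < gpu_total(months)
    print(f"{months:2d} months: fixed chip cheaper? {cheaper}")
```

With these made-up numbers, the fixed chip loses at one month, pulls ahead by month three, and wins by a wide margin at a year, matching the qualitative claim in the conversation.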
Yeah. How much is AI accelerating the translation from the model weights, or the model architecture, into an actual chip design?
So in our first generation, not a lot at all. We use it, but lightly.
For our second generation, we're actually trying our best to maximize the amount of work that AI does versus what people do. So we're aggressively moving that boundary,
and the cutting-edge models have gotten good enough that they can really help a lot.
We started about two years ago, and that wasn't the case back then.
Mhm.
It's a new phenomenon.
Yeah, makes sense.
Kind of a wild card: how are you thinking about what we expect to be an explosion of new AI hardware devices? We're hearing about pins and in-ear things. Is this something that could end up in a consumer electronic device at some point, or is it better suited to data center or enterprise applications?
So I think it's suited to both. But we're initially targeting data centers, and we made these big chips for that reason, as opposed to something with a
smaller form factor that could fit in a device,
really mainly driven by the fact that this kind of avalanche of pins and such has been announced now. You know, it's been five years away for multiple sets of five years, potentially, right?
yeah let's not talk about VR
Yeah, yeah. And you can obviously make a hardware device that can deliver AI without having the chip locally.
Yeah.
Well, uh Jordan, anything else?
No, let's hit the gong.
Okay. Yes. Tell us about the news. How much did you raise?
Uh, actually, the special thing about today is that we've launched our first product to the public, right? We did raise some money that we didn't announce. Actually, we raised a bunch of money. We hit kind of nirvana, where we can live
off of the interest: you know, $220 million raised total.
But what we're really proud of is that we spend money super slowly, and we really are living off the interest right now.
That's amazing. Well, congratulations.
Where can people go? Uh,
do you have a
chatjimmy.ai, if you want to demo it. Yes. Right.
That's that's right.
chatjimmy.ai. I'll send you guys the link in the chat. I think it's Llama 3.1 8B.
That's right. Yeah.
Baked into a chip. Super fast.
Quick.
It's fast.
Yeah.
I love it.
It is instant. It almost seems faster than my eyes can process.