Interview

Taalas raises $169M to embed AI models directly into silicon for ultra-fast, low-cost inference

Feb 19, 2026 with Ljubisa Bajic

Key Points

  • Taalas raises $169 million, bringing total funding to $220 million, to bake AI model weights directly into silicon chips, trading model flexibility for near-instant inference and near-zero compute cost.
  • The startup launches a public inference service running Llama 3.1 8B on custom silicon, with tokens appearing faster than the web UI can render JavaScript.
  • Taalas argues the economics work if customers tolerate model staleness of three months or longer, citing sustained commercial demand for older models like GPT-3.5.

Summary

Taalas (the transcript renders the name "Talis," but the segment title uses "Taalas") has raised $220 million in total funding and launched its first public product: a demo at chatjimmy.ai running Llama 3.1 8B baked directly into a chip.

The core idea is model-in-silicon inference. Rather than running a model as software on a GPU or CPU, Taalas casts the model's weights directly into the chip's physical architecture. The chip is the model: if you want a different model, you swap the chip. Founder Ljubisa Bajic, a longtime chip designer, frames this as a deliberate trade: sacrifice flexibility, win on everything else.
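The flexibility trade can be seen through a loose software analogy (illustrative only; this is not Taalas's toolchain, and the weights below are invented): a conventional accelerator treats weights as data loaded at runtime, while a model-in-silicon chip fixes them at build time.

```python
# Conventional path: weights are runtime data, swappable without
# rebuilding anything.
def make_linear(weights, bias):
    def linear(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return linear

# Model-in-silicon analogy: weights are constants compiled into the
# artifact itself. Changing the model means producing a new artifact
# (for Taalas, taping out a new chip).
W = [[2.0, 0.0], [0.0, 3.0]]  # hypothetical fixed weights
B = [1.0, -1.0]               # hypothetical fixed bias

def baked_linear(x):
    # W and B cannot be swapped at runtime; they are part of the "chip".
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(W, B)]
```

The two functions compute the same thing; the difference is only where the weights live, which is exactly the axis Taalas is trading on.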

That tradeoff cashes out as near-instant output at near-zero inference cost. Bajic says tokens appear faster than the web UI can render the surrounding JavaScript, a claim the interviewers test live and confirm. The chip runs in a standard server, requires no exotic cooling or high-bandwidth memory stacking, and draws little enough power for air-cooled, legacy data centers. No interposers, no 3D stacking, no HBM.

The model-staleness question is the obvious commercial risk. Bajic's answer is that the economics hold as long as customers keep the same model for roughly three months or more, and he argues plenty of real deployments tolerate that. GPT-3.5 still drives significant commercial volume; GPT-4o's retirement generated user complaints, not celebration. The inference service Taalas is launching publicly is designed to bring in customers and prove the cycle time before chasing hyperscaler deals.
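The staleness threshold is ultimately a break-even calculation: a fixed-model chip pays off once its per-token cost advantage, accumulated over the model's useful life, covers the fixed cost of committing that model to silicon. A minimal sketch, with every number an invented assumption rather than a figure from the interview:

```python
def breakeven_months(fixed_cost_usd, tokens_per_month,
                     gpu_cost_per_mtok, chip_cost_per_mtok):
    """Months of service needed to recover the fixed cost of baking
    one model into silicon, given a per-million-token cost advantage.
    All inputs are hypothetical; this is not Taalas's pricing model."""
    saving_per_month = (tokens_per_month / 1e6) * (
        gpu_cost_per_mtok - chip_cost_per_mtok)
    return fixed_cost_usd / saving_per_month

# Invented example: $3M committed per model, 5 trillion tokens/month
# served, $0.22 vs $0.02 per million tokens.
months = breakeven_months(3e6, 5e12, 0.22, 0.02)
```

Under these made-up inputs the chip breaks even in about three months, which is why a customer base that tolerates three-plus months of model staleness is the load-bearing assumption.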

On manufacturing, Taalas is keeping chip design simple to compensate for the commitment baked into each tape-out. The first-generation chips used AI tooling lightly; the second generation is being built with a deliberate push to maximize AI's share of the design work, a shift Bajic says only became viable in roughly the last year.

The $220 million raised is disclosed as already in hand, with Bajic noting the company is currently living off interest rather than burning principal. That capital position, combined with a low-power, low-complexity chip architecture, gives Taalas runway to build a customer base through its inference service before any large enterprise contract closes.

The near-term product is data-center focused. Consumer hardware — pins, earbuds, ambient devices — is acknowledged as a possible long-run market but set aside, partly because AI hardware accessories have been "five years away for multiple sets of five years."