Interview

Taalas raises $169M to embed AI models directly into silicon for ultra-fast, low-cost inference

Feb 19, 2026 with Ljubisa Bajic

Key Points

  • Taalas raises $169 million, bringing total funding to $220 million, to bake AI model weights directly into silicon chips, trading model flexibility for near-instant inference and near-zero compute cost.
  • The startup launches a public inference service running Llama 3.1 8B on custom silicon, with tokens appearing faster than the web UI can render JavaScript.
  • Taalas argues the economics work if customers tolerate model staleness of three months or longer, citing sustained commercial demand for older models like GPT-3.5.

Summary

Taalas (the transcript renders the name "Talis," but the segment title uses "Taalas") has raised $220 million in total funding and launched its first public product: a demo at chatjimmy.ai running Llama 3.1 8B baked directly into a chip.

The core idea is model-in-silicon inference. Rather than running a model as software on a GPU or CPU, Taalas casts the model's weights directly into the chip's physical architecture. The chip is the model: if you want a different model, you swap the chip. Founder Ljubisa Bajic, a longtime chip designer, frames this as a deliberate trade: sacrifice flexibility, win on everything else.
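The flexibility trade can be seen through a loose software analogy (illustrative only; this is not Taalas's toolchain, and the weights below are invented): a conventional accelerator treats weights as data loaded at runtime, while a model-in-silicon chip fixes them at build time.

```python
# Conventional path: weights are runtime data, swappable without
# rebuilding anything.
def make_linear(weights, bias):
    def linear(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return linear

# Model-in-silicon analogy: weights are constants compiled into the
# artifact itself. Changing the model means producing a new artifact
# (for Taalas, taping out a new chip).
W = [[2.0, 0.0], [0.0, 3.0]]  # hypothetical fixed weights
B = [1.0, -1.0]               # hypothetical fixed bias

def baked_linear(x):
    # W and B cannot be swapped at runtime; they are part of the "chip".
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(W, B)]
```

The two functions compute the same thing; the difference is only where the weights live, which is exactly the axis Taalas is trading on.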

That tradeoff cashes out as near-instant output at near-zero inference cost. Bajic says tokens appear faster than the web UI can render the surrounding JavaScript, a claim the interviewers test live and confirm. The chip runs in a standard server, requires no exotic cooling or high-bandwidth memory stacking, and draws little enough power for air-cooled, legacy data centers. No interposers, no 3D stacking, no HBM.

The model-staleness question is the obvious commercial risk. Bajic's answer is that the economics hold as long as customers keep the same model for roughly three months or more, and he argues plenty of real deployments tolerate that. GPT-3.5 still drives significant commercial volume; GPT-4o's retirement generated user complaints, not celebration. The inference service Taalas is launching publicly is designed to bring in customers and prove the cycle time before chasing hyperscaler deals.
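The staleness threshold is ultimately a break-even calculation: a fixed-model chip pays off once its per-token cost advantage, accumulated over the model's useful life, covers the fixed cost of committing that model to silicon. A minimal sketch, with every number an invented assumption rather than a figure from the interview:

```python
def breakeven_months(fixed_cost_usd, tokens_per_month,
                     gpu_cost_per_mtok, chip_cost_per_mtok):
    """Months of service needed to recover the fixed cost of baking
    one model into silicon, given a per-million-token cost advantage.
    All inputs are hypothetical; this is not Taalas's pricing model."""
    saving_per_month = (tokens_per_month / 1e6) * (
        gpu_cost_per_mtok - chip_cost_per_mtok)
    return fixed_cost_usd / saving_per_month

# Invented example: $3M committed per model, 5 trillion tokens/month
# served, $0.22 vs $0.02 per million tokens.
months = breakeven_months(3e6, 5e12, 0.22, 0.02)
```

Under these made-up inputs the chip breaks even in about three months, which is why a customer base that tolerates three-plus months of model staleness is the load-bearing assumption.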

On manufacturing, Taalas is keeping chip design simple to compensate for the commitment baked into each tape-out. The first-generation chips used AI tooling lightly; the second generation is being built with a deliberate push to maximize AI's share of the design work, a shift Bajic says only became viable in roughly the last year.

The $220 million raised is disclosed as already in hand, with Bajic noting the company is currently living off interest rather than burning principal. That capital position, combined with a low-power, low-complexity chip architecture, gives Taalas runway to build a customer base through its inference service before any large enterprise contract closes.

The near-term product is data-center focused. Consumer hardware — pins, earbuds, ambient devices — is acknowledged as a possible long-run market but set aside, partly because AI hardware accessories have been "five years away for multiple sets of five years."