News

Thinking Machines Lab launches interaction models with native full-duplex audio and video

May 12, 2026

Key Points

  • Thinking Machines Lab launches interaction models trained natively for full-duplex audio and video, letting systems respond while users are still speaking rather than relying on turn-based language models adapted for the task.
  • The architecture targets real-time translation use cases, where the latency and quality limits of current cloud APIs create friction, as in mask-based translation devices that let non-fluent speakers teach in a foreign language.
  • Keeping latency low enough for natural conversation remains the primary engineering bottleneck, and local inference at sufficient quality is still technically difficult to achieve at scale.

Summary

Thinking Machines Lab launches interaction models trained natively for real-time full-duplex audio and video — a departure from adapting turn-based language models to handle simultaneous input and output.

Full-duplex capability means the system can stream audio and video input in real time and respond to the user while they are still speaking, eliminating the turn-taking constraint that defines most current voice interfaces. Mira Murati, CEO of Thinking Machines, positions this as a fundamental architectural shift rather than an incremental feature addition.

The technical implication is substantial. Most voice AI systems today rely on turn-based interaction: the user speaks, the system listens, processes, and responds. Full-duplex systems must handle overlapping speech, video context, and real-time generation simultaneously — a harder problem than retrofitting existing models to handle interruptions or parallel streams.
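To make that contrast concrete, here is a minimal Python sketch of the two interaction patterns. The model class and method names are hypothetical stand-ins, not Thinking Machines' actual API; the point is only that full-duplex ingestion and generation run concurrently rather than in alternation.

    import asyncio

    # Toy stand-in for a natively full-duplex model. Class and method
    # names are hypothetical, purely to illustrate the pattern.
    class ToyDuplexModel:
        def __init__(self):
            self.inbox = asyncio.Queue()

        async def feed(self, chunk):
            await self.inbox.put(chunk)   # input streams in continuously

        async def stream_output(self, n):
            # React to each incoming chunk immediately, without
            # waiting for the user's turn to end.
            for _ in range(n):
                chunk = await self.inbox.get()
                yield f"ack({chunk})"

    async def mic_chunks(n=5):
        # Simulate a microphone yielding audio chunks in real time.
        for i in range(n):
            await asyncio.sleep(0.1)
            yield f"audio-{i}"

    async def turn_based():
        # Turn-based: collect the whole utterance, then reply once.
        utterance = [c async for c in mic_chunks()]
        print("assistant:", f"reply({','.join(utterance)})")

    async def full_duplex():
        # Full-duplex: ingestion and generation run at the same time.
        model = ToyDuplexModel()

        async def ingest():
            async for c in mic_chunks():
                await model.feed(c)

        async def emit():
            async for out in model.stream_output(5):
                print("assistant (mid-utterance):", out)

        await asyncio.gather(ingest(), emit())

    asyncio.run(turn_based())
    asyncio.run(full_duplex())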

The use cases emerging in the segment center on real-time translation and multimodal interaction. One referenced example involves a mask-based translation device in China that lets a non-fluent English speaker teach her children in English: the device translates her Mandarin speech in real time through its speaker, and the children's English responses are translated back to her through headphones. The implication is that native full-duplex models could enable such scenarios without the latency and quality compromises that plague current cloud-based translation APIs.
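A minimal sketch of the pipeline such a device implies today, assuming the conventional cascaded chain of speech recognition, machine translation, and speech synthesis; the stage functions below are placeholders, not any vendor's API.

    # Placeholder stages for the conventional cascaded approach
    # (speech recognition -> machine translation -> speech synthesis).
    def asr(audio, lang):            # speech -> text
        return f"text/{lang}({audio})"

    def translate(text, src, dst):   # text -> translated text
        return f"{dst}<-{src}:{text}"

    def tts(text, lang):             # text -> speech
        return f"speech/{lang}({text})"

    def outbound(mandarin_audio):
        # Parent speaks Mandarin; the device's speaker plays English.
        return tts(translate(asr(mandarin_audio, "zh"), "zh", "en"), "en")

    def inbound(english_audio):
        # Children answer in English; headphones play Mandarin.
        return tts(translate(asr(english_audio, "en"), "en", "zh"), "zh")

    print(outbound("nihao"))
    print(inbound("hello"))

Each direction is three sequential model calls, often three network hops, which is where the latency and quality compromises accumulate; a natively full-duplex model would presumably collapse each direction into a single streaming pass.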

One tension flagged: current best-in-class real-time translation models carry API costs and introduce measurable delay, and running inference locally at sufficient quality remains technically difficult. Getting latency low enough that the exchange feels like natural conversation has proven to be the primary engineering bottleneck.
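For a sense of scale, turn gaps in human conversation are commonly cited at roughly 200 ms. The stage numbers below are rough illustrative assumptions, not measurements of any product, but they show how quickly a cascaded cloud pipeline overshoots that budget.

    # Illustrative latency budget; all numbers are rough assumptions.
    budget_ms = 200                       # typical human turn-taking gap

    cascaded_ms = {
        "audio buffering":    100,
        "network round trip":  80,
        "speech recognition": 150,
        "translation":        100,
        "TTS first audio":    150,
    }
    total = sum(cascaded_ms.values())     # 580 ms
    print(f"cascaded cloud pipeline ~{total} ms vs ~{budget_ms} ms budget")
    # Overshooting the budget roughly 3x is why local, single-model
    # streaming inference is the hard part the segment flags.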
