Canopy Labs is training LLMs on movement tokens to build virtual humans indistinguishable from real ones
Key Points
- Canopy Labs trains LLMs on tokenized movement and speech to generate virtual humans capable of realistic micro-gestures and incidental behaviors without explicit programming.
- The startup targets B2B applications in language learning, AI therapy, and tutoring where realistic avatars could improve engagement over voice-only interfaces.
- If Canopy succeeds in making virtual humans indistinguishable from real ones, verifying human identity in professional settings becomes a critical and still-unresolved infrastructure problem.
Summary
Elias, founder of Canopy Labs, is building virtual humans designed to be indistinguishable from real ones on a live video call. The benchmark is a Zoom conversation where you cannot tell whether you're talking to a person or a model.
The technical approach centers on extending the LLM architecture into new modalities rather than using diffusion models. Canopy tokenizes 3D representations of human movement and speech, feeds them through a language model, and has the model output those same token types. The result is a virtual human that learns incidental behaviors — rubbing its nose, picking up a glass of water — without being explicitly programmed to do them. Realistic facial geometry is treated as a solved problem, with Unreal Engine's Metahumans as a usable substrate. The remaining hard problem is movement: lips, hair, hands, and the micro-gestures that push a rendering across the uncanny valley.
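To make the tokenization idea concrete, here is a minimal Python sketch of how movement and speech could share one token vocabulary with text. It is an illustration under stated assumptions, not Canopy's method: the segment does not describe the actual tokenizer, so uniform binning of joint angles stands in for whatever learned codec is really used, and the vocabulary sizes, offsets, and function names are all hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical vocabulary layout (not from the source): text ids first,
# then speech codec ids, then quantized movement ids.
TEXT_VOCAB = 32_000
SPEECH_CODES = 1_024
MOTION_BINS = 256
SPEECH_OFFSET = TEXT_VOCAB
MOTION_OFFSET = SPEECH_OFFSET + SPEECH_CODES


def quantize_angle(angle_rad: float, bins: int = MOTION_BINS) -> int:
    """Map a joint angle in [-pi, pi] to an integer bin in [0, bins - 1]."""
    clamped = min(max(angle_rad, -math.pi), math.pi)
    frac = (clamped + math.pi) / (2 * math.pi)
    return min(int(frac * bins), bins - 1)


def motion_tokens(joint_angles: Sequence[float]) -> List[int]:
    """Turn one frame of joint angles into movement token ids."""
    return [MOTION_OFFSET + quantize_angle(a) for a in joint_angles]


def speech_tokens(codes: Sequence[int]) -> List[int]:
    """Shift raw speech codec ids into the shared vocabulary."""
    for c in codes:
        if not 0 <= c < SPEECH_CODES:
            raise ValueError(f"speech code out of range: {c}")
    return [SPEECH_OFFSET + c for c in codes]


def interleave_frames(
    speech_frames: Sequence[Sequence[int]],
    motion_frames: Sequence[Sequence[float]],
) -> List[int]:
    """Build one flat sequence: speech tokens, then movement tokens, per frame."""
    if len(speech_frames) != len(motion_frames):
        raise ValueError("speech and motion must have the same number of frames")
    sequence: List[int] = []
    for speech, motion in zip(speech_frames, motion_frames):
        sequence.extend(speech_tokens(speech))
        sequence.extend(motion_tokens(motion))
    return sequence


if __name__ == "__main__":
    # Two frames, each with one speech code and two joint angles.
    print(interleave_frames([[5], [9]], [[0.0, 1.0], [-0.5, 0.25]]))
```

Because every modality ends up as integer ids in one flat sequence, a standard next-token model could in principle be trained on it and emit movement ids the same way it emits text, which is the property the approach relies on.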
Canopy has already open-sourced one model — a voice system that takes text as input and outputs speech tokens, with a second end-to-end variant that processes both text and speech tokens. Movement tokenization is the next extension.
“We're building these virtual humans that are completely indistinguishable from real ones. Rather than putting in text tokens, we're taking in movement tokens and speech tokens, feeding that through the LLM, and it outputs those speech and movement tokens as well. So a virtual human will be able to pick up water, brush its hair, maybe rub its face when it's thinking like us humans do — we're not explicitly telling it to do that, it learns that automatically.”
Go-to-market
The initial commercial target is B2B: partnering with LLM-native applications that want a human presence layer. Elias names language learning, AI therapy, and AI tutoring as the clearest fits. These are categories where a face and natural movement plausibly improve engagement and retention relative to voice-only interfaces such as ChatGPT's voice mode.
The identity problem
The harder downstream question is verification. If virtual humans become convincing enough to impersonate real people in professional settings, the infrastructure needed to distinguish human from AI presence (tiered credentialing systems, or something like Worldcoin's proof-of-personhood model) becomes a necessity created by Canopy's own success. The segment does not resolve that tension; it is raised as an open question.