Chroma co-founder Jeff Huber: long-context windows won't kill RAG — they'll finally prove why retrieval matters
Apr 8, 2025 with Jeff Huber
Key Points
- Chroma co-founder Jeff Huber argues that larger context windows won't kill retrieval-augmented generation: Llama 4's training data contains no passages longer than roughly 250,000 tokens, so behavior beyond that length rests on synthetic data, a gap that matters for production reliability.
- More than 90% of enterprise AI applications already use retrieval because fine-tuning is non-deterministic and context stuffing is expensive, not because it's fashionable.
- Huber urges Meta to stop chasing benchmark leaderboards and instead invest in developer experience, such as reliable structured output and tool integration, which matter more to B2B customers than headline scores.
Summary
Jeff Huber, co-founder of Chroma, argues that larger context windows will not make retrieval obsolete. He expects Llama 4's 10-million-token context window to settle the debate in retrieval's favor instead.
Huber frames AI as a new form of computing with its own memory hierarchy. Transformer attention heads, the context window, and retrieval systems each occupy a different tier, with distinct trade-offs across speed, capacity, and cost. A 10-million-token window adds a very large, very expensive tier but does not collapse the hierarchy. Llama 4's training data does not include passages longer than roughly 250,000 tokens, meaning anything beyond that relies on synthetic data. That gap matters for production systems.
Needle-in-a-haystack benchmarks do not reflect how retrieval actually functions in enterprise deployments. More than 90% of enterprise AI applications use retrieval-augmented generation. The reason is practical: fine-tuning model weights is non-deterministic and context stuffing is expensive. Retrieval gives developers controllable, auditable access to organizational knowledge.
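The "controllable, auditable" point can be made concrete with a toy retrieval sketch. This is not Chroma's implementation: a bag-of-words overlap stands in for learned dense embeddings, and all documents and names here are illustrative. The point is that retrieval is an explicit ranking step whose output can be logged and inspected, unlike knowledge baked into fine-tuned weights.

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real system would use a
    # learned dense embedding model (as Chroma does) instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank stored passages by similarity to the query and keep the top k.
    # Because selection happens outside the model, the exact passages handed
    # to the LLM are visible and auditable.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Quarterly revenue grew 12 percent driven by enterprise contracts.",
    "The office kitchen will be closed for maintenance on Friday.",
    "Enterprise contracts now require a security review before signing.",
]
context = retrieve("enterprise contracts revenue", docs)
prompt = "Answer using only this context:\n" + "\n".join(context)
```

The irrelevant kitchen notice never reaches the model, and the developer can log exactly which passages did, which is the controllability that context stuffing and fine-tuning lack.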
The case for retrieval at scale
Unstructured data inside enterprises is already roughly ten times the volume of structured data. As AI systems move toward embodied, agentic contexts like robots and autonomous workflows, that ratio grows further. Retrieval becomes the foundational infrastructure that makes those systems reliable at scale, not just capable in demos. The gap between demo and production is the defining challenge of applied AI right now. Self-improving systems that learn under human guidance are an underrated piece of solving that gap.
Meta's open-source strategy
Huber sees Meta's open-source positioning as broadly constructive. Most enterprises prefer open-source models for reasons of privacy, security, cost, and continuity. The risk of building on GPT-4 only to have OpenAI deprecate it mid-product is real. Meta's deeper ambition with Llama may mirror how it open-sourced its data center architecture: less about winning a benchmark and more about becoming the industry standard.
Huber advises Meta to stop chasing state-of-the-art leaderboard scores and focus on developer experience. Reliable structured data output, tool use hooks, and practical integration primitives matter more to the B2B market than headline benchmark numbers. Unlimited capital can diffuse focus as much as it can enable it.
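What "reliable structured data output" means in practice can be sketched in a few lines: an application validates a model's JSON reply against an expected shape before anything downstream consumes it. The field names and schema below are hypothetical, purely for illustration.

```python
import json

# Hypothetical schema for an invoice-extraction task: field name -> expected type.
REQUIRED = {"invoice_id": str, "amount": float, "currency": str}

def parse_structured(raw: str) -> dict:
    # Parse the model's reply and fail loudly rather than pass bad data
    # into a B2B integration.
    data = json.loads(raw)
    for field, typ in REQUIRED.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], typ):
            raise ValueError(f"field {field!r} has wrong type")
    return data

reply = '{"invoice_id": "INV-42", "amount": 199.5, "currency": "USD"}'
invoice = parse_structured(reply)
```

The more consistently a model emits output that passes this kind of check, the less retry-and-repair glue developers have to write, which is the developer-experience lever Huber is pointing at.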
Scaling and what comes next
Huber acknowledges diminishing returns in pre-training: spending 10x more on compute no longer produces visibly better models. Inference-time compute and chain-of-thought reasoning look like the most promising near-term directions. The transformer is a genuinely historic technology, potentially as significant as electricity, but the current AI stack is still in early infancy compared to mature computing architectures. New approaches are coming.
On forecasts like AI 2027, Huber is skeptical without dismissing the possibility. The capability overhang in models available today is enormous. He thinks it is entirely plausible that within a decade the world's poorest people will have access to healthcare, legal, and financial services better than what billionaires can buy now. Long-form eschatological forecasts do not interest him much. The belief that we are living through the final, decisive moment of history is a recurring human pattern with a poor track record.