Chroma co-founder Jeff Huber: long-context windows won't kill RAG — they'll finally prove why retrieval matters
Apr 8, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Jeff Huber
Jeff, how are you doing? I'm good. How about you? Well, first, welcome to the stream. Can you do a little introduction? Then I want to hear your reaction to the Llama 4 news, how you're processing it, and what it means for your business. For sure. I'm Jeff Huber, the co-founder of Chroma.
We're working on retrieval for AI, and broadly working with developers across the ecosystem to build production systems with AI. A lot of it is focused on business applications: good old-fashioned business process automation.
So I'm always super excited to see new open-source model drops. Can we go to this post from Sean? He says, "Unpopular opinion right now, but Llama 4's 10 million token window will finally actually end the long context versus RAG debate, but not in the way that other guy is thinking."
" What does he mean by that? Yeah. Yeah, for sure. Um, I think, you know, Silicon Valley has a tendency to be sort of extremely intellectually shallow. This is both a strength and a weakness of the valley to be clear.
Um and in our view like AI is not this like deosmakina this like technical machine god you know where all of the information of all time is always going to be in the weights of this model. Um you know this is really just a new form of computing.
And so in the same way that we have a memory hierarchy in classic computers right we have the CPU RAM disk and network um we are also going to have a similar memory hierarchy in language models. And again it already exists today. Um we have the actual sort of transformer attention heads. We have the context window.
We have the retrieval system and tool use. And these things have different trade-offs, right? You think about kind of access speed, capacity, and cost. Uh there are trade-offs to all of these things. Um you know, I think like you you know, saying something is dead like plays pretty well on Twitter.
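The memory-hierarchy analogy can be sketched in code. This is a purely illustrative toy, not Chroma's API: the names `TieredMemory`, `remember`, and `lookup` are hypothetical, and a real system would manage token budgets and relevance ranking, not dictionary entries.

```python
# Toy two-tier "memory hierarchy" for an LLM app, analogous to
# CPU cache -> RAM -> disk: the context window is the small, fast,
# expensive tier; the retrieval store is the large, slower tier.

CONTEXT_WINDOW_BUDGET = 3  # tiny budget so eviction is visible

class TieredMemory:
    def __init__(self):
        self.context = {}          # tier 1: in-context (fast, tiny, costly per token)
        self.retrieval_store = {}  # tier 2: external store (slower, near-unlimited)

    def remember(self, key, fact):
        # Everything lands in the big tier; only hot facts get promoted.
        self.retrieval_store[key] = fact

    def lookup(self, key):
        if key in self.context:               # cheap hit inside the window
            return self.context[key], "context"
        fact = self.retrieval_store.get(key)  # pay the retrieval cost
        if fact is not None:
            if len(self.context) >= CONTEXT_WINDOW_BUDGET:
                self.context.pop(next(iter(self.context)))  # evict oldest entry
            self.context[key] = fact          # promote into the window
        return fact, "retrieval"

mem = TieredMemory()
mem.remember("founder", "Jeff Huber co-founded Chroma")
fact, tier = mem.lookup("founder")    # first access goes to the store
fact2, tier2 = mem.lookup("founder")  # second access hits the window
```

The point of the sketch is the trade-off Jeff names: the fast tier is scarce, so something has to decide what lives in it, and that something is the retrieval system.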
I've actually gotten myself into some trouble where some people were allegedly shitposting about this, not sincerely posting, and I didn't know, right?
It's hard to tell what's a shitpost and what isn't, of course, but the bait is strong on Twitter. What Sean is saying is that there's a certain class of people who believe long context is all you need. These people are probably 21 years old, and that's fine, we love them, but they just haven't seen how real systems depend on trade-offs between speed, cost, and accuracy. 10 million tokens is not a panacea. You need to keep information outside of the context window, and you need to give developers and programmers control over what information is inside the context window.
Even these needle-in-a-haystack tests are not actually that representative of the real-world utility and reliability of long context windows. They mention that the training data for Llama 4 doesn't even have passages longer than, I think, 250,000 tokens.
So anything past 250,000 is just synthetic data; it's just made up. What Sean is saying is that Llama 4's 10 million token context window is finally going to put to rest the idea that long context windows are all you need.
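Part of the criticism of needle-in-a-haystack evals is how trivially such a probe is constructed: recalling one planted sentence is a much weaker task than reasoning over a genuinely long document. A minimal sketch of how a probe like this gets built (the function name and parameters here are hypothetical, not from any particular benchmark):

```python
def build_niah_prompt(needle, filler, total_sentences, depth_fraction):
    """Build a needle-in-a-haystack probe: repetitive filler text with one
    out-of-place fact planted at a chosen depth. The eval then asks the
    model to recall that fact from the full context."""
    haystack = [filler] * total_sentences
    position = int(depth_fraction * total_sentences)  # 0.0 = start, 1.0 = end
    haystack.insert(position, needle)
    return " ".join(haystack), position

# A 100-sentence haystack with the needle buried halfway through.
prompt, pos = build_niah_prompt(
    needle="The secret passphrase is 'mauve walrus'.",
    filler="The quick brown fox jumps over the lazy dog.",
    total_sentences=100,
    depth_fraction=0.5,
)
```

The needle is maximally out of distribution relative to the filler, which is exactly why a high score here says little about finding relevant information in real documents, where the "needle" looks like everything around it.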
He's saying the 10 million token context window is finally, hopefully, going to make people understand that, no, there are different components here that are good at different things, and we can put them together to create a good system. Should we immanentize the eschaton?
I don't know how you knew that I was writing about this this morning. No, we absolutely should not. It's always been a trail of tears. Let's not do that. Yeah, you explained that to me a while back. I had fun with that. So let's talk about Llama 4.
How should startups be thinking about Llama as a tool in the toolkit against the other options they have? Yeah. I think Twitter, and research in general, is quick to view state-of-the-art as the only thing that matters.
And I think in many cases being first is overrated. We've seen this going all the way back to the Slack and Teams charts. You've seen the famous chart, Slack versus Teams, right?
Distribution is incredibly important, as long as the incumbents can wake up and catch up. I would not bet against Zuck and a hundred billion dollars of profit per year. And Zuck is, in some sense, playing a different game.
He's not trying to build the very best open-source chat experience for consumers. What Zuck sees, I think rightly, is that having a really good open-source model is good for the ecosystem and good for Meta.
And most businesses don't love using closed-source models. They want to use open-source models for all kinds of reasons: privacy, security, continuity, cost. You can build your startup on GPT-4 and it's amazing.
Then a new version comes out, OpenAI deprecates the old one, and all of a sudden your prompts don't work the same. So open-source models are going to continue to play an extremely important part in the ecosystem.
Now, obviously, the DeepSeek R1 launch a few months back took everybody by surprise.
I think we're still in the early innings of this stuff, where good ideas can come from anywhere, and oftentimes good ideas come from outside the groupthink of Silicon Valley. But yeah, I wouldn't bet against that.
Do you think there's an opportunity to build a company like Red Hat was for Linux, but for LLM implementation on top of something like Llama? Or is that a crazy idea that doesn't really map to the modern foundation model landscape?
I mean, the bull case for Llama, for Meta, is that it's more equivalent to how Meta open-sourced its data center layouts and specs. That's the bull case for Meta: an industry forms around it and it becomes the standard. That's the argument for why they did it. As for the Red Hat model, I think it works well for operating systems, but I don't think of an LLM as an operating system. I think an LLM is more like a CPU, right?
It's an information processing unit. Obviously it's a new thing; it's not exactly like a CPU. But I'd have to reason about that some more. I'm not sure. Yeah. If you were running Meta AI, what would you do from here? Not to put you on the spot or anything. Yeah.
To be clear, I'm not running Meta AI. I've not received that job offer at all. But I think you have to keep going. You can't stop. I think focusing on the business use cases is pretty important.
I think focusing on what developers actually need and want out of models is also very important. You see a lot of model drops that come out, but they don't actually provide the real hooks. And they do very well in the benchmarks, right?
They do very well on the public leaderboards, but they don't provide the hooks developers need for good tool use, or reliable structured data output, or the practical stuff developers actually want out of models.
So if you want to create a groundswell of developers who love your tools, do the developer experience part. Meet them where they are, give them all the hooks they need, and don't just stop at "hey look, we hit state-of-the-art benchmarks, aren't we special?" Yeah.
Is there a narrative here where maybe they're trying to do everything all at once, and instead should focus? Like, Llama is amazing at code, or the next version, Llama 5, is all about tool use, or super great at reasoning, or just the best at deep research, or the best at image generation, for example.
It feels like there's a bifurcation of the market, and maybe the opportunity is actually to laser in on something high value, and then let the other stuff simmer out there amongst other teams.
I mean, focus is probably always a good lesson for all of us: do less and do it better. Presumably that's also true for Meta. Obviously, unlimited capital can be both a blessing and a curse in that way. But again, focus on developers.
Focus on what developers want. I think that's the beachhead. That's how you win the B2B market. And if you win the B2B market with your open-source models, you get all the downstream effects you want. You don't need to beat GPT-5 on some benchmark. Yeah.
Do you think part of the narrative we're seeing around Llama 4 is just pre-training scaling hitting a wall, a need for new algorithms, a need for a deeper focus on reasoning, and maybe even whatever comes after that?
Well, you mentioned a moment ago immanentizing the eschaton, right? Throughout history, every exponential we've observed eventually turns into a sigmoid curve. You remember early COVID, right?
The fervor of, oh my gosh, the Twitter guys doing their thing: well, if double the amount of people get it every day, everybody on Earth will have had it seven times. Yeah. Exactly.
So I think there are laws of physics here. There are clearly diminishing marginal returns: we're spending 10x on compute, and we're not getting 10x better models, at least evidently, not yet.
Now, the transformer is incredible. It's a technology probably as important as the invention of electricity. It will probably bring about an increase in GDP on the order of the Industrial Revolution or greater.
So I think we should not minimize this technology and boil it down to "oh, this is just dumb pattern matching." By the same token, we also should not believe this is a technology we're going to be able to rent-seek on forever. Yep.
So yeah, new things are definitely needed. I think inference-time compute and internal chain of thought are really promising.
And I look at the stack today and think about how sophisticated computers are, the computer architectures and operating systems and kernels and compilers and all of this stuff, and we're just in the baby phase of AI. It's in its infancy, and there's a lot to build.
From a recruiting standpoint, have you run into some of these super aggressive non-competes we're seeing? There was a headline today about Google basically paying engineers not to work for a year, when they could be working at Chroma or any of these other labs.
I mean, yeah, airplane-red-dots.png, right? I guess if I was affected by that, I wouldn't know it. So, yeah. Can you take us through some of what Chroma is building today and where customers are getting the most value?
I've talked to you a little about some of the use cases, and I think they're underrated, potentially, in how simple and obvious they are when you explain them. But I want you to take me through some of the modern context.
Yeah, you've heard left and right on the internet for three years now about this acronym, RAG. I don't know why anybody would ever name something RAG; that seems like a pretty dumb idea. We just call it retrieval.
The idea with retrieval is that if you want to build an AI system and you want it to be good at something, you need to teach it how to do that. You've got to teach it about your data. You've got to give it your instruction set, right?
And updating the weights of the model is not a very good way to do that, because you cannot really deterministically control it. You can fine-tune, but what you get out the other end, again, you don't really control.
Giving the system access to a repository of instructions, or knowledge about your organization and your business problems, that is something you can control, and that's the problem retrieval solves.
So we talk to enterprises and businesses building useful applications.
I think today, 90-plus percent of it in enterprises is retrieval-augmented generation: chat on top of unstructured data, using retrieval. If you zoom out, though, and ask what AI really is, AI gives us the primitives and the ability to process unstructured data in a common-sense fashion. And think about the scale of the data. Even today, inside enterprises, unstructured data is roughly ten times the size of structured data. Then consider the real world: if we're putting robots out there, think how much unstructured data they're going to be ingesting and needing to process, reason about, and act on. It's going to be a thousand, ten thousand, a hundred thousand times the data we have today. So that's the direction, I think: not so much this simple one-human, one-AI chat stream back and forth, but real embodied intelligence, which you could call an agent, or a robot. I don't love any of these terms.
But really, the ultimate goal, I think, for anybody who's built something practical, is building something reliable. Think about self-driving: it's been touted as the technology for 10 years.
And of course, if you live in San Francisco, you can use Waymo and it is actually incredible, but it's taken 10 years. The gap between demo and production has always been great in AI.
So if you're building something practical in AI, your big question as a developer is: okay, the demo is super sexy and cool, but how do I actually make it work really, really well, and reliably? And the ability for these systems to self-improve, or improve under human guidance, is, I would say, the most underrated thing today.
And of course, we think retrieval plays a key part in how that happens. Can you concretize that a little by walking me through a potential use case for us? I mean, we stream three hours a day.
We're probably emitting, I don't know, tens of thousands of tokens every day. If I used Whisper, I could transcribe every minute of our show.
I could search that with fuzzy search, or deterministically, if I just want to find every time I mention artificial intelligence directly. Or I could try to fine-tune Llama 4 on it, and maybe it just hallucinates: oh yeah, John was talking about this, randomly.
How would I use Chroma to create a more definitive index of every time John or a guest has talked about artificial intelligence, or Llama, across hundreds of hours of video? Is that something you could do? Yeah.
I mean, we're seeing some fun things today where people take the corpus of all of their writing or all of their speech and, quote, teach the model it: they load it into a tool like Chroma, hook it up to a language model, and give end users the ability to chat with John and see what John thinks about artificial intelligence, right?
And that's exactly right. All those transcripts get processed, they get broken into pieces, and they get indexed and made searchable in various ways. Yep. Then the user asks a query: hey John, what do you think about the latest Llama release? Or maybe they don't say Llama.
They say, the latest AI release from Facebook, that thing, right? Yeah. Exactly. Llama. And the search is good enough that it can find all the relevant things you've said, and then the LLM can respond as you, because it grounds itself in the things you've said before.
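The pipeline described here, transcripts get chunked, indexed, and then queried, can be sketched end to end. This is a dependency-free toy, not Chroma's API: a word-count vector stands in for a learned embedding, so the matching is lexical rather than truly semantic, and all names are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a word-count vector. Real embeddings capture meaning,
    # so "AI release from Facebook" could match "Llama" without shared words.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class TranscriptIndex:
    def __init__(self):
        self.chunks = []  # list of (text, vector) pairs

    def add(self, transcript, chunk_size=12):
        # Break the transcript into fixed-size word chunks and index each one.
        words = transcript.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            self.chunks.append((chunk, embed(chunk)))

    def query(self, question, n_results=2):
        # Rank chunks by similarity to the question; return the top matches.
        q = embed(question)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:n_results]]

index = TranscriptIndex()
index.add("Today we talked about the Llama 4 release and its context window. "
          "Later the guests discussed tariffs and manufacturing in America.",
          chunk_size=12)  # small chunks so the demo has more than one
top = index.query("what did John say about the Llama release?", n_results=1)
```

The retrieved chunk, not the whole transcript, is what gets placed in the model's context window, which is the developer control over context that Jeff argues for earlier.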
So it's basically taking different blocks of text, different ideas, and vectorizing them in some way that's not necessarily human readable, but it's basically better fuzzy search in many ways. Not to degrade what you're doing, but it's amazing.
It's magical and super powerful. Yeah, fuzzy search is really useful when people are not experts in their own data. In your own Google Drive, you know how to search for stuff pretty well, right? But your users don't know how to search for the stuff you've said before.
So that's the power of embeddings and vector search as a tool in the toolbox. It's not a panacea. Again, we're not immanentizing the eschaton here, right? But it's a very powerful tool, and people are getting a lot of value out of it. Yeah. I'm curious about your reaction to AI 2027.
Our point of view, generally, from all the conversations we've had, is that model progress and advancements could slow and that would be fine, just because there's so much value to unlock from the underlying models.
I'm curious how you processed the forecast generally. Maybe take it from there. Yeah. The capability overhang we have in the models we already have today, and will absolutely have in six months, is immense.
Think about, for example, the possibility of democratizing access to state-of-the-art services for everybody on Earth.
It is very possible that the poorest people on Earth, today or in 10 years, will have access to better healthcare, better legal representation, and better financial services than billionaires have today. I think that's entirely possible, and it's possible with the models we have, again, today.
So the capability overhang is immense. Now, every time an extremely long essay from an effective altruist drops, it clearly tends to make waves. If you tell people the world is going to end, they're going to pay attention.
And frankly, I'm just not that interested in secular eschatologies about the apocalypse and the end of the world, right?
I think there's a natural tendency for all humans to believe that we are the chosen ones, living in the special time, in the last days. Even Fukuyama wanted to end history, right? And that natural human tendency, this is, again, immanentizing the eschaton, we'll have mentioned it three times now, is really dangerous. Think about what's happened over the last hundred years: the hundreds of millions of people who have died across different world wars and different dictatorships.
It is oftentimes this messianic complex that leads to a lot of that. So, I don't know, I see it as entertainment more than anything else. Yeah.
On a more practical note: I go to the Wall Street Journal's website and try to search for an article, and they say, "Oh, search is powered by AI." It's clearly not powered by AI, because I cannot fuzzy search at all.
I can't say, "I know it mentioned this person, and I think it was about this, and it was in the last week." It's not there. What does it take to actually roll this stuff out? Are these even potential customers of Chroma, or is there another company to be built here? What do you think about that? Yeah.
Wall Street Journal, if you're watching, send me an email. Happy to chat. Yeah, all of that's very doable today. I think the reality is, it's the classic line: the future is already here, it's just not evenly distributed yet, right?
Any technology of consequence, even a generationally important one, still takes decades to roll out, and the same is true here. That's great. Well, thanks so much for stopping by. We've got to move on, but this was a fantastic conversation. We'll have to have you on more.
Really appreciate it. Talk to you soon. Bye. And we've got a big funding announcement. We're shifting gears: out of AI and into manufacturing. We're going to talk tariffs, we're going to talk industrialization, another theme we love on this show. We have some big news.
And I just want to know: was this fundraise announcement always intended to go out today, or did they move it up because of the tariffs and everything? Entirely possible. The timing is just too good.
So, Jay says, "Today I'm excited to launch the Advanced Manufacturing Company of America. We've raised $76 million." Let's hear it for a massive round coming out of stealth, from Caffeinated Capital, that's Raymond Tonsing, Founders Fund, Lux Capital, Andreessen Horowitz, and others.
"The best time to build this business is right now." Yeah, no joke. "But the real work began decades ago." And they launched. He just decided, I'm going to get every big fund. Yeah, get them all. It's great. He ran a process, said yes to everyone, took a bit from everybody. And it's great.
They put out a four-minute video produced by Jason Carman of Story Company.
It's beautifully lit, beautifully shot. We've been hearing for a long time that the legacy manufacturing companies are run by folks who are aging out, and maybe they don't have the next generation lined up to take over the business.
Well, they sat down and interviewed one of those folks, and it's a fantastic video. You should go check it out. Anyway, is he ready to come into the studio? Let's bring him in and hear the news from him directly. Welcome to the studio. How you