ZeroEntropy raises $4.1M seed to build precision RAG retrieval tools and launches new reranker model to cut AI hallucinations
Jul 10, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Ghita Houir Alami
in the context of sports betting. Well, thank you so much for stopping by. This is fantastic and congratulations on the launch. Appreciate it. Check us out at consensus.app. Deep search launch today. Thanks, guys. Awesome. We'll talk. Thanks for coming on.
Up next, our lightning round continues with Ghita from ZeroEntropy, a YC graduate, doing automated retrieval and announcing a seed round of $4.1 million. So, seed round alert. Seed round alert. 4 million. That used to be a Series A a couple of years ago. Now it just keeps ticking up.
Congratulations on the round. Welcome. How you doing? Thank you so much. Super excited to be here. Thanks for joining. Introduce yourself. Introduce the company. How'd you get started? What do you do? Yeah, so I'm Ghita. I'm one of the co-founders of ZeroEntropy. A little bit about myself.
So my background is in applied mathematics. I have two master's degrees in the field, one from École Polytechnique, one from Berkeley. I guess I started more on the computer vision side of things, and then I discovered GPT-2 and GPT-3 and I was like, "Oh my god, this is huge."
And I started thinking about personal assistants and stateful AI systems. And I guess that's what led eventually to ZeroEntropy and building retrieval systems and bringing context into LLMs. And so that's what we do. We build search for RAG and AI agents. Okay. It feels like a crowded space.
I know a few founders that are working on RAG. There are also RAG implementations at the hyperscalers and the clouds. How are you differentiating? What's the key insight? What's the pitch to companies to come over and use your service as opposed to the other options out there for retrieval? Yeah, absolutely.
I think it's about having the right abstraction. So we solely focus on the retrieval side. We don't do the entire RAG end to end because we believe that developers need to have their own prompts for generating the answer. They need to use ZeroEntropy as a search tool for their own AI agents.
We're also developing our own models. So we just released a reranker yesterday, which was pretty exciting. And I guess the winning solution needs to be extremely accurate but also extremely fast, and just be production-ready and easy to implement for various use cases.
What's your take on benchmarks currently? It feels like solving a really hard math problem and retrieving the right document at the right time are somewhat unrelated. And so how do you evaluate if your system's getting better? Yeah, that's a great question.
Actually, the evaluation side of things is very messy. Almost everyone that I talk to basically relies on manual inspection to make sure that their retrieval is working correctly. So we've been looking into the evaluation side a lot.
Actually, the very first thing that we did was release our own benchmark on legal documents, which evaluated just the retrieval step of RAG, meaning: from a question, was I able to pull all of the documents, and only the documents, that I needed? Because the problem is that if you feed your LLM too many tokens that it doesn't need, it's just going to hallucinate.
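[Editor's sketch, not from the interview.] "All of the documents, and only the documents" corresponds to the standard retrieval metrics of recall and precision. The snippet below is a minimal illustration of those two metrics; the function name and the document IDs are made up for the example and are not taken from ZeroEntropy's benchmark.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision: share of retrieved documents that were actually needed.
    recall:    share of needed documents that were actually retrieved.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
retrieved = ["contract_a", "contract_b", "memo_c", "memo_d"]
relevant = ["contract_a", "contract_b", "ruling_e"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved documents were needed (precision 0.50) and two of the three needed documents were found (recall about 0.67). Low precision is the "too many tokens" case described above.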
So the precision and the recall side of things are extremely important, and in the next few weeks we're rolling out our own evaluation solution, which we've been using internally so far. What does the rest of the stack look like? I know you said you were kind of RAG provider agnostic.
Are you also model agnostic, cloud agnostic, database agnostic? Like, where have you actually made bets? What pieces of the stack are you particularly aligned with?
Yeah, I think building context, context engineering, is going to be a new class of products that needs the data layer but also needs small LLMs inside the retrieval pipeline.
We see many teams feeding everything into the context of the LLM, entire knowledge bases, because they weren't able to make retrieval work properly, and we see other teams having a very simple pipeline.
I think the winning solution needs to be somewhere in the middle, basically orchestrating LLMs to rewrite the question properly, summarize the documents, and create more metadata associated with each of the documents that are indexed. And so that's what we're doing.
And building this solution that works really well and almost gets to the precision and the accuracy of a large LLM while still being pretty fast and pretty optimized.
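[Editor's sketch, not from the interview.] The pipeline described above (rewrite the question, enrich indexed documents with metadata, then search) can be illustrated with a toy example. Everything here is an assumption for illustration: `rewrite_query` and `add_metadata` are simple rule-based stand-ins for the small-LLM steps, and the scoring is plain term matching, not ZeroEntropy's method.

```python
from dataclasses import dataclass, field

# Filler words the toy "rewriter" drops (an arbitrary, illustrative list).
STOPWORDS = {"hey", "can", "you", "tell", "me", "how", "do", "i", "my", "the", "a"}


@dataclass
class Doc:
    doc_id: str
    text: str
    keywords: set = field(default_factory=set)  # metadata added at index time


def tokenize(text):
    return [w.strip(".,?!").lower() for w in text.split()]


def rewrite_query(question):
    """Stand-in for an LLM query rewrite: keep only content words."""
    return [t for t in tokenize(question) if t and t not in STOPWORDS]


def add_metadata(doc):
    """Stand-in for LLM-generated metadata: tag longer words as keywords."""
    doc.keywords = {t for t in tokenize(doc.text) if len(t) > 6}
    return doc


def search(question, docs, top_k=2):
    """Score each doc by term matches; metadata matches count double."""
    terms = rewrite_query(question)
    scored = []
    for doc in docs:
        body = set(tokenize(doc.text))
        score = sum((t in body) + 2 * (t in doc.keywords) for t in terms)
        scored.append((score, doc.doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


docs = [
    add_metadata(Doc("lease", "The tenant may terminate the lease with sixty days written notice.")),
    add_metadata(Doc("nda", "Each party shall keep confidential information secret for five years.")),
    add_metadata(Doc("invoice", "Payment is due thirty days after the invoice date.")),
]

print(search("Hey, can you tell me how do I terminate my lease?", docs))
```

In a real system each stand-in would be a model call and the final scoring step would typically be a learned reranker; the point of the sketch is only the shape of the orchestration.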
What's the appetite been like for this product in the enterprise versus new companies that are building new AI products from scratch? It feels like they might be just the AI agent infrastructure companies.
There's a lot of them and it feels like they're selling to a new crop of companies and that's where the revenue is accelerating most aggressively. But what are you seeing in the market?
Yeah, I think the adoption for products like this usually comes from a bottom-up type of approach, where developers are experimenting with new approaches and new techniques and then larger enterprises catch up. So that's what we've been seeing.
In terms of experimenting with models, I think large enterprises also do that pretty easily. So for things like the reranker that we just released, there's also appetite from larger companies in integrating that into their current systems.
Is there a case study that you have your eye on amongst the big tech companies? Like, "we think that our software could improve Netflix or YouTube recommendations," or something. If the deals could just magically happen, where's the lowest-hanging fruit? For me, if I could do anything, I would just get Whisper into Siri, so when I dictate a text message it's just perfect and it's much better than what they're currently using. What's on your wish list for a consumer tech company or big tech company that everyone knows and that's not taking advantage of something like this?
Honestly, for me it's Slack. I always struggle. I can never find anything on Slack. And something that we've been doing is annotating our own conversations, like appending keywords to our own threads to be able to find information.
But we have a lot of our internal research and a lot of things going on on Slack, and we find it pretty difficult to find the right stuff.
So I think companies like that could benefit and it would provide a much better user experience if you could, you know, just magically find all of the information that you have in there.
Yeah, I've been noticing that with Gmail. The amount of email has just grown so much, and the amount of text in each email has grown because of all the trackers and cookies and stuff behind the scenes. And so when I search for something, it just pulls up completely random emails, like, every time.
And it doesn't understand the hierarchy: in an email, I care a lot more about what's in the subject line than what's in the footer.
And so if I'm searching for, you know, artificial intelligence or something and someone has that in their footer, like, "hey, I run an artificial intelligence company," that's not what I'm looking for. I'm looking for the thread where I was talking to somebody, a close friend, about AI, and I want to pull that up first.
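[Editor's sketch, not from the interview.] The subject-line-versus-footer point is the idea behind field-weighted scoring: a match counts for more or less depending on where in the message it occurs. The weights and sample emails below are invented for illustration and do not describe how Gmail or any real product ranks results.

```python
# Assumed, illustrative weights: subject matches matter most, footer least.
FIELD_WEIGHTS = {"subject": 3.0, "body": 1.0, "footer": 0.1}


def field_score(query, email):
    """Sum query-term occurrences per field, scaled by that field's weight."""
    terms = query.lower().split()
    score = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        words = email.get(field_name, "").lower().split()
        score += weight * sum(words.count(t) for t in terms)
    return score


# Hypothetical messages mirroring the example in the conversation.
emails = [
    {
        "id": "vendor",
        "subject": "Your invoice for March",
        "body": "Please find the invoice attached",
        "footer": "I run an artificial intelligence company",
    },
    {
        "id": "friend",
        "subject": "Thoughts on artificial intelligence",
        "body": "Following up on our chat about artificial intelligence last week",
        "footer": "",
    },
]

ranked = sorted(
    emails,
    key=lambda e: field_score("artificial intelligence", e),
    reverse=True,
)
print([e["id"] for e in ranked])
```

With these weights the thread with the friend ranks first, and the vendor email, which only mentions the phrase in its footer, scores close to zero.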
Yeah. I think that's also why basic semantic search is just not enough, because it will basically pull all of the similar information but not the most relevant or the most helpful. Keyword search is the same. It's not very smart.
And I think it's just such a waste, because there's a lot of information that you could have access to and it would make your work so much faster, and you're just spending time rewriting your question and trying to make the system understand what you're actually looking for.
So I think that the query side, the user intent, query rewriting, is also super important. Yeah. iMessage search, absolute disaster. It is an absolute disaster. It's like, I know I'm in a text message with Jordi and someone else, so pull that up, and it's like, here's six. It never works.
I also noticed fix it all. There's going to need to be a shift in the way people search.
I remember hearing the story about Google where there was some Google engineer who was running a test on, like, what's the world record for the marathon, or something like that. They were using the typical keyword Boolean search and they weren't getting good results. Then they sent it to a user, and the user just asked the question in natural language and it just hit it the first time. And so I feel like people still, at least I have been an email user for a long time, and when I go to my email search, I'm often searching in the keyword world instead of just natural language.
But Google, I mean, they're experimenting with the AI search thing. They have, like, a 50-word limit right now. You can't just type a whole prompt in Google search. They need to kind of reimagine what that search box is.
And then there also needs to be a consumer change in how consumers interact with that particular UI element, basically. But thank you so much for stopping by. Congratulations. Good luck to you. We will talk to you soon. Have a good one.
Up next, we have Elliot Hershberg from Amplify Partners coming in the studio. They are in Datadog, Chainguard, Runway. I love Datadog. That's maybe the Golden