Consensus reaches 5M users with AI-powered academic search engine that out-indexes Google Scholar on peer-reviewed research
Jul 10, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Eric Olson
helping dumb students. Yeah. I mean, the super dumb question, the super obvious question, is: isn't all this stuff already in ChatGPT? How are you differentiating? You must be doing something, because you have over five million users.
So yeah, I mean, one of the best examples that encapsulates why it's different is the fact that Google Scholar was the first vertical search product that really broke off of Google, back 20 years ago. Interesting. Yeah.
Even when it was doing nothing more than being a dedicated index for research papers, hundreds of millions of people still go to it every month. So the same thing is kind of true here. We're dedicated to a use case. We have a dedicated corpus.
We hopefully search over that corpus a lot more intelligently than a general-purpose chatbot would. And we do things differently in our interface to show you that information. We're much more citation-forward, right?
It's an experience where you can really interrogate what's been returned in your search. With ChatGPT's search, citations are pretty much an afterthought. They're there if you want to dig in, but that's not really what it's designed for. Yeah.
Everything about it, from the way it searches, to the way it shows results to you, to the features built on top of it, is dedicated to academic research. Walk me through some of the key technologies that enable better search. I'm thinking about things like vector databases.
Um, even just stuffing a better index into Redis or Postgres, or doing more indexing on top of these documents, doing transformations on the underlying documents to get them into more basic formats. What about large context windows? There's so much you could throw at this problem. What's actually working?
Yeah, so, lots of different things, many of the things you're saying. Number one, being dedicated to a document type helps us in the way we create our embeddings for search. It also helps us in the ingestion process, kind of like you were saying, with document transformation.
We'll run little tiny LLMs over 200 million papers and add enriched metadata about them that we can then use in our search ranking and in our filtering.
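The shape of that enrichment pass can be sketched roughly as below. All names here (`extract_study_metadata`, the field set) are illustrative assumptions, and the rule-based extractor stands in for the small LLM call:

```python
# Sketch of a metadata-enrichment pass over ingested papers. The
# rule-based extractor below stands in for the "little tiny LLM"
# call; function and field names are assumptions, not Consensus's
# actual schema.
def extract_study_metadata(abstract: str) -> dict:
    design = None
    for d in ("randomized controlled trial", "cohort study", "meta-analysis"):
        if d in abstract.lower():
            design = d
    sample_size = None
    for token in abstract.replace(",", "").split():
        if token.startswith("n=") and token[2:].isdigit():
            sample_size = int(token[2:])
    return {"study_design": design, "sample_size": sample_size}

def enrich(papers: list[dict]) -> list[dict]:
    """Attach extracted metadata for later use in ranking and filtering."""
    for paper in papers:
        paper["enriched"] = extract_study_metadata(paper["abstract"])
    return papers

papers = enrich([{
    "id": "p1",
    "abstract": "A randomized controlled trial with n=412 adults ...",
}])
print(papers[0]["enriched"]["sample_size"])  # → 412
```

In production the extractor would be a prompted model rather than string matching, but the pipeline shape stays the same: abstract in, structured fields out, attached to the document for ranking and filtering.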
So think: we'll pull out what the design of the study is, or what the sample size is, and we use that in search ranking and in search filtering. And then on top of that, the main intelligence of the search is learning-to-rank models. People interact with the product. They save papers.
They cite papers. They share papers. We learn from all of those interactions. We learn what matters most. We learn which attributes of a paper matter in search ranking based on how people are interacting with it.
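A minimal pointwise learning-to-rank sketch of that idea: interactions (saves, cites, shares) become labels, paper attributes become features, and a simple linear model learns which attributes matter. The feature names and training setup are assumptions for illustration, not Consensus's actual model:

```python
import math

# Pointwise learning-to-rank sketch: learn feature weights from user
# interactions. Feature names are illustrative only.
FEATURES = ["citation_count", "is_rct", "sample_size_log"]

def score(weights: dict, paper: dict) -> float:
    return sum(weights[f] * paper[f] for f in FEATURES)

def train(weights: dict, interactions, lr=0.05, epochs=100) -> dict:
    """interactions: (paper_features, label) pairs; label=1 if the
    paper was saved/cited/shared after appearing in results."""
    for _ in range(epochs):
        for paper, label in interactions:
            p = 1 / (1 + math.exp(-score(weights, paper)))  # sigmoid
            for f in FEATURES:  # logistic-regression gradient step
                weights[f] += lr * (label - p) * paper[f]
    return weights

engaged = {"citation_count": 2.0, "is_rct": 1.0, "sample_size_log": 3.0}
ignored = {"citation_count": 0.1, "is_rct": 0.0, "sample_size_log": 1.0}
w = train({f: 0.0 for f in FEATURES}, [(engaged, 1), (ignored, 0)])
# Papers resembling what researchers engage with now score higher:
assert score(w, engaged) > score(w, ignored)
```

Real systems use richer pairwise or listwise objectives, but the core loop is the same: behavioral signals in, attribute weights out.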
So the simplest way to think of it is: because people are using it for one specific use case, we get to train our search models to try to think and act like a researcher would going through these papers. Jordy, uh, what are the different data sources here?
I'm assuming a lot of this stuff is public. I remember being in college, trying to find different studies or papers, and hitting paywalls. A lot of them are locked down. I imagine you've done some deals to get access to data.
What is the kind of, um... That's a great question. The full body of work that's available. Well, hopefully paywalls are going to be a thing of the past moving forward, as open-access science gets more and more momentum, uh, which we'd love to help shepherd in.
Um, the way to think of it is there are three different layers of access you can have in Consensus. One is fully open-access science that's all publicly available. We're able to ingest the full text, show it freely, let you download it. All is well and good.
The rest of the bucket is paywalled content, but there are two levels of access we can have within it. There's the bucket where we have deals with publishers, and we're trying to get as many of those done as possible, where we're able to use the full text in our search and in our analysis. We're just not able to display it to a user.
The benefit to the publisher is that we're helping them drive traffic and get people to see their work. Hopefully the snippet in the search ranking is engaging, people click into it, and that drives a purchase. And then there's the third bucket, where we just don't have a deal with the publisher yet. Fully behind the paywall.
We're using what is publicly available, so that's the abstract and the metadata of the paper, which goes further than you'd think. The abstract is specifically designed to be this perfect little summary of the paper.
You can go a pretty far distance in search ranking, and even in some analysis, using abstracts. But obviously, nothing's more brutal than being a college student, almost getting the information you need from an abstract, and realizing: do I really have to pay $50 for this?
For a single fact. The craziest part is that even if you wrote the paper, you still have to pay for it. You hand it off to a publisher, they publish it, and so I could literally have published a paper, and if I come across it on the internet, I still have to pay for it. That's wild.
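Those three tiers map naturally onto a small data model. This is an illustrative sketch of the rules as described, not Consensus's actual schema:

```python
from enum import Enum

# Illustrative model of the three access tiers described above;
# names are assumptions, not Consensus's actual schema.
class Access(Enum):
    OPEN_ACCESS = "open_access"        # full text: search, analyze, display
    PUBLISHER_DEAL = "publisher_deal"  # full text: search/analyze only
    ABSTRACT_ONLY = "abstract_only"    # no deal: abstract + metadata only

def searchable_text(paper: dict) -> str:
    """Text the engine may index and analyze for this paper."""
    if paper["access"] in (Access.OPEN_ACCESS, Access.PUBLISHER_DEAL):
        return paper["full_text"]
    return paper["abstract"]

def displayable_text(paper: dict) -> str:
    """Text that may actually be shown to the user."""
    if paper["access"] is Access.OPEN_ACCESS:
        return paper["full_text"]
    return paper["abstract"]

paper = {"access": Access.PUBLISHER_DEAL,
         "full_text": "FULL TEXT", "abstract": "ABSTRACT"}
# Deal papers are searchable in full, but only the abstract is shown:
assert searchable_text(paper) == "FULL TEXT"
assert displayable_text(paper) == "ABSTRACT"
```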
Uh, Jordy had this question earlier about the nature of scientific discovery. Elon Musk, at the Grok 4 launch, was talking about his timeline for discovering new physics being two years now, based on the progress he's seeing at xAI.
And Jordy was making the point that a lot of scientific discoveries come from mapping different disciplines together. Invention generally is applying the mind of a computer scientist to a biology problem, or vice versa.
Um, are you seeing users do those types of searches? Is this product useful for that type of scientific discovery?
There's been this lingering question in artificial intelligence: if you were a person who had read every single research paper, you would probably make discoveries and connections across things. And yet that hasn't happened.
Maybe it's some fundamental limitation of LLMs, or of AI at this moment. But what are you seeing, and what's your take on that concept of cross-functional pollination? Yeah. I mean, basically everything new that humans have invented comes from pattern matching across disciplines. That's how we create new ideas.
Yeah. Yeah. I mean, I'm not an AI researcher, so I don't have the single most informed take, but also nobody knows what the heck they're talking about in this world. Um, you know, I think it is probably a fundamental limitation of LLMs, given what we've seen.
I'm going to parrot this from François Chollet in his YC talk the other day, but a measure of intelligence is the efficiency with which you process information and apply it in different domains, and that just isn't what LLMs are doing great right now, despite the fact that they've processed so much information.
So our take at Consensus would be more: get people to the edge of what is known, and then let them do the inherently human part of science, which is creating these new insights and new discoveries. Every science experiment that's ever been done starts with a review of the literature.
Think of it as getting the foundation of knowledge underneath your feet.
Our goal at Consensus would be to speed up that part as much as humanly possible and let us do the thing that humans are better at than machines right now, which is that pattern matching, that coming up with new ideas.
And if we can make that loop move faster, heck, that's a freaking valuable and powerful thing. Uh, switching gears completely, I know you were at DraftKings prior to this.
Uh, on the sort of research and analytics side, what is your thesis around the ultimate collision between betting activities and AI? Last night, Grok announced a partnership with Polymarket to bring prediction markets in, to basically help make the model itself smarter.
Uh, how are the big players like DraftKings even thinking about AI? I'm sure a bunch of people have ChatGPT wrappers specifically focused on sports betting and things like that. But, um, how do you think the big players are thinking about it?
Yeah, I mean, well, I left DraftKings in 2021, so I can't say I was there when people were worrying too much about AI models. And the natural question I always get is: how the heck did you go from sports betting to science?
And the answer is, uh, my parents and my grandparents and my sister are all teachers and scientists. I was athletic growing up, and I loved applying numbers to sports. Um, but I actually have something kind of interesting to say here.
So my job at DraftKings was I was building models to find the professional gamblers on the site. So like you'd look at all previous betting history and demographic data. You try to make predictions on is this person actually have an edge over the market?
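One common signal for this kind of model in sports betting is closing line value: sharp bettors consistently beat the closing price. The transcript doesn't say DraftKings used it, so this is purely an illustrative toy sketch of the task:

```python
# Toy sketch of flagging likely-sharp bettors from betting history.
# Closing line value (CLV) is a standard sharpness signal in sports
# betting; the thresholds and feature choice here are illustrative,
# not DraftKings' actual models.
def closing_line_value(bets: list[dict]) -> float:
    """Average edge vs. the closing price; sharps tend to beat it."""
    return sum(b["odds_taken"] - b["closing_odds"] for b in bets) / len(bets)

def looks_sharp(bets: list[dict], clv_threshold=0.02, min_bets=100) -> bool:
    # Require a meaningful sample size before limiting anyone.
    return len(bets) >= min_bets and closing_line_value(bets) > clv_threshold

history = [{"odds_taken": 2.10, "closing_odds": 2.00}] * 150
assert looks_sharp(history)  # consistently beats the close → flag
```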
Um, and I would have to imagine that with better, smarter, more powerful models, people's ability to have an edge on the market would increase in the short term, and then the markets obviously catch up and figure out how to bake all that in. And I mean, that is the beauty of markets, right?
Whatever technology the people betting into a market have, so does the provider who is putting up that market, and they get their information from the people they know have the best models.
So I think it's going to be an interesting cat-and-mouse game moving forward, as it's always been with sports betting. Just instead of, you know, Johnny Two Shoes in New York getting inside information about injuries, now it's somebody with a super powerful AI algorithm that's predicting games above market.
That's fascinating. I have to imagine that in the AI era, the insider knowledge about injuries is even more valuable. A hundred percent. But to your point, probably the number one way to know if you need to limit somebody is if they're ahead of an injury, because it means they're somewhat connected.
They're doing the sophisticated thing, they're on the inside. Yeah, that makes a ton of sense. Insider trading. That's fascinating. I didn't think about that in the context of, uh, sports betting. Uh.