Hebbia founder processed all 80,000 JFK files overnight — here's what the AI found

Mar 19, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

...in venture. And I think we have George joining the show right now. George, how you doing?

I'm good, I'm good.

Is that a White Claw or a Celsius on your desk?

This is a Celsius.

Oh, very good, fantastic. PG-13, I love it. Are you a White Claw in the middle of the day guy?

No, I'm just a third cup of coffee guy.

Okay. So where do you max out? Where do you hit your zone of genius with caffeine? Is it in the 300 milligram range, higher, lower?

You know, I think the unfortunate thing with caffeine is I'm probably at like half a cup of coffee, and after that it's just less productive. I think it actually just distracts you slightly.

So you go for three because you're just so good at what you do that you've got to make it a little bit harder to have fun? Is that it?

When you're jet-lagged or delayed, you've got to kind of discount, or you need to rely on Lucy or all the different John Coogan things. But today we're on three cups of coffee.

Okay. Oh, you're on three cups of coffee. Is that because you're diving into the JFK files? Can you break it down for us? What have you found so far? Take us through the story.

That is actually why I'm on three cups of coffee. I was up late responding to different, you know, conspirators on the internet. But yeah, last night, as many folks know, there was a dump of 80,000 formerly unreleased pages of the JFK files, with no redactions, from the Trump administration. Obviously no one has time to read through 80,000 pages of scribbles and scratch and a bunch of scanned documents from the '70s. So we uploaded them to our system, which usually looks at virtual data rooms and a bunch of financial and legal documents. We just tossed them all in there and instantly had access to an AI that could look over
them. A lot of interesting things came up, some of which I can talk about on this podcast, or I guess live show, and some of which I'm probably less keen on. There are definitely a variety of mentions of multiple shooters. There were a few mentions that people thought were of UFOs, but in reality it was OCR getting stuff wrong, so there's nothing too interesting on UFOs. Then there's also, and I'm trying to pull back some of the old questions I was being asked, a variety of mentions of folks like, I believe, J. Garrett Underhill, who, interestingly enough, is someone a lot of conspiracy theorists believe was assassinated by the CIA for going in and claiming that it was a CIA inside job.

A very interesting piece of the entire dump: if you were to guess who the most mentioned people are, you might guess it's Lee Harvey Oswald and JFK and a variety of other people. But if you extract all the entities from every document, JFK is first and John Kennedy is second, so just double mentions of JFK. Then Central Intelligence is the third most mentioned entity in the documents, then Fidel Castro. I believe if you actually pull them up, there are 18 documents that are thoroughly about Fidel Castro.

Wow.

And then the fourth entity is FBI director and FBI agency. So as you go down the list, you can actually see lots of mentions of Cuba and lots of mentions of the CIA and the FBI, but not as much on the other standard people that folks would usually go and look into. A lot of documents also mentioned Martin Luther King Jr.

Do you think applications of your tech, and more broadly some of these models, can help us get a better understanding of history? For every big story, like the JFK
assassination, there's just this incredible amount of evidence, some of which is real and some of which is not, all from different sources. And it's almost like, if you have one human who dedicates their life to the topic, they can sort of figure out the truth behind it. This is obviously not what you guys at Hebbia are working on, right? You're focused on the enterprise. But do you see a world where we get a better understanding of these major historical events through hacker historians leveraging what you're doing here with the chat-with-the-JFK-files product?

Yeah, I think that ultimately a lot of history is subjective, and history books are written in a way that always presents one side of the story. As you see a shift away from what I would call syntheses, like history books, toward people using first-party knowledge and then effectively orchestrating an AI over whatever they'd like, or whatever question they'd like, you'll actually start to see people change the way they work and the way they understand historical events. Just like social media allows you to validate your own beliefs, I actually think AI can be used as a tool to present any side of an argument, whether good or bad. That just means there's maybe less social control that traditional media, news outlets, and publishers of books would have over the cultural zeitgeist and how people refer to historical events.

Do you think, now that people are so aware that documents will be embedded in LLM systems and that this is how these large troves are processed, there's some sort of metagaming going on? If I wanted to implicate you in the JFK files, I could have released 80,000 files on JFK and then added another 200,000 files just about George, and all of a sudden it's
going to look like, "Oh, well, why is George mentioned so much?" People don't really realize that these are completely unrelated, but by contextualizing them as one block, all of a sudden I'm kind of data poisoning it. Do you think that's going to happen in the future?

Yeah, so interestingly, if you look at things like RAG models, traditional retrieval-augmented generation, and you ask questions like "Who are the people mentioned?", you can data-poison it. One of the things you're seeing with a lot of deep research products like Hebbia, and this is a differentiator between Hebbia and ChatGPT Enterprise, or between Hebbia and even the Perplexity Spaces product, is that instead of going in and using a search engine, Hebbia goes in, reads every single file, and tells you whether or not it's related. So it's not a search engine; it's more of an infinite effective context window. The AI can process a lot more information, and because of that it's better about having slightly less bias, and about not having the data skewed the way a search engine might.

Speaking of context windows, I know there's been a bit of an arms race. Google, I think, has the biggest one, at about a million tokens. Is that an important lever to pull when you're building your product and your system? Is it an important optimization vector, or is it just "let's find the smartest model," or the cheapest model, or maybe cost doesn't even matter because of your clients? What are you optimizing for?

Ultimately, we're trying to build the product that an AGI with infinite resources and infinite intelligence would use.

Mm-hmm.

We're not trying to build the AGI, or the model, or the longer context window. We're trying to build an agent orchestration platform. You wouldn't judge an artificial superintelligence by its ability to use an abacus or
to do computations in its own context window; you would judge it by its ability to use a tool like Excel to do structured data processing. Similarly, we think the best way to do unstructured data processing won't even be a larger effective context window for one model, but rather ways for it to orchestrate sub-models to do computation on every file. So if you look at what's going on with the JFK files, you actually have one god model orchestrating a bunch of other, smaller models that read every document with full attention, without having to jam it all into one infinitely long context window. It's an infinite effective context window.

Was there any preprocessing you had to do on the files? I imagine a lot of these are screenshots or photos of pretty messy scans of documents. It's not exactly a JSON file.

A ton of different processing. One of the worst parts of running an AI company is processing old documents, because the documents are messy. Ideally you'd want a multimodal model that could process pictures of every document, but that's more expensive and even slower, and there's a context window issue there.

Yeah.

So what we have is probably the highest-throughput and highest-accuracy indexing engine. It will actually use multimodal models for really hard documents, sometimes use standard OCR, and do table and layout detection and all this other stuff to feed the documents into something LLMs can process easily.

Can you talk a little bit about the history of the company? Because you started before the AI boom, and I remember you describing the company early on and thinking, ah, that sounds kind of cool. And then I saw you again a couple of years later and thought, oh, this guy's working on the hottest thing in the world. So what's the prehistory of the
company?

The prehistory is, I think, that I made an early bet on large language models and it paid off. I was doing a PhD at Stanford, and I was honestly just studying meta-learning, because I thought it would be a really important piece of technology. As the summer of 2020 unfolded, OpenAI destroyed all of our research. They released the GPT-3 paper, which was basically, "Hey, large language models are meta-learners," and it took the academic world that was trying to build this, and we were like, okay, you're never going to do it; it's going to happen in a closed-source way, and OpenAI will probably be the people who invent it. Being close to that shift, and close to the realization early on that this technology was going to be important, I thought, okay, I should probably leave my PhD; this will be the most important product and technology I can build. Then I spent the next two years in the desert, searching for customers who would pay for anything, but building the best possible enterprise product, the one I thought future people working alongside LLMs would want. That set us up to strike in a really fast, big, market-grabbing way when the iron was hot.

That's awesome. I think... oh, please. You're focused on the enterprise, but I'm interested to get your opinion here, because we talked about it yesterday with Scott from Cognition. Do you feel like we're close? Have you seen any teams you're excited about that are building a truly great consumer AI agent? We've been promised this workflow for a long time: hey, it'll book a flight, it'll book your hotel, it'll do this. And there's no dominant product doing that today. I'm curious to get your opinion there. Do you feel like we're close to that super magic moment, other than the ChatGPT-style conversational chat search? People talk more
about superintelligence timelines than they do about AI just booking me a flight.

Yeah, I want to know your flight-booking-agent timeline. That's what I want to know.

It's definitely more near-term than superintelligence, than paperclips.

Yeah, yeah.

Paperclips are your airplanes. I would say useful consumer agents will almost assuredly be solved by the big model providers, so it's almost the worst space for a startup to work in, honestly. There are a million startups now building B2B AI, trying to compete with us and a bunch of our other competitors, and it's already too late. If the whole world's trying to do it, you're already way too late; you have to start way before everyone's trying to build it. I think it will be done by OpenAI this year or sometime in the next year. I don't think it's going to be a startup that does it, and that's because it requires partnerships. It requires, honestly, API handshakes between agents and the APIs of external providers like Delta and United. I think that will be a problem that requires a large-language-model-scale company, a training-scale company, to solve.

Yeah, I think that's a good point. We have the anti-bot industrial complex that's preventing bots everywhere, so if you're trying to build a consumer AI agent that's just a rebranded bot, it's very hard to do unless you're OpenAI and you can go to DoorDash and say, look, let's do this deal; our users want this, and we're going to be able to drive a very meaningful amount of traffic. So, can you give us an update on where the company is and where people can find you, and just kind of wrap it up, because we've got to move on to the next guy?

Absolutely. The company is Hebbia AI. You can just search Hebbia on Google and then click on one of our paid ads to drive up our customer acquisition cost. And it's hebbia.com now.

I noticed you got the...

There was someone squatting on that for the longest period of time, and I was like, they know exactly who I am; they're just going to charge me a crazy amount of money for it. But yeah, you can find us at hebbia.ai or hebbia.com, and I'm George Sivulka on Twitter, @gsivulka. As the Trump administration releases more files, we should be releasing a variety of other interesting AI deep dives, so give us a follow. Big fan of this show.

Thanks for calling in. Yeah, it's a fantastic way to demonstrate the power of the product, and thank you for your public service. Appreciate it, talk soon. Bye.

Citizen techno-journalist. Citizen techno-journalist. My two favorite wild-card JFK conspiracies: the driver was responsible. People think there's this... you want to grab the hat, John? Yeah, yeah, the hat. So people think the driver was responsible because there's a burst of acceleration right at the wrong time, and it shows that maybe he was in on it, like slowing the car down to make the shot easier. And then the really crazy conspiracy: it was a suicide by JFK. He wanted to send a message to someone, and so he was like, I'm going to die for this, so he... I mean, I guess. But there's an unlimited amount of conspiracies. The crazy thing is, I do believe there were truly many, many documents released all at

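George's two technical points, reading every file instead of retrieving a handful of search hits, and ranking entities once aliases like "JFK" and "John Kennedy" are merged, can be illustrated with a small map-reduce sketch in Python. This is a minimal illustration under stated assumptions, not Hebbia's implementation: `ALIASES`, `naive_extract`, `rank_entities`, and `is_relevant` are hypothetical names, and the whole-word regex matcher stands in for the model call that would actually read each document.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Callable, Iterable

# Hypothetical alias table: several surface forms map to one canonical entity,
# so "JFK" and "John Kennedy" are counted as one person instead of two.
ALIASES = {
    "jfk": "John F. Kennedy",
    "john kennedy": "John F. Kennedy",
    "central intelligence": "CIA",
    "cia": "CIA",
    "fidel castro": "Fidel Castro",
    "castro": "Fidel Castro",
}


def naive_extract(text: str) -> set[str]:
    """Stand-in for a real per-document reader (an NER model or LLM call).

    Matches each alias as a whole word, case-insensitively, and returns the
    canonical names found. Returning a set means a document counts at most
    once per entity, no matter how often the name appears in it.
    """
    lowered = text.lower()
    return {
        canonical
        for alias, canonical in ALIASES.items()
        if re.search(rf"\b{re.escape(alias)}\b", lowered)
    }


def rank_entities(
    docs: Iterable[str],
    extract: Callable[[str], set[str]] = naive_extract,
    is_relevant: Callable[[str], bool] = lambda doc: True,
) -> list[tuple[str, int]]:
    """Map over every document (no top-k retrieval), then reduce to counts.

    `is_relevant` is a hypothetical per-file judgment, e.g. asking a model
    "is this file related to the question?", applied before anything is counted.
    """
    counts: Counter[str] = Counter()
    for doc in docs:  # every file is read; nothing is skipped by a search ranker
        if is_relevant(doc):
            counts.update(extract(doc))
    return counts.most_common()


if __name__ == "__main__":
    corpus = [
        "Memo: JFK motorcade route; CIA station notified.",
        "Cable re Fidel Castro and Central Intelligence contacts.",
        "John Kennedy schedule, Dallas.",
    ]
    for entity, n_docs in rank_entities(corpus):
        print(f"{entity}: {n_docs} document(s)")
```

Counts here are per document rather than per mention. With the default `is_relevant`, a batch of padding files about one unrelated name (the data-poisoning scenario raised in the conversation) would count like any other file and could climb to the top of the ranking; a per-file relevance check is one place such files could be dropped before the reduce step.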