Extend AI extracts structured data from messy documents — healthcare, finance, and logistics are paying customers

Jul 3, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Absolutely. Thank you for having me. Have a good one. Bye. Up next, we have Kushal from Extend App coming into the studio. We got some gong-worthy news, baby. Oh yeah. Oh yeah. It's gong time. It's gong time. It's gong time. Jordy, I'll let you. What's up, guys? How are you? Good to see you. Take us off.

How much have you raised? Good to see you, too. What's up, guys? Give us the news on the fundraising front first and then we'll figure out what you do for a living. I just answer customer support for a living. That's what I do. Well, you got to get on Fin by Intercom. Yeah. Yeah. Fin.

Then you can just go to France, you know. There we go. What's going on? What's the news? What's up, guys? Great to meet you. Uh, well, we recently announced $17 million in funding across our seed and series A rounds. Congratulations. Thank you. Thank you. With authority. That one felt good.

Okay, now to the less important news. What does the company do? Well, we also filmed a pretty kick-ass launch video to go along with it, too. I saw that. It was a ton of fun. Yeah, it was great. So, is this like some type of new fax machine company? You're doing document processing. What are you building?

Yeah. Yeah. So, my name is Kushal. I'm the CEO and co-founder of Extend. And at Extend, we're building a modern document processing cloud. So, you know, we work with companies that just get a ton of PDFs and unstructured documents, right? Uh, and we help them ingest and process that.

So we really work with, you know, cutting edge AI teams, data teams, technical teams, whether they're trying to build AI agents on top of that data, new products with that data, automate, you know, manual back office work with that. Uh we really give them all of the infrastructure and tooling they need to go after that.

Is the actual physical document problem solved? Is Iron Mountain just dominant there? Is there no need for actual new scanning? Have we scanned every document in the world by now? Not even close. It's actually funny. One of the first customers I talked to back when I was getting the company off the ground.

It was a large healthcare provider in Milwaukee, believe it or not, in the middle of the country. Uh, and I talked to them, I'm trying to pitch them, you know, LLMs for document processing, and they're like, "This sounds great.

Come back to us in 2 years once we, you know, have digitized this mountain of, you know, lockbox warehouse of PDFs and boxes that are stored in our backyard." Um, so not quite there yet, but I think we'll get there pretty soon. Talk to me about the tech stack.

Is OCR, traditional OCR pipelines, are those like dead at this point? I've heard about some people that will, you know, send a PDF to uh Google for OCR, get a big messy bucket of text, but at least it's in text format, and then run that through an LLM to just kind of clean it up. What's the state-of-the-art? Yeah.

You know what's really surprising is so much of the world still runs on these PDFs and Excel documents and they just end up being the system of record for such mission critical data. If you think healthcare, you know, finance, supply chain, it's just all powered by PDFs.

I think people's minds would be blown if they knew how much of the global economy just runs off of PDF files. Um, and the problem's been around for a long time, right? I think OCR at this point is decades old.

Um, you know, funnily enough, we have a bunch of hoodies for the team and we have the OCR patent printed on the hoodies just as a little testament to it, but it's like 40, 50 years old at this point, right? Um, but it's really accelerating now.

You know, I think more documents are going to get processed in the next 6 months than all of history combined since the PDF file format was invented. Um, and I experienced this myself. You know, I actually built this back at Brex in 2018. I was an early engineer there.

Uh, and we did exactly what you were saying: OCR a document and then just run it through a ton of painful regex to try to pull out these targeted data fields, and it was so painful.
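For illustration, the OCR-then-regex approach described here might look something like the minimal sketch below. The sample OCR text, the field names, and every pattern are assumptions made up for this example; nothing here reflects the actual code built at Brex.

```python
import re
from typing import Optional

# Hypothetical OCR output for an invoice; real OCR text is usually noisier.
OCR_TEXT = """
ACME Corp   Invoice No: INV-2018-0042
Date: 03/14/2018
Total Due:  $1,234.50
"""

# One hand-written pattern per target field (all patterns are illustrative).
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+No[.:]?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "date": re.compile(r"Date[.:]?\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE),
    "total": re.compile(r"Total\s+Due[.:]?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(text: str) -> dict[str, Optional[str]]:
    """Return the first match for each field, or None when a pattern misses."""
    result: dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


fields = extract_fields(OCR_TEXT)

# A slightly different layout silently breaks the hand-written pattern.
other_layout = extract_fields("Amount payable: 1234.50 USD")
```

The second call shows why this style is painful: every new vendor layout needs another pattern, and a miss yields `None` rather than an error.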

Um, and fast forward now to today with LLMs and the capabilities are just so much higher and the demand for document processing is through the roof right now. Right. Yeah. I was going to say, talk about kind of the demand. I know you got a bunch of pretty incredible customers already.

Yeah, I mean, we have the privilege of working with some awesome folks, from large enterprises like Square that are trying to do a ton of use cases around financial statement ingestion, or Flatiron Health, who have billions of pages of patient medical data, all the way to cutting-edge startups like Brex and Mercury and Checkr and many others that are all using us in production.

Um, and we have incredibly high retention, a ton of folks that kind of refer us to others and come inbound. Um, and we're growing the team to keep up with it, but the demand has been pretty intense.

And I think what we've kind of seen is that um as these models have gotten better, it's sort of increased the possibility of what you can go and do.

But for very mission-critical use cases, where even, you know, 99% accuracy, for example, is not good enough if you're trying to do mortgage processing or patient healthcare intake, because the downstream impact of bad data is so catastrophic, right?

Um, and so we really focus on these pipelines and industries where you just cannot get it wrong. Accuracy, reliability: you cannot get it wrong. And that constraint puts a ton of complexity on the problem, and that's kind of where we go deep. That's why we have a number of customers that try to do the model approach and then come back to us in 3 months, 6 months, and they sign up for Extend and pay for Extend despite the models having gotten exponentially better over the past couple of months.
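A rough back-of-the-envelope calculation shows why 99% accuracy can fall short. The sketch below assumes, purely for illustration, that the 99% figure is per-field accuracy, that field errors are independent, and that a document has 50 fields; none of these numbers come from the interview.

```python
def prob_document_fully_correct(field_accuracy: float, num_fields: int) -> float:
    """Probability that every field in one document is extracted correctly,
    assuming independent errors with the same per-field accuracy."""
    return field_accuracy ** num_fields


# Hypothetical document with 50 extracted fields.
p_99 = prob_document_fully_correct(0.99, 50)
p_999 = prob_document_fully_correct(0.999, 50)

print(f"99.0% per field -> {p_99:.1%} of documents fully correct")
print(f"99.9% per field -> {p_999:.1%} of documents fully correct")
```

Under these assumptions, roughly four in ten documents would contain at least one wrong field at 99% per-field accuracy, which is the kind of downstream error rate a mortgage or patient-intake pipeline cannot absorb.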

So yeah, I mean, the models are getting better, but what models are you actually using and for what use cases? Because I imagine that the price difference and the time cost of using a heavy-duty reasoning model versus a really quick optimized, you know, 4o-level model has got to be pretty different.

And then is tool use at all in the mix with some of these?

Because I can imagine, as clunky as regex is, every once in a while you might want to have a layer of regex validation on top of things, right? Just to make sure. I think you're getting at a really key point, which is there's very few use cases which are just: here's a document, extract this data. Because for any mission-critical pipeline, you have the entire end-to-end system you have to build: classify the document, because that changes the schema, then split the document up, and then actually parse it to make sure you can handle these tables that span 10 pages, right?
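The classify, split, parse sequence described above can be sketched roughly as follows. This is a toy illustration, not Extend's implementation: the schemas, the keyword classifier, and all function names are hypothetical stand-ins for what would be model calls in a real system.

```python
from dataclasses import dataclass

# Hypothetical schemas: the document type decides which fields to extract.
SCHEMAS = {
    "invoice": ["invoice_number", "total"],
    "bank_statement": ["account_number", "closing_balance"],
}


@dataclass
class Section:
    doc_type: str
    pages: list[str]


def classify_page(page: str) -> str:
    """Stand-in classifier: a keyword match instead of a real model."""
    return "invoice" if "invoice" in page.lower() else "bank_statement"


def split_document(pages: list[str]) -> list[Section]:
    """Group consecutive pages of the same type, so a table that spans
    several pages stays together in one section."""
    sections: list[Section] = []
    for page in pages:
        doc_type = classify_page(page)
        if sections and sections[-1].doc_type == doc_type:
            sections[-1].pages.append(page)
        else:
            sections.append(Section(doc_type, [page]))
    return sections


def parse_section(section: Section) -> dict:
    """Stand-in parser: returns an empty record shaped by the section's schema."""
    fields = {name: None for name in SCHEMAS[section.doc_type]}
    return {
        "doc_type": section.doc_type,
        "page_count": len(section.pages),
        "fields": fields,
    }


def process(pages: list[str]) -> list[dict]:
    """Run classify -> split -> parse over a multi-page upload."""
    return [parse_section(section) for section in split_document(pages)]


results = process([
    "INVOICE 42, line items",
    "Invoice 42, line items continued",
    "Account statement for March",
])
```

The point of the sketch is the ordering: classification picks the schema, splitting decides which pages belong together, and only then does parsing run against the right schema.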

And each one of those kind of adds complexity to the axes of cost and performance and latency. And you kind of need to have all three of those, right? Not a single model is going to be great at all of them. And really why customers end up buying us is because they don't want to be locked in to one model.

And we give them all the tooling to figure out, hey, this initial triage step, we want to use a really cheap, fast, you know, low latency triaging classifier, but then for this really complex table, we do want to use a really powerful reasoning model.
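One way to picture that per-step model choice is a small routing table. The sketch below is an assumption-laden illustration: the tier names, per-page costs, and latencies are invented numbers, and the step names are hypothetical rather than anything Extend exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_page_usd: float  # made-up figure for illustration
    latency_s: float          # made-up figure for illustration


# Hypothetical tiers: a cheap, fast model and a slow, powerful one.
FAST = ModelTier("small-fast-model", 0.001, 0.3)
REASONING = ModelTier("large-reasoning-model", 0.05, 8.0)

# Route each pipeline step to a tier; unknown steps default to the cheap tier.
ROUTES = {
    "triage": FAST,
    "split": FAST,
    "parse_complex_table": REASONING,
}


def pick_model(step: str) -> ModelTier:
    return ROUTES.get(step, FAST)


def estimate_cost(workload: dict[str, int]) -> float:
    """Estimate spend for a workload given as {step: number_of_pages}."""
    return sum(pick_model(step).cost_per_page_usd * pages
               for step, pages in workload.items())


# Triage all 100 pages cheaply, then send only 5 hard pages to the big model.
total = estimate_cost({"triage": 100, "parse_complex_table": 5})
```

Keeping the routing in a table rather than in code paths is one way to avoid being locked in to a single model: swapping a tier is a one-line change.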

Uh, and we kind of help them integrate and build all of that end to end. Talk about where the data is winding up. Are you seeing more clients demand that you map the documents that you're ingesting all the way to a relational database like Postgres or something?

Are people just saying, hey, just give me a blob of JSON, I'll stuff it in MongoDB? Do people want a CSV file that's on AWS or S3 or something? Like, how do people want the data finalized from you? Yeah, I mean, flexibility is kind of the name of the game, and that's why we provide options for all of that.

Uh, some people that are trying to build, you know, AI agents on top of it, they'll want the JSON payload but then also put it into a vector DB, right? Okay. Uh, some other folks are going to want to actually index that in, like, Elasticsearch if you're trying to do text experiences on top of it.

Um other people that are like just automating mortgage processing for example that'll just go like structured data points into a DB that then they can take action on.
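As a rough sketch of two of the destinations mentioned, the example below serializes one extraction result as a JSON payload and also writes its fields as rows into a relational table. The payload shape, table name, and field values are hypothetical, and SQLite stands in for whatever database a customer would actually use.

```python
import json
import sqlite3

# Hypothetical extraction result; the shape is an assumption for illustration.
extraction = {
    "document_id": "doc-001",
    "doc_type": "mortgage_application",
    "fields": {"borrower_name": "Jane Doe", "loan_amount": 350000},
}

# Option 1: hand the whole result over as a JSON payload.
payload = json.dumps(extraction)

# Option 2: flatten the fields into rows of a relational table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE extracted_fields (document_id TEXT, field TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO extracted_fields VALUES (?, ?, ?)",
    [
        (extraction["document_id"], name, str(value))
        for name, value in extraction["fields"].items()
    ],
)
conn.commit()

rows = conn.execute(
    "SELECT field, value FROM extracted_fields ORDER BY field"
).fetchall()
```

The same structured result feeds both paths, which is what lets the downstream team choose a document store, a search index, or a plain relational table.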

Uh, so we kind of give them the data, the raw data, and then get out of their way so they can figure out, you know, where to put it and what to do with it downstream. Yeah. Uh, how real are... I mean, there was this narrative for a while like, oh, enjoy the VC-subsidized era. It's like the early Uber days.

Uh, all the LLMs are cheap, but they're going to be really expensive. Uh, how real is it that LLM inference can be a material cost for a company like you that's kind of reselling inference? Not to degrade what you're doing, but that's a piece of your business.

You have a cost that's inference, and then you're earning revenue from the value that you add on top of that, versus a startup that's buying from you. How big are these line items?

Are we talking, you know, something that's material to SG&A for a company, or is intelligence basically too cheap to meter at this point? Yeah, it's a great question and it really depends on the use case, right?

And um our contract sizes range from, you know, we work with small startups paying us a few hundred bucks a month all the way up to large enterprises that pay us six plus figures a year, right? Um and it kind of varies all across the board.

Uh, but, you know, we haven't played the unit margin game where we're trying to, you know, give away a dollar for 90 cents or anything like that. We have positive unit economics across the board.

Um, because really what we benchmark ourselves against is: what is the cost of bad accuracy in your downstream product or your user experience? What's the cost of six months of eng time?

And at the end of the day, those end up outweighing by a lot, you know, potential percentage differences on the LLM costs themselves, especially as they've continued to come down.

You know, there were some things we couldn't do from a product perspective 18 months ago that now we really can, just because it's so much easier to make multiple LLM calls and do things more agentically in an end-to-end system. Very cool. Jordy, anything else? No, very excited for you. Clearly ripping.

I can sense the voice. It's going well. There's nothing better than seeing... Yeah, you know, a lot of people raise a Series A, but not everybody has this level of pull from the market. So, I'm sure you'll be back on for a B in no time. It's fantastic. Can't wait. We'll talk to you soon.

Thanks. Great. I feel like that could be very valuable for you. You could maybe use Extend to ingest all the registrations from the G63s, from the G Wagons, and all the service records, and then you could have one database for all of them that you query across.

I was going to say I was going to say I mean we produce a lot of documents and figuring out how to process these bad boys. It's almost like they come from the computer and the internet and they could just uh remain there and yet we print them out because it's fun. Um anyway, what else is what else is in the news?

What else is important to cover today? Uh, we have one more interview today. Yeah, we got a vlog Avlock coming on uh in just a little bit. Uh Daniel Daniel growing Daniel was any breaking news?

Uh, he says: watching the Soham interview and I can already tell people are going to see this interview of a legendary liar talking about his legendary lies and be like, wow, I kind of believe him. You can label the tin poison and people will still eat it. Label the tin poison. Yes, you can label the tin poison.

People will still eat it. Um, he says you had said at some point it's tough because someone who would lie about having multiple jobs might come on here and lie to us. He's quoting you. He seemed sincere. He did seem sincere. It's very possible that he's a legendary.

He's a 10x engineer and also a 10x liar or 100x liar. Um well the proof is in the pudding, you know. Yeah. You know, it's very obvious that I I wouldn't recommend a portfolio company hire this guy. I think it's possible he goes and works somewhere and just does a great job. Um I wouldn't necessarily bet on it.

Might hire him in a house in your literally in your house. Yeah. But but the push back there is you got to have him in the office, but then you got to live with him. And he but maybe he he just is uh you know, he sets up his desk and he's just still moonlighting.

You know, I think he might be he he might just be built for moonlighting. He also was devop the dev shop thing he's fine. He would he would routinely miss um you know miss deadlines and then use various excuses when the India Pakistan um stuff was happening.

He was telling people you know they shot a drone 10 minutes away uh guilt- tripping people. So it's rough terrible behavior. Uh it's a buyer beware situation for anyone who's employing him. That's right. You know what you're getting yourself into.

You got to be careful and make sure that that aligns and ultimately the value of what you're paying. What's the value he's delivering and you just know, okay, uh could be could be some volatility in that. But he's more of a known quantity now and we learned a couple things at least got him on the record.

So, you know, if he goes out and moonlights again, everyone will be able to point to that interview and be like, he said he wasn't going to do it, and he did it. So, somebody was saying, uh, a fake Soham account was saying Elon can work at nine startups.

Jamie Dimon can be on the board of multiple companies, but when you do it, it's a scam. And then Austin says, "So tired of this argument. Working at multiple companies was not the problem. Many people have side gigs or contract at many places. They just don't lie about doing so." So, that's the key thing.

Jamie Dimon is not sneaking onto other boards. Elon's very public about his different companies. And uh, it's all about just avoid lying. Just don't lie. That's what we said. Uh, Soham probably could have had a cracked dev shop, but he chose to lie. So hopefully he course corrects. Oh well, we will see.

We will let the timeline decide. We will let I'm sure the timeline will not let up on the memes and the jokes and anything else that's going on. Um oh well. Should we uh should we go through some F1 stuff?

So F1 has made an annual sponsorship revenue of more than 636 million, which I believe is a significant multiple of the amount of money that they were asking for actual TV rights.

I think they wanted 200 million from ESPN for the right to distribute that and show that as content, but the ads inside are well beyond that. Yes. So between 2019 and 2024, Liberty Media says F1 series annual sponsorship revenue more than doubled to 636 million.

At a team level, total sponsorship revenue increased by 60% to 1.3 billion over the same period. So, Liberty Media is selling F1 sponsorships, but then the teams are also selling sponsorships.

Uh, the sport is bigger than it has ever been in its 75-year history, and everyone wants a slice, but how can sponsors stand out and justify the cost when demand is so high?

Yeah, I I they haven't announced their new media rights or TV rights deal, but given how much they're doing on the sponsorship side, I wouldn't be surprised if they just go for max eyeballs and kind of take it um at at somewhat of a not necessarily a loss, but just understand it won't be the profit center that it maybe once was.

Because if you actually look at F1 viewership, that was TV rights in the United States. They wanted 200 million. Uh, no. ESPN wasn't interested in doing that. Yeah, I guess this is global. Previously, from 2023 onward, they were paying between 75 and 90 million a year.

And um, they just don't command the live US viewership to really justify that anymore, I guess, or even beyond that, despite the sport's popularity. Yeah, it's interesting. Um, apparently brands are picking teams based on like the unique personality of those teams now.

So they give this example: the teams' distinct personalities can be a big draw for sponsors. Heineken began sponsoring Red Bull in 2017, just a year after the Dutch brewer started its long-standing series-level partnership with F1.

The deal with the team is effective because both are non-premium brands in a sport dominated by luxury car manufacturers, says Hessie, who has previously worked for both. Looking up and down the grid at the time, they wouldn't have sat anywhere else. And so there's the official like drinks car of F1.

And uh, if you're selling alcohol or energy drinks, you head over to F1 Red Bull Plus. Uh, what was it? There's the Stake. Stake has a... Oh, yeah. You love the collaboration. Uh, it's like insert. It's the Sauber Stake F1
