Reducto raises $75M Series B led by a16z to bring human-level accuracy to document parsing
Oct 15, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Adit Abraham
of Photoshop Mix and PS, two apps that I created endless memes in. Anyway, without further ado, we have our next guest from Reducto. Welcome to the stream. Sorry to keep you waiting through the rant. Congratulations on the news. Please kick us off with an introduction on yourself and the company.
Yeah, so I'm Adit Abraham, co-founder and CEO here at Reducto. We help people get data out of all of their most complex documents and spreadsheets for all sorts of LLM use cases. So plenty of thoughts here on Adobe as well that we can go into. Fantastic.
I like that when you're in the Restream waiting room, you can kind of hear me ranting before you hop on. Give us the news. Give us the news. What do you got? Yeah. So, yesterday we announced that we raised a $75 million Series B.
This is five months after we announced our Series A from Benchmark, and [inaudible] million in total funding. Congratulations. You're on an absolute tear. I missed who led the new round. Andreessen. Andreessen. Congratulations. So, I mean, we've got to go a lot deeper on the actual product.
Who's the key customer? What are the key use cases? Obviously, you know, understanding documents broadly, there are a lot of products for that. So, what was your landing zone? What was your go-to-market? What's the target customer like? Take me one level deeper. Yeah.
How old were you when you realized you wanted to turn documents into data? I guess from day one. Day one. Yeah. No, I think, look, the thing is, PDF processing is of course not a new thing. People have worked on it probably longer than I've been alive.
I've met the people that worked on the original drivers for printers to print PDFs. Okay. But the context here is very different, right? Like, we're not far from a point where language models are going to reason on pretty much any point of human data.
Like, before you say hi to your doctor, there's going to be some sort of summary of your medical records. When you apply for a loan, there's going to be, you know, some agent that goes through and understands all of your financial statements.
And if you're going to have all of those human points of data with real human processes, you need human-level accuracy. And that's what we came in with. So two years ago we released what at the time was the first parsing API that used VLMs.
And we tried to really set a new standard for what accuracy could look like on anything. We looked at it as this problem of: how can you read documents the way that a human would?
And since then, obviously, the product's gotten a lot broader, but today a lot of the newer AI companies, the Harveys of the world, the Rogos, the Loras, and so on, people building really great AI applications, use Reducto for that sort of ingestion.
But also some of the largest enterprises in the world: Fortune 10 enterprises in tech, some of the largest hedge funds in the world, private equity firms, insurance companies, and so on. Talk about the decision to use VLMs. What are those? Break that down.
Yeah, so for a while you would have some sort of, you know, IDP vendor that is using traditional OCR. But you can really go a step further. There are a lot of things that weren't possible two years ago.
Like, if you try to read East Asian languages, there are just so many more characters, and so it would be really hard to get every little mark correct. If you're dealing with something like healthcare documents, you have all sorts of handwriting. You have checkboxes.
You know, you have a doctor annotating it on the side, and they just assume that a person would be reading that. But when you read it programmatically, you make a lot of mistakes. And so what we do is we're really good at the traditional CV side. A lot of our team is former self-driving-car researchers.
So we tried to apply a lot of frontier techniques there, but we also have this agent flow, which I almost liken to a human in the loop: we're making corrections to the last mile of mistakes.
Like, maybe we made a mistake on a period versus a comma, or a zero was mistaken for an O, and we'll iteratively go through and correct all of those, so that no matter what you're uploading, you can get good, reliable outputs.
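Reducto's actual correction pass is an agentic, VLM-driven loop; as a toy illustration of the underlying idea (all names here are hypothetical, not Reducto's API), a rule-based sketch of fixing OCR confusables like O/0 based on the surrounding token might look like:

```python
# Common OCR confusable pairs: characters recognizers mix up
# because the glyphs look alike (O/0, l/1, S/5).
CONFUSABLES = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"}

def correct_numeric_tokens(text: str) -> str:
    """Replace confusable letters inside tokens that are mostly digits.

    A token like '1O0,000' is almost certainly numeric, so the 'O'
    is treated as a misread zero; a word like 'Oslo' is left alone.
    """
    def fix(token: str) -> str:
        digits = sum(ch.isdigit() for ch in token)
        letters = sum(ch.isalpha() for ch in token)
        if digits > letters:  # numeric context wins
            return "".join(CONFUSABLES.get(ch, ch) for ch in token)
        return token

    return " ".join(fix(tok) for tok in text.split())
```

The real system replaces this hand-written heuristic with model-driven judgment, but the shape is the same: detect a low-confidence character, use context to decide what it should have been, and rewrite it.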
Thank you for waiting until we had self-driving cars to hire all the self-driving-car engineers to work on turning PDFs into structured data. Thank you. It's a very different but important problem. Yeah. Yeah. No, it is. I mean, what are people demanding on the output side?
Is it just, dump a bunch of JSON files in S3 buckets? Like, what? How far do you go? Okay, I have a stack of PDFs, or maybe papers, and you're going to scan them and turn that into text and data. But are people actually coming to you and saying, no, I want it loaded into this relational database, or I want it in MongoDB? How far do you go?
We see all of those. So in some cases these are real-time applications: a user is uploading a file and they want to be able to talk to it. And so there, we're actually just extracting the data and making it easier for models to reason on.
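For that real-time case, the output of parsing typically feeds straight into a model's context. A minimal sketch of that hand-off, using entirely hypothetical types rather than Reducto's actual API shapes:

```python
from dataclasses import dataclass

# Hypothetical shapes only -- the real Reducto API differs; this just
# illustrates the "parse, then hand structured output to a model" flow.

@dataclass
class ParsedBlock:
    page: int   # 1-based page number the block came from
    kind: str   # "paragraph", "table", "checkbox", ...
    text: str   # normalized text content

def to_model_context(blocks: list[ParsedBlock]) -> str:
    """Flatten parsed blocks into a prompt-friendly string,
    keeping page numbers so the model can cite its sources."""
    return "\n".join(f"[p.{b.page} {b.kind}] {b.text}" for b in blocks)

blocks = [
    ParsedBlock(1, "paragraph", "Net revenue increased 12% year over year."),
    ParsedBlock(2, "table", "Q1: 4.1M | Q2: 4.6M"),
]
context = to_model_context(blocks)
```

Keeping page and block-type metadata attached, rather than dumping raw text, is what later makes citations back to the source document possible.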
But we also have cases where one of the biggest hedge funds in the world wanted to digitize two decades' worth of historical data. These are things that an analyst would have combed through, you know, extracted individual data points. And they want signals off of that.
And so in their case we can actually structure it into the exact representation that they want. Especially recently, we started going beyond just the initial parsing and extraction; we take care of everything.
Like, you want to split your documents, you want to classify them, you might even want to edit data in. A lot of human work is: you get information from some set of documents and you put it into some net-new PDF or spreadsheet.
People can do that end to end, and we offer different endpoints for them to be able to do that. How hard is it to get from 85% accuracy to 90, to 95, to 99, to 99.99999?
I imagine it's critically important if people are working on, you know, legal cases, or you have medical records; you can't really mess these things up. Yeah. I mean, the long tail is really long.
And yeah, when you're looking at a financial document, a period versus a comma is not, you know, an oops; it's millions of dollars. You've just changed the order of magnitude. You can't have something like a patient's medical record and have it say, is the patient vaccinated?
And guess at what the checkbox says, right? Like, you need to get these things right. And that's why I think even for that last-mile difference, people are really excited about what Reducto unlocks for them, because there are use cases where they didn't think they could digitize before.
It just wasn't reliable enough to put in production. And we're trying not just to get really good initial extraction results, and not just that layer of redundancy to correct them in cases where we might have messed up, but also to give them all of the last-mile things that they would need.
Like, if they want to have a human in the loop, we're really good at citing where we got answers from. We're really good at catching our own mistakes and pointing to things that we're not sure we can extract. So they can feel confident that in production they're not going to have issues.
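On the consumer side, that kind of self-reported uncertainty is typically used to route extractions: trust the confident ones, send the rest to a reviewer. A minimal sketch of that routing step, assuming hypothetical per-field confidence scores (not Reducto's actual response format):

```python
def route_fields(
    fields: dict[str, tuple[str, float]], threshold: float = 0.9
) -> tuple[dict[str, str], dict[str, str]]:
    """Split extracted fields into auto-accepted vs. needs-review,
    based on a per-field confidence score in [0, 1]."""
    accepted: dict[str, str] = {}
    review: dict[str, str] = {}
    for name, (value, conf) in fields.items():
        (accepted if conf >= threshold else review)[name] = value
    return accepted, review

accepted, review = route_fields({
    "patient_vaccinated": ("yes", 0.62),      # ambiguous checkbox
    "date_of_birth": ("1984-03-02", 0.99),    # cleanly printed field
})
```

The threshold becomes a business knob: lower it and more fields flow straight through; raise it and more land in the human queue.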
Do you actually have a hot take about Adobe? I am getting dinner with the CEO tonight, actually. I'll play back your clip, the one where you say it's down 35% and it should be down 100.
That was maybe hyperbole for comedic effect, but I do have another axe to grind with them, because every day I make a PDF, and every day I open Adobe Acrobat to make that PDF, and there are seven different popups asking me to use their AI assistant that just says, "Do you want to summarize this document?
Do you want to summarize this document?" And I never want to summarize the document. And I always have to say, "No, close this." It says "Ask AI Assistant" up here, "AI Assistant" over there. And it's too much. It should be intelligent enough to know that I don't need a document summary for my workflow, because I do the same thing every single day. It's unacceptable. Yeah. I think people aren't really looking at PDFs in and of themselves. Like, it's not the file format that you care about; it's what's in there.
You want to be able to do the things that you want to do. Yeah. And hopefully we end up in a place where people can just interact with the underlying data. That's what we want to help with. I'd love that. Well, thank you for helping make data easier to interact with.
Thank you for everything that you do, and thank you for joining the show. Yeah, great to get the update. I'm sure you'll be back on very soon. We'll talk to you soon. Have a good rest of your day.