Interview

Extend AI extracts structured data from messy documents — healthcare, finance, and logistics are paying customers

Jul 3, 2025 with Kushal Byatnal

Key Points

  • Extend AI raises $17 million to scale document processing for healthcare, finance, and logistics, where extraction accuracy failures carry high downstream costs in mortgage approvals and patient care.
  • Named customers including Square, Flatiron Health, Brex, Mercury, and Checkr pay six figures annually for Extend's model-agnostic orchestration that routes documents to different LLMs based on cost and accuracy requirements.
  • Customers attempting direct frontier model pipelines hit reliability ceilings within three to six months and return to Extend, a recurring acquisition cycle that drives growth toward Series B.

Summary

Extend AI has raised $17 million across its seed and Series A rounds, using the capital to scale a document processing infrastructure platform that converts unstructured PDFs, Excel files, and scanned documents into structured, actionable data.

The company, founded by Kushal Byatnal (CEO), targets industries where data accuracy is non-negotiable. Healthcare, financial services, and supply chain are the core verticals, sectors where, as Kushal notes, even 99% extraction accuracy is insufficient given the downstream consequences of bad data in processes like mortgage approvals or patient intake.

Customer traction spans enterprise and growth-stage fintech. Named customers include Square (financial statement ingestion), Flatiron Health (billions of pages of patient records), Brex, Mercury, and Checkr. Contract values range from a few hundred dollars per month for smaller startups to six figures annually for large enterprises. Kushal reports positive unit economics across all tiers, positioning Extend's value against the cost of accuracy failures rather than competing on raw inference pricing.

The core technical proposition is model-agnostic orchestration. Rather than routing every document through a single LLM, Extend builds end-to-end pipelines that classify documents, split them, parse complex multi-page tables, and assign different models to each step based on cost, latency, and accuracy requirements. A fast, cheap classifier handles initial triage; a heavyweight reasoning model handles complex extractions. Customers are buying to avoid vendor lock-in as much as for the accuracy guarantees.
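The per-step routing described above can be sketched in a few lines. This is an illustrative Python sketch only: the step names, model identifiers, and per-page costs are assumptions for the example, not Extend's actual API or pricing.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    model: str            # model assigned per step for cost/latency/accuracy
    cost_per_page: float  # assumed illustrative cost, not real pricing

# Cheap, fast model handles triage; a heavyweight reasoning model is
# reserved for the steps that actually need it.
PIPELINE = [
    Step("classify", model="small-fast-classifier", cost_per_page=0.0001),
    Step("split", model="small-fast-classifier", cost_per_page=0.0001),
    Step("parse_tables", model="large-reasoning-model", cost_per_page=0.01),
    Step("extract_fields", model="large-reasoning-model", cost_per_page=0.01),
]

def route(document_pages: int) -> dict:
    """Return the model assignment and estimated cost for each pipeline step."""
    return {
        step.name: {
            "model": step.model,
            "est_cost": round(step.cost_per_page * document_pages, 4),
        }
        for step in PIPELINE
    }
```

The design point is that each step's model is a swappable field rather than a hard-coded dependency, which is what makes the pipeline model-agnostic and avoids vendor lock-in.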

Kushal tackled an earlier version of this problem at Brex in 2018, building OCR-plus-regex pipelines, and describes the LLM-era capability jump as transformative. He claims more documents will be processed in the next six months than in all of history since the PDF format was invented, a projection that reflects both enterprise digitization backlogs and the falling cost of multi-step LLM inference.

Output flexibility is a deliberate product choice. Extend delivers structured data as JSON payloads for agent pipelines, vector database ingestion, Elasticsearch indexing, or direct writes to relational databases, depending on the downstream use case. The company's positioning is to extract and deliver clean data, then step aside.
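The "extract, deliver, step aside" posture can be illustrated with one extraction result shaped for different sinks. The field names, schema, and sink choices below are assumptions for the sketch, not Extend's actual output format.

```python
import json

# A single illustrative extraction result (invented schema).
extraction = {
    "document_id": "stmt-2024-001",
    "type": "financial_statement",
    "fields": {"account_holder": "Acme Corp", "closing_balance": 10432.50},
}

def to_agent_payload(result: dict) -> str:
    """JSON string for an agent pipeline or downstream API consumer."""
    return json.dumps(result)

def to_search_doc(result: dict) -> dict:
    """Flattened document suitable for an Elasticsearch-style index."""
    doc = {"document_id": result["document_id"], "type": result["type"]}
    doc.update(result["fields"])
    return doc

def to_sql_row(result: dict) -> tuple:
    """Tuple matching a relational table's column order for a direct write."""
    f = result["fields"]
    return (result["document_id"], result["type"],
            f["account_holder"], f["closing_balance"])
```

The same clean data fans out to whichever store the customer's downstream use case needs, with no opinion imposed on what happens after delivery.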

A recurring customer pattern: teams attempt to build document pipelines directly on frontier models, hit accuracy or reliability ceilings on mission-critical workflows, and return to Extend within three to six months. That cycle, combined with inbound referrals and high retention, is the primary growth engine heading into the Series B.