Interview

Morphik: Open-source multimodal search for AI agents in a Glean-adjacent market, as Glean raises $150M

Jun 11, 2025 with Rohit

Key Points

  • Morphik positions itself as infrastructure beneath enterprise search platforms like Glean, targeting vertical AI application builders that need multimodal document retrieval rather than end-user enterprises.
  • The startup treats document pages as images for search rather than relying on OCR parsing, which the founders say delivers higher accuracy on dense visual content like patent diagrams and medical charts.
  • Morphik's open-source repo has 2,600 GitHub stars and 4,000 monthly downloads with 200 active production deployments; proprietary features like SSO and connectors remain behind a paywall.

Summary

Morphik is a YC-backed startup building open-source multimodal search infrastructure for AI agents. The founding team includes two brothers, one a former MongoDB software engineer and the other a Cornell dropout, who entered YC with a broader "data layer" ambition before narrowing to multimodal search.

Morphik targets a core technical problem in document search: OCR-based parsing breaks down on complex PDFs that embed images, charts, tables, and technical diagrams. Instead of parsing, Morphik treats each page as an image and retrieves over those visual representations, which the founders say delivers materially higher accuracy. Early traction comes from legal tech and health tech, where documents contain dense visual content like patent diagrams and medical charts.
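
The general idea of retrieving over page images rather than parsed text can be sketched in a few lines. This is a toy illustration under stated assumptions, not Morphik's implementation: `embed_page` is a hypothetical stand-in for a learned vision encoder (here it only averages pixel intensity per grid cell), the query is itself a small image, and the page data is invented. A real system would presumably use a multimodal model that maps text queries and page images into the same vector space.

```python
import math

def embed_page(pixels, grid=2):
    """Hypothetical stand-in for a vision encoder: mean intensity per grid cell."""
    h, w = len(pixels), len(pixels[0])
    ch, cw = h // grid, w // grid
    vec = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [pixels[y][x]
                    for y in range(gy * ch, (gy + 1) * ch)
                    for x in range(gx * cw, (gx + 1) * cw)]
            vec.append(sum(cell) / len(cell))
    return vec

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, index, top_k=1):
    """Rank indexed pages by similarity to the query vector."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [page_id for page_id, _ in ranked[:top_k]]

# Invented 4x4 grayscale "pages": one uniform, one with a bright top-left region.
text_page = [[200, 200, 200, 200] for _ in range(4)]
chart_page = [[255, 255, 0, 0],
              [255, 255, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]]

# The index stores one vector per page image; no OCR or text parsing is involved.
index = {"text_page": embed_page(text_page),
         "chart_page": embed_page(chart_page)}

# A noisy image query that resembles the chart page's layout.
query = [[250, 240, 10, 0],
         [245, 250, 5, 0],
         [0, 10, 0, 0],
         [0, 0, 5, 0]]

result = search(embed_page(query), index)
print(result)
```

The point of the sketch is the shape of the pipeline: pages go straight from pixels to vectors, so layout and visual content influence ranking without an intermediate text-extraction step that could fail on diagrams or charts.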

Positioning vs. Glean

Glean raised $150M at a $7.2B valuation the day before this conversation. Morphik positions itself not as a competitor but as infrastructure that sits beneath Glean and similar products. The pitch is API-first and developer-facing. Teams building vertical AI applications such as "Cursor for X" that need to handle multimodal documents are the core customer, not enterprises buying an off-the-shelf search product.

The longer-term ambition draws on why coding agents work so well. Code is low-entropy, meaning future states are highly predictable, which lets tools like Cursor reason far ahead. Morphik aims to create that same low-entropy property for multimodal workflows. A video editing pipeline, for instance, has sequences where cuts imply predictable next actions. Capturing that predictability would make multimodal data actionable for agents the way structured code is today.
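
One way to make "low-entropy" concrete is Shannon entropy over the distribution of possible next actions. The sketch below assumes that reading of the term, which the conversation does not spell out, and the probabilities are invented for illustration rather than measured from any real workflow.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Invented example: after a given step, one next action is almost certain.
predictable_next_action = [0.97, 0.01, 0.01, 0.01]

# Invented example: four next actions are equally likely.
unpredictable_next_action = [0.25, 0.25, 0.25, 0.25]

low = entropy_bits(predictable_next_action)
high = entropy_bits(unpredictable_next_action)

print(f"predictable:   {low:.2f} bits")
print(f"unpredictable: {high:.2f} bits")
```

Under this reading, a workflow is easier for an agent to plan through when the next step carries few bits of uncertainty; the uniform four-way case sits at the 2-bit maximum for four options, while the near-certain case is a small fraction of that.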

Open-source model

The monetization line is drawn between individual and team use. Features that benefit a single developer go into the open-source repo. Features that serve teams, such as SSO, connectors like Google Drive, and query performance at scale, stay proprietary. The project has 2,600 GitHub stars, 4,000 monthly downloads, and 200 active production deployments.

Fundraising is close to complete but not yet finished.