OpenAI launches GPT Image 2 with thinking, tool calls, and near-perfect text rendering
Key Points
- OpenAI's GPT Image 2 adds reasoning that lets the model plan multi-step image generation tasks like pulling web data and verifying layout coherence before output.
- Text rendering precision removes a core weakness of prior image models, enabling photorealistic infographics, complete with correct fonts, that can replace traditional slide decks.
- Tool calling integrates real-world data into generated images, letting the model fetch canonical photos via web search to solve the uncanny valley problem of AI-generated faces.
Summary
OpenAI's new image generation model, GPT Image 2, ships with three significant technical upgrades: reasoning capabilities that let the model think through complex prompts before generating, tool calls that enable web search and QR code generation within images, and text rendering precise enough that outputs pass as photorealistic.
The reasoning layer matters most for multi-step tasks. Users can ask the model to generate an infographic from a Wikipedia article, and the model will search for canonical images, verify layout coherence across multiple outputs (like a 10-image Instagram carousel), and check its work before final output. One example shown during the launch had the model generate social media reactions to the earlier "duct tape" beta, pull quotes from Threads and Reddit, and embed a working QR code to ChatGPT, all in a single image.
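For readers who want to picture the plumbing: a rough sketch of what the search-then-render loop could look like, assuming GPT Image 2 is exposed through the Responses API's image_generation tool the way gpt-image-1 is today, and that it can be paired with web search in one request. The host model id, the prompt, and the combined-tool behavior are illustrative assumptions, not confirmed details from the launch.

```python
# Sketch only: assumes GPT Image 2 sits behind the Responses API's
# image_generation tool, as gpt-image-1 does today. Model id is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",  # assumption: any tool-capable host model
    tools=[
        {"type": "web_search_preview"},  # fetch source material and photos
        {"type": "image_generation"},    # render the final composite image
    ],
    input=(
        "Summarize the Wikipedia article on the Apollo program as one dense "
        "infographic. Pull a canonical mission photo, verify the dates, and "
        "embed a QR code linking back to the article."
    ),
)

# Image tool calls come back as base64-encoded image data.
images = [
    item.result
    for item in response.output
    if item.type == "image_generation_call"
]
```

Whether the planning pass and the self-check happen inside a single call, as the launch demo implied, is the part this sketch cannot confirm.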
Text fidelity as a differentiator
The text rendering is clean enough that it removes a major weakness of prior image models. Users have generated detailed infographics about historical figures with correct brand fonts and aesthetics, created Dark Souls-style boss fight images of U.S. presidents with flawless text labels, and produced mock Google Street View screenshots and Grand Theft Auto loading screens that look indistinguishable from real interfaces. One test prompt—"make an advertisement for the M4 Pro Mac Mini"—produced a one-shot ad that required no editing.
This precision unlocks a workflow shift: instead of manually designing slides or infographics, users feed Wikipedia articles or prompts into the model and get dense, zoomable output that can replace traditional slide decks. The emerging pattern is compression—sending a single screenshot infographic instead of clicking through a presentation.
Tool use as workflow integration
The tool calling feature lets users anchor generated images to real-world data. When generating an infographic about John Turris and his Apple career, the model can pull his canonical headshot via web search and composite it into the layout. This hybrid approach, pairing AI-generated context with real reference images, addresses the uncanny valley problem where purely generated faces look "sort of like" the target but never exactly match.
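When the reference photo is already in hand, the same anchoring can be approximated with today's image-edit endpoint. A minimal sketch follows, using gpt-image-1 (the model available now); the local filename is hypothetical, and treating a future GPT Image 2 id as a drop-in replacement here is an assumption.

```python
# Sketch: compositing a real headshot into a generated layout via the
# images.edit endpoint. gpt-image-1 is today's model; swapping in a
# GPT Image 2 id once published is an assumption.
import base64
from openai import OpenAI

client = OpenAI()

with open("headshot.png", "rb") as headshot:  # hypothetical reference photo
    result = client.images.edit(
        model="gpt-image-1",
        image=headshot,
        prompt=(
            "Composite this exact headshot into a career-timeline "
            "infographic; keep the face unaltered."
        ),
    )

# gpt-image-1 returns base64 image data rather than URLs.
with open("infographic.png", "wb") as out:
    out.write(base64.b64decode(result.data[0].b64_json))
```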
The launch examples suggest rapid normalization of the technology. Users are already experimenting with presidents as Elden Ring bosses, fictional movie casting with real actor headshots, and personal photos turned into manga sequences that maintain likeness across multiple frames. Blake Robbins noted that the world is now ready for this model, with users creating convincing mock screenshots of livestreams and placing themselves into them.