Hosts react to Nano Banana Pro prompts: RPG maps, Lego renders, and image editing benchmarks vs GPT
Nov 20, 2025
Key Points
- Google's Nano Banana Pro outperforms GPT-4 on spatial reasoning tasks, placing creatures logically on RPG maps and executing precise image edits like removing burger ingredients while preserving exact positioning.
- The model converts photos to Lego renderings and renders Graphviz diagrams with correct logo placement, though face-swapping remains a weak point rated around five out of ten.
- Gemini 3 Pro image output shows sharper detail and texture than GPT-4 video generation, which observers describe as retaining a plasticky appearance.
Summary
Google's Nano Banana Pro reasoning model is showing measurable gains on visual instruction tasks that previously stumped competitors, with user-generated benchmarks revealing concrete improvements over GPT-4 in image editing and spatial reasoning.
The model handles spatial logic that requires inference, not just pattern matching. A user converted a Google Maps screenshot of San Francisco into an RPG-style monster map, and the model placed creatures with geographic logic: a giant octopus at the Golden Gate Bridge, a sea monster near Alcatraz, and a dragon at Twin Peaks. The placement reflects understanding of what threats fit which locations, not random image generation.
Precision editing
Nano Banana Pro solved what has been a consistent failure mode in image editing. Removing burger ingredients while keeping the top and bottom buns in their exact original positions has historically confused models that shift colors or misalign sections. Nano Banana Pro executes it cleanly. A user calling themselves Angel described it as "the first model to truly do this perfectly."
Lego conversion
Feeding a photo through Nano Banana Pro with a "make it Lego" instruction renders subjects as Lego bricks and minifigures, though when given photos of people the model sometimes produces generic minifigures rather than recognizable likenesses. One host tested this on their own image and reported mixed results, noting the conversion is "pretty slow." The results have drawn comparisons to the "Ghibli moment" that swept social media when Midjourney nailed a specific style.
Diagram rendering
A user fed raw Graphviz code (AI compute commit diagrams generated by Gemini 3) directly into Nano Banana Pro and asked it to render the result with company logos. The model rendered it in a single pass with logos placed correctly, described as getting "70% of the way" toward the visual sophistication of Wall Street Journal data visualizations. Users report that adding sketched ground-truth layouts as scaffolding improves output further.
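For readers unfamiliar with the format, "raw Graphviz code" here means plain DOT source describing nodes and labeled edges. A minimal sketch of what such a compute-commitment diagram's input might look like is below; the company names and dollar figures are invented placeholders, not data from the article, and the string is assembled in Python only so it can be pasted into a prompt or piped to a renderer:

```python
# Hypothetical Graphviz DOT source for an "AI compute commitments" style
# diagram. Entities and labels are placeholders for illustration only.
edges = [
    ("InvestorCo", "ChipMaker", "$10B"),  # invented example commitment
    ("CloudCo", "LabCo", "$5B"),
]

lines = ["digraph compute_commitments {", "  rankdir=LR;"]
for src, dst, label in edges:
    # Each commitment becomes a labeled directed edge in DOT syntax.
    lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
lines.append("}")

dot_source = "\n".join(lines)
print(dot_source)
```

A model asked to "render this with company logos" would receive `dot_source` verbatim; the benchmark's point is that Nano Banana Pro lays out the graph and decorates the nodes in one pass rather than requiring a Graphviz render first.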
Weaker areas
Face-swapping is where Nano Banana Pro underperforms. A face-swap test replacing a face with Sam Altman's was rated "just okay" and "five out of 10." The model's performance here lags behind its results on reasoning and spatial tasks.
Comparison to GPT
Google is pulling ahead of OpenAI's video models on photorealism. A side-by-side of Gemini 3 Pro image output versus GPT-4 video generation shows Gemini's version with far sharper detail and texture, while the GPT output retains what observers describe as a "plasticky look." GPT-5's video model (V4) is expected to represent a significant step forward, but Gemini 3 Pro image has already raised the bar on fidelity.