OpenAI’s new image model reasons before it draws

April 23, 2026 - 1:24 pm

The new model reasons about composition, searches the web for context, generates up to eight coherent images from one prompt, and renders text in non-Latin scripts with near-flawless accuracy. It also took the number one spot on the Image Arena leaderboard within 12 hours of launch, by the largest margin ever recorded.

Two years ago, asking ChatGPT to generate a visual was like commissioning a poster from a sleep-deprived intern with a glue stick and a head injury. You’d ask for a clean design and get “leftovers creativity” splashed across the image, plus three new words that looked like they’d been invented during a minor software malfunction.

The images looked AI-generated in the way that has become a cultural shorthand for uncanny: almost right, conspicuously wrong, and instantly recognisable as synthetic. The leap matters. Text rendering has been the persistent, embarrassing weakness of AI image generators since DALL-E first turned heads in January 2021, a model we covered at the time as a fascinating curiosity.

Images 2.0 claims approximately 99% accuracy in text rendering across any language and script, including Japanese, Korean, Chinese, Hindi, and Bengali. If that figure holds in independent testing, it closes the gap between “impressive AI demo” and “tool a graphic designer would actually use for production work.”

The architectural change that makes the model different, not merely better, is what OpenAI calls “thinking capabilities.” Images 2.0 is the company’s first image model to integrate its O-series reasoning architecture. Before generating a pixel, the model researches the prompt, plans the composition, reasons about spatial relationships between elements, and can search the web for real-time context.

It is, in OpenAI’s framing, not a rendering tool but a “visual thought partner.”

This is my cat transformed into a comic strip with ChatGPT.

In practice, this manifests in two access modes. Instant mode ships to all ChatGPT users, including free-tier accounts, and delivers the core quality improvements: better text, sharper editing, richer layouts. Thinking mode, which enables web search, multi-image batching, and output verification, is restricted to Plus ($20/month), Pro ($200/month), Business, and Enterprise subscribers.

The distinction is commercially significant. The reasoning capabilities, where most of the quality premium lives, sit behind the paywall. Free users get better images; paying users get images the model has thought about.

The multi-image capability is the feature most likely to change professional workflows. A single prompt can now produce up to eight images that maintain character and object continuity across the set. That means a designer can generate a family of social media assets, a children’s book sequence, or a series of storyboard frames from one instruction, with consistent visual identity throughout. Previously, each image had to be prompted individually and stitched together manually. For marketing teams and content creators, that is a meaningful reduction in production friction.
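To make the workflow shift concrete, here is a minimal sketch of what a single-prompt batch request might look like if the model is exposed through an OpenAI-style Images API. The model identifier "images-2.0" and the exact request shape are assumptions for illustration; OpenAI has not published API details for this model, and earlier image models have used different limits on the `n` parameter.

```python
# Hypothetical sketch of a one-prompt, multi-image batch request.
# Assumptions (not confirmed by OpenAI): the model identifier
# "images-2.0" and an 8-image-per-prompt cap, per the article.

def build_batch_request(prompt: str, n: int = 8,
                        model: str = "images-2.0") -> dict:
    """Build a JSON payload asking for n images from one prompt.

    The article states Images 2.0 returns up to eight images with
    character and object continuity across the set, so we reject
    larger batches here.
    """
    if not 1 <= n <= 8:
        raise ValueError("Images 2.0 reportedly supports 1-8 images per prompt")
    return {
        "model": model,
        "prompt": prompt,
        "n": n,  # number of images sharing a consistent visual identity
    }

# One instruction replaces eight separate prompts: e.g. a storyboard
# sequence with the same character design in every frame.
payload = build_batch_request(
    "Six-panel storyboard of a fox delivering mail, consistent character design",
    n=6,
)
```

The point of the sketch is the shape of the workflow, not the exact API: one payload, one prompt, one `n`, rather than eight prompts stitched together by hand.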

The integration into Codex, OpenAI’s coding environment, is the next step.