Key Points
- OpenAI releases GPT Image 1.5, a native multimodal image model integrated into ChatGPT.
- The model generates images up to four times faster than its predecessor.
- API pricing is roughly 20 percent lower than the previous model's.
- Users can edit photos with simple text prompts, preserving facial likenesses.
- GPT Image 1.5 treats text and images as interchangeable tokens within a single model.
- The rollout follows earlier AI image‑editing advances from OpenAI and competitors.
- Developers can access the model via API for scalable, cost‑effective image generation.
[Image: The “Galactic Queen of the Universe” added to a photo of a room with a sofa using GPT Image 1.5 in ChatGPT.]
OpenAI Expands Multimodal Capabilities with GPT Image 1.5
OpenAI announced the rollout of GPT Image 1.5, a new image‑generation and editing model that lives inside the same neural network that handles language prompts. By treating text and image data as interchangeable tokens, the model can respond to natural‑language instructions that modify photos, such as inserting a person into a new setting, changing attire, or removing unwanted objects. The integration is available to all ChatGPT users, allowing a conversational workflow where users type or speak edits and receive updated images in real time.
Performance Gains and Cost Reductions
According to OpenAI, GPT Image 1.5 generates images up to four times faster than the previous version and does so at about 20 percent lower cost through the API. These efficiency improvements are positioned as a step toward making high‑quality image manipulation a routine part of everyday digital communication, without the need for specialized photo‑editing tools or expertise.
Technical Distinction: Native Multimodal Design
The model differs from earlier OpenAI image generators, such as DALL‑E 3, which relied on a separate diffusion process. GPT Image 1.5’s native multimodal architecture processes both visual and textual inputs within a single model, allowing it to predict image content token by token, much as it predicts the next word in a sentence. This unified approach simplifies tasks that require tight coordination between text and visual elements, making edits like “put him in a tuxedo at a wedding” more fluid and accurate.
User Experience and Creative Flexibility
Early demonstrations show the model’s ability to preserve facial likenesses across multiple edits, change poses, alter angles, and apply different visual styles. Users can iteratively refine an image by conversing with the AI—much like editing a draft of an email—resulting in a more intuitive and accessible creative process.
Context Within the AI Image‑Editing Landscape
OpenAI’s release follows a period of rapid development in AI‑driven image editing. While OpenAI had been working on a conversational image‑editing model since GPT‑4o, other companies, notably Google, introduced public prototypes earlier in the year and later refined them into widely used tools. The competitive pressure appears to have accelerated OpenAI’s launch of GPT Image 1.5, positioning it as a direct alternative for developers and end‑users seeking faster, cheaper, and more integrated image‑generation capabilities.
Implications for Developers and Businesses
With the model now accessible via the ChatGPT interface and API, developers can embed advanced image‑editing features into their applications without building separate pipelines for text and vision. The cost and speed improvements make it viable for higher‑volume use cases, such as personalized marketing content, rapid prototyping of visual assets, and real‑time creative assistance.
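As a rough illustration of what such an integration might look like, the sketch below assembles the request body a developer could send to an image‑edit endpoint. Note the hedges: the model identifier `gpt-image-1.5` and the parameter set are assumptions based on the article's naming and on common image‑API conventions, not confirmed values; consult OpenAI's official API reference before relying on them.

```python
import json

# Hypothetical model identifier inferred from the article's naming;
# the actual API model string may differ.
ASSUMED_MODEL = "gpt-image-1.5"

def build_edit_request(prompt: str, model: str = ASSUMED_MODEL) -> dict:
    """Assemble a JSON body for a conversational image-edit call.

    A minimal sketch: real requests would also carry the source image
    and authentication, which are omitted here.
    """
    return {
        "model": model,
        "prompt": prompt,   # natural-language edit, e.g. a wardrobe change
        "size": "1024x1024",
        "n": 1,             # number of image variants to generate
    }

payload = build_edit_request("Put him in a tuxedo at a wedding")
print(json.dumps(payload, indent=2))
```

Keeping the text prompt as the sole editing instruction mirrors the conversational workflow the article describes: the application forwards what the user types, and no separate vision pipeline is needed.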
Looking Ahead
OpenAI’s emphasis on native multimodal processing suggests a broader strategy of unifying language and vision models to streamline user interaction. As GPT Image 1.5 gains adoption, further refinements in image quality, editing precision, and integration with other AI services are likely to shape the future of conversational visual creativity.
Source: arstechnica.com