Microsoft MAI-Image-2 Debuts at #3 on Global AI Image Leaderboard
Microsoft released MAI-Image-2 on March 19, placing its in-house image generation model at #3 on the Arena.ai text-to-image leaderboard, directly behind Google's Gemini 3.1 Flash and OpenAI's GPT Image 1.5.
The announcement came from the Microsoft AI Superintelligence team, the internal research group now led full-time by Mustafa Suleyman, who stepped back from a broader CEO role at Microsoft AI earlier this week to focus exclusively on frontier model development.
What's New
MAI-Image-2 targets three specific gaps identified through conversations with photographers, designers, and visual artists:
- Photorealism — natural lighting, accurate skin tones, and textured environments designed to reduce manual post-production work
- In-image text — consistent rendering of readable lettering within scenes, from signage to infographics, a category where most image models still struggle
- Dense scene generation — cinematic framing, surreal compositions, and high-detail environments
Rollout
The model is available now in the MAI Playground at playground.microsoft.ai. It's also beginning to roll out on Copilot and Bing Image Creator, which together reach hundreds of millions of users.
Enterprise API access is live today for select customers. Broader developer access through Microsoft Foundry will open "soon," though no date was given. A commercial application form is available for organisations needing large-scale image generation.
Context
A year ago, Microsoft depended almost entirely on OpenAI's DALL-E models for image generation in Copilot and Bing. MAI-Image-1, launched in October 2025, was the company's first fully in-house image model. MAI-Image-2 extends that trajectory, landing Microsoft's own technology in the top tier of a competitive field.
The team also noted that its next-generation GB200 compute cluster is now operational, hinting at further model releases ahead.