Microsoft Launches Three Frontier AI Models, Suleyman Shifts Focus to Superintelligence
Microsoft AI dropped three foundational models for commercial use on Thursday, marking the company's most aggressive step yet to build its own AI stack independent of OpenAI.
The Models
MAI-Transcribe-1 handles speech-to-text across 25 languages at 2.5 times the speed of Microsoft's previous Azure offering. It was built for messy real-world audio - background noise, overlapping speakers, low-quality recordings. MAI-Voice-1 generates 60 seconds of speech in one second and supports custom voice creation. MAI-Image-2, which debuted on MAI Playground in March, rounds out the trio.
All three are now available through Microsoft Foundry and the new MAI Playground, marking the first time they have been broadly offered for commercial use.
The Price Play
Microsoft is positioning cost as the killer feature. MAI-Transcribe-1 starts at $0.36 per hour of audio, and Microsoft AI CEO Mustafa Suleyman claims the model runs at half the GPU cost of competing models. Voice generation runs $22 per million characters, and image generation starts at $5 per million input tokens.
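To put those rates in concrete terms, here is a rough back-of-the-envelope sketch in Python. It assumes the quoted "starts at" prices apply as flat rates, which real billing may not do. The function name `estimate_cost` and the example workload are hypothetical, not part of any Microsoft API.

```python
# Rates quoted in the article (USD). Treating the "starts at" prices as flat
# rates is an assumption; actual billing may use tiers or other units.
TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_INPUT_TOKENS = 5.0


def estimate_cost(audio_hours=0.0, voice_chars=0, image_input_tokens=0):
    """Return a per-service cost breakdown in USD (hypothetical helper)."""
    breakdown = {
        "transcribe": audio_hours * TRANSCRIBE_USD_PER_AUDIO_HOUR,
        "voice": voice_chars / 1_000_000 * VOICE_USD_PER_MILLION_CHARS,
        "image": image_input_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_INPUT_TOKENS,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    # Illustrative workload: 100 hours of audio, 2 million characters of
    # generated speech, and 500,000 image input tokens.
    for name, usd in estimate_cost(100, 2_000_000, 500_000).items():
        print(f"{name}: ${usd:.2f}")
```

Under these assumptions, transcription is by far the cheapest line item per unit of work: a full hour of audio costs less than the speech generated from 20,000 characters of text.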
The Bigger Story
The release came from the MAI Superintelligence team, a unit Suleyman has led full-time since the company's mid-March reorganization. Former Snap executive Jacob Andreou now runs day-to-day Copilot operations, freeing Suleyman to chase what he calls "the absolute frontier."
In interviews with The Verge and Bloomberg, Suleyman revealed the pivot had been in the works for nine months. A renegotiated OpenAI partnership formally unlocked Microsoft's ability to pursue frontier model development in-house. The company aims to produce "large cutting-edge AI models" by 2027.
The models were built by a lean 10-person team that Suleyman says was "liberated from any of the bureaucracy" - a small-team approach that other labs, including Anthropic and Meta, are also adopting.