Microsoft Launches In-House AI Models, Challenging OpenAI
April 3, 2026 – 7:06 pm
Six months after renegotiating a contract that once barred it from developing frontier AI independently, Microsoft has released three in-house models directly competing with partner OpenAI. MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 are now available in Microsoft Foundry, bearing no OpenAI branding.
These models are the first public release from Microsoft’s MAI Superintelligence team, which Microsoft AI CEO Mustafa Suleyman has led since November 2025 with a mission to pursue "humanist superintelligence." In a March internal memo reported by Business Insider, Suleyman laid out a five-year goal of delivering world-class AI models for Microsoft.
Key Takeaways:
- MAI-Transcribe-1: This speech-to-text model posts the lowest word error rate across 25 languages on the FLEURS benchmark (3.8%), outperforming OpenAI’s Whisper-large-v3, Google’s Gemini 3.1 Flash, and ElevenLabs’ Scribe v2 in most cases. It runs 2.5 times faster than Microsoft’s previous Azure Fast transcription service and costs $0.36 per hour of audio. Notably, a team of just 10 people built it.
- MAI-Voice-1: This text-to-speech model completes the audio loop. It generates 60 seconds of natural-sounding audio in under one second on a single GPU and supports custom voice creation from short audio samples. Combined with MAI-Transcribe-1 and a customer’s chosen large language model, it forms a complete voice pipeline that runs entirely on Microsoft infrastructure, with no OpenAI dependency.
- MAI-Image-2: This model debuted in March as the third-best text-to-image model on Arena.ai, behind Google and OpenAI, and has already drawn interest from enterprise partners such as WPP for large-scale deployment.
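For readers unfamiliar with the metrics above, here is a minimal Python sketch of how a word error rate is typically computed (word-level edit distance divided by the number of reference words) and what the quoted $0.36 per audio hour means for a batch of recordings. The function names `word_error_rate` and `transcription_cost` are hypothetical and chosen for illustration; this is not Microsoft's evaluation code, and published benchmark figures usually apply text normalization first, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Illustrative only: splits on whitespace and does no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def transcription_cost(audio_hours: float, price_per_hour: float = 0.36) -> float:
    """Cost in dollars at a flat per-audio-hour price (default taken from the article)."""
    return audio_hours * price_per_hour


if __name__ == "__main__":
    # One substituted word out of six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.3f}")
    print(f"Cost for 100 audio hours: ${transcription_cost(100):.2f}")
```

By this measure, a 3.8% word error rate corresponds to roughly four word-level errors per hundred reference words.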
The strategic shift is significant: Microsoft’s new partnership agreement with OpenAI allows it to pursue general AI development independently, and that freedom made these releases possible.