Key Points
- Microsoft announced its first in‑house AI models: MAI-Voice-1 and MAI-1-preview.
- MAI-Voice-1 can generate a minute of audio in under one second on a single GPU.
- MAI-1-preview was trained on roughly 15,000 Nvidia H100 GPUs.
- The speech model powers Copilot Daily and is available for trial in Copilot Labs.
- The text model will be rolled out in select Copilot scenarios, complementing OpenAI models.
- MAI-1-preview is being benchmarked publicly on LMArena.
- Microsoft emphasizes a consumer‑focused AI strategy that draws on its advertising and telemetry data.

New In‑House Models Unveiled
Microsoft’s AI division introduced its first two internally developed models. MAI-Voice-1 is a speech‑generation model that can produce a minute of audio in under one second on a single GPU. MAI-1-preview is a text‑generation model that provides instruction‑following responses, which Microsoft describes as offering “a glimpse of future offerings inside Copilot.”
Performance and Capabilities
The speech model’s speed and efficiency enable real‑time audio generation for features such as Copilot Daily, where an AI host recites top news stories, and for podcast‑style discussions that help explain topics. The text model, trained on around 15,000 Nvidia H100 GPUs, is designed for users who want helpful answers to everyday queries.
Integration with Microsoft Products
Microsoft already uses MAI-Voice-1 in Copilot Daily and has made the model available for experimentation in Copilot Labs, where users can enter prompts, choose a voice style, and adjust the speaking tone. MAI-1-preview is slated for a limited rollout in Copilot’s text‑based use cases, supplementing the OpenAI large language models that Copilot currently relies on.
Testing and Future Plans
MAI-1-preview is undergoing public testing on the AI benchmarking platform LMArena. Microsoft’s AI leadership emphasizes a consumer‑centric approach, noting the company’s access to extensive predictive data from advertising and telemetry. The announcement signals a broader ambition to develop a suite of specialized models that serve distinct user intents, an ecosystem Microsoft expects to unlock significant value for its AI services.
Source: theverge.com