LLM Reference

MAI-Voice-1

Proprietary

About

MAI-Voice-1 is a high-fidelity speech generation model that can produce 60 seconds of expressive, natural-sounding audio in under one second on a single GPU. It supports custom voices for personalized audio experiences and powers Copilot's Audio Expressions and podcast features. Use cases include conversational AI, agent assist, live captioning, accessibility, and education platforms.
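
The speed claim above can be restated as a real-time factor (RTF), the ratio of compute time to audio duration. The sketch below only works through that arithmetic from the two figures in the description (60 seconds of audio, under one second of compute). The function and variable names are illustrative assumptions, not part of any MAI-Voice-1 API.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return compute time divided by audio duration (lower means faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Bound taken from the description: 60 s of audio in at most 1 s of compute.
# These are the stated figures, not measurements.
rtf_bound = real_time_factor(1.0, 60.0)
speedup_bound = 1.0 / rtf_bound

print(f"RTF below {rtf_bound:.4f}, i.e. more than {speedup_bound:.0f}x real time")
```

Because the description says "under one second", both numbers are bounds: the actual RTF would be smaller and the actual speedup larger.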

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, JSON Mode, Code Execution

Specifications

Family: MAI
Released: 2026-04-02
Architecture: Transformer
Specialization: Text-to-Speech Generation
Training: Supervised learning on diverse voice data

Created by

Advancing the state-of-the-art in AI and computing.

Redmond, Washington, United States
Founded 1991