LLM Reference

MAI-Voice-1

mai-voice-1

Researched 3d ago

Last refreshed 2026-05-19. Next refresh: weekly.

Proprietary, Multimodal, Vision

MAI-Voice-1 is worth evaluating for vision when its provider route and context window match the workload.

Decision context: Vision task fit, 1 tracked provider route, and research from 2026-05-19.

Use it for

  • Teams evaluating vision
  • Buyers comparing 1 tracked provider route

Do not use it for

  • Strict JSON or tool-calling flows

Cheapest output

-

Microsoft Foundry per 1M tokens

Provider routes

1

Tracked API hosts

Quality / dollar

Unknown

No task benchmark coverage yet

Freshness

2026-05-19

fresh

Top use-case fit

Vision

Included by capability and metadata signals in the decision map.

Provider price ladder

Provider            Input / 1M    Output / 1M    Route
Microsoft Foundry   $22.00        -              Serverless, Partial
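
For budgeting, the single listed input price can be turned into a rough cost estimate. The sketch below is illustrative only: the function name is hypothetical, it assumes the $22.00 figure is USD per 1M billed input units, and it ignores output pricing because no output price is listed. What the provider actually counts as a unit (tokens or characters) is not stated on this page.

```python
# Input price from the provider price ladder (assumed to be USD per 1M input units).
INPUT_PRICE_PER_MILLION_USD = 22.00


def estimate_input_cost(units: int, price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Return the estimated input cost in USD for a given number of billed units.

    Hypothetical helper: the billing unit is an assumption, and output
    pricing is excluded because the listing shows none.
    """
    if units < 0:
        raise ValueError("units must be non-negative")
    return units / 1_000_000 * price_per_million


if __name__ == "__main__":
    # 250,000 input units at $22.00 per 1M
    print(f"${estimate_input_cost(250_000):.2f}")
```

Confirm the billing unit and any output charges with the provider before relying on this number.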

Benchmark peer bars for Vision

No task-mapped benchmark peers are available for this model yet.

Migration checks

No linked migration route is available for this model yet.

About

Microsoft AI voice generation with emotional nuance and speaker identity preservation. Generates 60 seconds of audio in 1 second. Supports custom voice creation from brief audio samples.
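
The quoted speed (60 seconds of audio per 1 second of generation) implies a 60x real-time factor. As a back-of-the-envelope sketch, assuming that rate holds linearly for longer clips (which this page does not confirm), generation time can be estimated as follows. The constant and function names are hypothetical.

```python
# Quoted rate from the About text: 60 seconds of audio per 1 second of generation.
QUOTED_AUDIO_SECONDS_PER_WALL_SECOND = 60.0


def estimated_generation_seconds(
    audio_seconds: float,
    rate: float = QUOTED_AUDIO_SECONDS_PER_WALL_SECOND,
) -> float:
    """Estimate wall-clock seconds needed to generate `audio_seconds` of audio.

    Assumes the quoted rate scales linearly; real throughput will depend on
    hardware, load, and network overhead.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return audio_seconds / rate


if __name__ == "__main__":
    # A 10-minute clip (600 s of audio) at the quoted rate
    print(estimated_generation_seconds(600))
```

Treat the result as an upper bound on throughput rather than a measured latency.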

Capabilities

Multimodal

Rankings

Specifications

Family: MAI
Released: 2026-04-02
Architecture: Transformer
Specialization: audio
Training: finetuned

Created by

Applied AI products and platforms from Microsoft

Redmond, Washington, United States

Providers (1)