LLM Reference

MAI Models by Microsoft AI

Microsoft AI | Proprietary
5 models | 2026 | Up to 33K context | From $0.36/1M input

About

The Microsoft AI (MAI) models, announced April 2, 2026, are a suite of foundation models for speech recognition, speech generation, and image understanding and generation. They are available through the Microsoft Foundry enterprise platform.

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

5 in view
MAI-DS-R1 (Current)
Use when the workload needs reasoning.
2026-05 | reasoning

MAI-Image-2e (Current)
Use when the workload needs image, 33K context, and multimodal inputs.
2026-04 | image | 33K context | multimodal inputs

MAI-Transcribe-1 (Current)
Use when the workload needs audio and multimodal inputs.
2026-04 | audio | multimodal inputs

MAI-Voice-1 (Current)
Use when the workload needs audio and multimodal inputs.
2026-04 | audio | multimodal inputs

MAI-Image-2 (Current)
Use when the workload needs image and multimodal inputs.
2026-03 | image | multimodal inputs
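
The use-when sentences above follow a regular pattern over each variant's capability tags. The page does not show how they are generated, so the following is only a minimal sketch of one way to assemble such a sentence. The `use_when` helper is hypothetical, and the tag lists are copied from the entries above.

```python
def use_when(tags):
    """Build a 'Use when...' sentence from capability tags (hypothetical helper)."""
    if not tags:
        return ""
    if len(tags) == 1:
        needs = tags[0]
    elif len(tags) == 2:
        needs = f"{tags[0]} and {tags[1]}"
    else:
        # Three or more tags: comma-separated with a final "and".
        needs = ", ".join(tags[:-1]) + f", and {tags[-1]}"
    return f"Use when the workload needs {needs}."


print(use_when(["reasoning"]))
# Use when the workload needs reasoning.
print(use_when(["image", "33K context", "multimodal inputs"]))
# Use when the workload needs image, 33K context, and multimodal inputs.
```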

Release Timeline

3 release groups

2026-05 (1 current)
  MAI-DS-R1 | reasoning | Current

2026-04 (3 current)
  MAI-Image-2e | image, 33K context, multimodal inputs | Current
  MAI-Transcribe-1 | audio, multimodal inputs | Current
  MAI-Voice-1 | audio, multimodal inputs | Current

2026-03 (1 current)
  MAI-Image-2 | image, multimodal inputs | Current

Specifications (5 models)

MAI model specifications comparison
Model             Released  Context  Vision  Multimodal  Reasoning
MAI-DS-R1         2026-05   -        No      No          Yes
MAI-Image-2e      2026-04   33K      Yes     Yes         No
MAI-Transcribe-1  2026-04   -        No      Yes         No
MAI-Voice-1       2026-04   -        No      Yes         No
MAI-Image-2       2026-03   -        Yes     Yes         No
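
To shortlist models by capability, the comparison above can be treated as a small dataset. This is a sketch only: the `Spec` record and `matching` helper are hypothetical names, and the rows are transcribed from the table, with `None` standing in for a context size the page does not list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Spec:
    model: str
    released: str
    context: Optional[str]  # None where the page lists no context size
    vision: bool
    multimodal: bool
    reasoning: bool


# Rows transcribed from the specifications comparison.
SPECS = [
    Spec("MAI-DS-R1", "2026-05", None, False, False, True),
    Spec("MAI-Image-2e", "2026-04", "33K", True, True, False),
    Spec("MAI-Transcribe-1", "2026-04", None, False, True, False),
    Spec("MAI-Voice-1", "2026-04", None, False, True, False),
    Spec("MAI-Image-2", "2026-03", None, True, True, False),
]


def matching(specs, **required):
    """Return model names whose fields equal every required value."""
    return [
        s.model
        for s in specs
        if all(getattr(s, field) == value for field, value in required.items())
    ]


print(matching(SPECS, vision=True))
# ['MAI-Image-2e', 'MAI-Image-2']
print(matching(SPECS, reasoning=True))
# ['MAI-DS-R1']
```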

Available From (1 provider)

Pricing

MAI model pricing by provider
Model             Provider           Input / 1M  Output / 1M  Type
MAI-Transcribe-1  Microsoft Foundry  $0.36       -            Serverless
MAI-Image-2       Microsoft Foundry  $5          $33          Serverless
MAI-Voice-1       Microsoft Foundry  $22 (single listed price; column not recoverable)  Serverless
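
As a worked example of per-million pricing, the sketch below estimates a bill from the MAI-Image-2 rates listed above ($5 per 1M input, $33 per 1M output). The `estimate_cost` function and the usage figures are hypothetical. The billing unit is assumed to be whatever the "/ 1M" columns count, which the table itself does not state.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def estimate_cost(input_units, output_units, input_price, output_price):
    """Estimate cost in dollars from unit counts and per-1M prices.

    Hypothetical helper: prices are dollars per 1M units, as in the table.
    """
    input_cost = Decimal(input_units) * input_price / MILLION
    output_cost = Decimal(output_units) * output_price / MILLION
    return input_cost + output_cost


# Illustrative usage figures, not taken from the page.
cost = estimate_cost(200_000, 50_000, Decimal("5"), Decimal("33"))
print(cost)
# 2.65  (1.00 for input + 1.65 for output)
```

`Decimal` is used instead of floats so that small currency amounts add up exactly.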

Frequently Asked Questions

What is MAI used for?
MAI models are used for reasoning, image, and audio workloads. The family description and the listed model capabilities point to those workloads as the best fit.
How does MAI compare to Claude 3?
MAI by Microsoft AI covers reasoning, image, and audio workloads, while Claude 3 by Anthropic is the closest related family to check for vision and multimodal work. MAI has 5 listed variants and reaches up to 33K context; Claude 3 reaches up to 200K context. Compare the specs and pricing tables before choosing a production model.
Which MAI model should I use?
For the lowest listed input price, start with MAI-Transcribe-1 through Microsoft Foundry at $0.36/1M input tokens. For the largest listed context, evaluate MAI-Image-2e, which offers 33K context and multimodal inputs.

Models (5)