LLM Reference

MAI-Transcribe-1.5

Released: 2026-06-02
Last refreshed: 2026-06-02
Status: Researched 1d ago
Proprietary · Multimodal · Vision

MAI-Transcribe-1.5 is worth evaluating for speech-to-text transcription when its provider route and context window match the workload.

Use it for

  • Teams evaluating speech-to-text transcription
  • Buyers assessing the single tracked provider route

Do not use it for

  • Strict JSON or tool-calling flows

Specifications

Family: MAI
Released: 2026-06-02
Architecture: transformer
Specialization: speech-recognition
License: Proprietary
Training: finetuned

Created by

Applied AI products and platforms from Microsoft
Redmond, Washington, United States

Pricing

Input / 1M: -
Output / 1M: -

Only tracked route · Microsoft Foundry (no pricing listed)

About

MAI-Transcribe-1.5 is Microsoft AI's second-generation speech-to-text transcription model. It supports 43 languages and domain-specific terminology recognition. Microsoft reports that it transcribes five times faster than competing models while maintaining state-of-the-art accuracy. At launch, streaming support was announced as coming soon.

MAI-Transcribe-1.5 is a proprietary model in the MAI family. Its structured metadata lists multimodal input and audio as capabilities. This page tracks one provider route, Microsoft Foundry. No headline benchmark score is tracked for the model yet.

Top use-case fit

Vision

Included by capability and metadata signals in the decision map.

Provider price ladder

Provider            Input / 1M   Output / 1M   Route
Microsoft Foundry   -            -             Serverless · Partial

Capabilities

Multimodal · Audio

Benchmark peer bars for Vision

No task-mapped benchmark peers are available for this model yet.

Migration checks

No linked migration route is available for this model yet.

Rankings & picks (4)