MAI-Thinking-1

Name: MAI-Thinking-1
Author: Microsoft AI

Released

2026-06-02

Last refreshed

2026-06-15

Status

Researched 36d ago

ProprietaryCommercial use: conditionalCodingRAGAgentsLong contextClassificationJSON / Tool useHighlight

MAI-Thinking-1 is worth evaluating for coding, rag, and agents when its provider route and context window match the workload.

Use it for

Teams evaluating coding, rag, and agents
Workloads that can use a 256k context window
Buyers comparing 1 tracked provider route

Do not use it for

Vision or document-understanding workloads

Specifications

Family: MAI
Released: 2026-06-02
Context: 256k
Parameters: 1T total / 35B active
Architecture: Mixture of Experts
Specialization: reasoning
Openness: Proprietary
License: ProprietaryCommercial use: conditional
Weights: Not released
Code: Unknown
Training: Pretrained

Created by

Microsoft AI

Applied AI products and platforms from Microsoft

Redmond, Washington, United States

Website

Pricing

Output / 1M

Input / 1M

Cheapest of 1 route · Microsoft Foundry

Providers(1)

Microsoft Foundry

View 1 provider route

Links

Website

About

MAI-Thinking-1 is Microsoft AI's flagship reasoning model, built from scratch on enterprise-grade commercially licensed data without third-party distillation. The sparse mixture-of-experts model activates about 35B parameters from roughly 1T total parameters, supports a 256K-token context window, and targets frontier reasoning and software engineering work at a mid-weight price point. Microsoft reports 97% on AIME 2025, 94.5% on AIME 2026, 84.2% on GPQA Diamond, 87.7% on LiveCodeBench v6, 73.5% on SWE-bench Verified, and 52.8% on SWE-bench Pro. In a 1,276-task Surge blind side-by-side evaluation, it narrowly beat Claude Sonnet 4.6 but trailed Claude Opus 4.6. It supports function calling and developer instructions through the Chat Completions API.

MAI-Thinking-1 is a proprietary model in the MAI family. The structured metadata tracks a 256k-token context window, reasoning, function calling, and tool use. This page tracks provider routes through Microsoft Foundry. Headline tracked benchmarks include AIME 2025 97.0, AIME 2026 94.5, and HMMT February 2026 84.9.

Top use-case fit: coding, agents, and build tasks

Coding

3 relevant benchmarks in the decision map.

RAG

Included by capability and metadata signals in the decision map.

Agents

2 relevant benchmarks in the decision map.

Provider price ladder

Compare API pricing across 1 providers for input and output tokens, batch, and cached reads when available.

Provider	Input / 1M	Output / 1M	Route
Microsoft Foundry	-	-	ServerlessPartial

Available via routers & gateways(5)

LiteLLM

Gateway

Open-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.

Free OSSMicrosoft Foundry

Portkey

Gateway

Production AI gateway routing to 1,600+ LLMs with failover, load balancing, semantic caching, and guardrails; Apache 2.0 core is fully self-hostable with the complete feature set.

SubscriptionMicrosoft Foundry

Azure AI Foundry Model Router

Router

Microsoft Azure AI Foundry's native model router that uses a trained ML model to route each prompt in real time to the optimal Azure-hosted model, with Balanced/Cost/Quality mode selection and automatic failover.

PassthroughMicrosoft Foundry

Helicone

Gateway

Observability-first AI gateway with routing, caching, rate limiting, and request tracing; Apache 2.0 open-source core with a managed hosted tier for logging and analytics.

SubscriptionMicrosoft Foundry

Kong AI Gateway

Gateway

Multi-LLM AI gateway built on Kong Gateway 3.x, adding semantic routing, load balancing, guardrails, and MCP traffic analytics as plugins over Kong's existing API management platform.

SubscriptionMicrosoft Foundry

Capabilities

ReasoningFunction CallingTool Use

Benchmark peer barsfor Coding

SWE-bench ProRank 31 of 41

80.3

69.2

64.7

64.6

MAI-Thinking-1current

52.8

SWE-bench VerifiedRank 46 of 80

Claude Fable 5

96.0

Claude Mythos Preview

93.9

Claude Opus 4.8

88.6

Claude Opus 4.7

87.6

MAI-Thinking-1current

73.5

LiveCodeBenchRank 10 of 55

DeepSeek V4 Pro

93.5

Gemini 3.1 Pro Preview

91.7

DeepSeek V4 Flash

91.6

Qwen3.7-Max

91.6

MAI-Thinking-1current

87.7

Benchmark scores(10)

Scores are benchmark-specific and are direction-aware: the same numeric gap can mean very different outcomes across suites. Use the leaderboard context and this model's provider route to decide whether the winning margin is meaningful for your workload.

Benchmark	Score	Version	Source
AIME 2025	97.0	AIME 2025	https://microsoft.ai/news/introducing-mai-thinking-1/
AIME 2026	94.5	AIME 2026	https://microsoft.ai/news/introducing-mai-thinking-1/
HMMT February 2026	84.9	HMMT Feb 2026	https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf
Google-Proof Q&A	84.2	GPQA Diamond	https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf
LiveCodeBench	87.7	v6	https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf
Terminal-Bench 2.0	46.0	Terminal-Bench 2.0	https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf
SWE-bench Verified	73.5	SWE-bench Verified	https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf
SWE-bench Pro	52.8	Public dataset	https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf
MMLU PRO	85.0	MMLU-Pro (accuracy)	https://llm-stats.com/benchmarks/mmlu-pro
MultiChallenge	53.0	Multi-Challenge leaderboard rank 15 of 28 (accuracy%)	https://llm-stats.com/benchmarks/multichallenge