Phi-4 Multimodal
Phi-4 Multimodal is a released, open-source long-context and vision model with a 128k-token context window; evaluate it while provider pricing coverage matures.
Use it for
- Teams evaluating long-context and vision workloads
- Workloads that can use a 128k context window
Do not use it for
- Cost-sensitive launches that need sourced token pricing
- Strict JSON or tool-calling flows
- Teams that need a tracked hosted API route today
Advancing the state-of-the-art in AI and computing.
No tracked provider token pricing is available yet.
About
Microsoft Phi-4 Multimodal is the multimodal variant of Phi-4, capable of processing images and text. It is distinct from phi-4-multimodal-instruct, which is the instruction-tuned version.
Phi-4 Multimodal is an open-source model in the Phi-4 family. The structured metadata tracks a 128k-token context window and multimodal input. No headline benchmark score is tracked for Phi-4 Multimodal yet.
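A quick way to test whether a workload "can use a 128k context window" is to check that the prompt plus the output you reserve fits inside that limit. The sketch below makes several assumptions. The helper name `fits_context` is hypothetical. "128k" is read conservatively as 128,000 tokens, although some providers may document 131,072. Token counts, including the token cost of any image inputs, are expected to come from the model's own tokenizer and processor, not from a character-count estimate.

```python
# Hypothetical helper, not part of any Phi-4 SDK.
# Assumption: "128k" is read conservatively as 128,000 tokens; raise this
# value if your provider documents a larger limit (for example 131,072).
CONTEXT_WINDOW_TOKENS = 128_000


def fits_context(prompt_tokens: int, max_output_tokens: int,
                 window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the prompt plus the reserved output budget fits in the window.

    prompt_tokens should already include the token cost of any image inputs,
    measured with the model's own tokenizer/processor (assumption).
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= window


if __name__ == "__main__":
    # 120,000 + 4,096 = 124,096 tokens: within the assumed 128,000 limit.
    print(fits_context(120_000, 4_096))
    # 126,000 + 4,096 = 130,096 tokens: over the assumed limit.
    print(fits_context(126_000, 4_096))
```

The check reserves output tokens up front. Most chat-style APIs count generated tokens against the same window, so a prompt that fits only on its own can still fail once the response starts.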
Top use-case fit
Long context
Included by capability and metadata signals in the decision map.
Vision
Included by capability and metadata signals in the decision map.
Provider price ladder
No tracked provider token pricing is available for this model yet.
Capabilities
Benchmark peer bars for long context
No task-mapped benchmark peers are available for this model yet.
Migration checks
No linked migration route is available for this model yet.
Frequently asked questions
What is the context window of Phi-4 Multimodal?
Phi-4 Multimodal has a context window of 128k tokens.
When was Phi-4 Multimodal released?
Phi-4 Multimodal was released on 2025-01-01.