LLM Reference

Phi 4 Multimodal Instruct

Released
2025-01-01
Last refreshed
2026-05-22
Status
Researched 41d ago
Open sourceCommercial use: permittedMultimodalLong contextVision

Phi 4 Multimodal Instruct is worth evaluating for long context and vision when its provider route and context window match the workload.

Use it for

  • Teams evaluating long context and vision
  • Workloads that can use a 128k context window
  • Buyers comparing 3 tracked provider routes

Do not use it for

  • Strict JSON or tool-calling flows
Specifications
Family
Phi-4
Released
2025-01-01
Context
128k
Parameters
5.6B
Architecture
Decoder Only
Knowledge cutoff
2024-06
Specialization
multimodal
Openness
Open source
License
MITOSI-approvedCommercial use: permitted
Training
Pretrained
Created by

Advancing the state-of-the-art in AI and computing.

Redmond, Washington, United States
Founded 1991
Website
Pricing
Output / 1M
$0.900
Input / 1M
$0.900

Cheapest of 3 routes · Fireworks AI

About

Phi 4 Multimodal Instruct is Microsoft Research's Phi-4 model focused on multimodal input across text, image, and beyond. It offers a 128K-token context window with weights openly available for self-hosting.

Phi 4 Multimodal Instruct is an open-source model in the Phi-4 family. The structured metadata tracks a 128k-token context window and multimodal input. This page tracks provider routes through Fireworks AI, NVIDIA NIM, and Microsoft Foundry, with the cheapest tracked route listed at $0.9 input and $0.9 output per 1M tokens. Headline tracked benchmarks include MMMU Pro 38.5.

Top use-case fit

Long context

Included by capability and metadata signals in the decision map.

Vision

Included by capability and metadata signals in the decision map.

Provider price ladder

Compare all 3

Compare API pricing across 3 providers for input and output tokens, batch, and cached reads when available.

ProviderInput / 1MOutput / 1MRoute
Fireworks AI$0.900$0.900
Serverless
Microsoft Foundry--
ServerlessPartial
NVIDIA NIM--
ServerlessPartial

Available via routers & gateways(7)

Capabilities

VisionMultimodal

Benchmark peer barsfor Long context

No task-mapped benchmark peers are available for this model yet.

Benchmark scores(1)

Scores are benchmark-specific and are direction-aware: the same numeric gap can mean very different outcomes across suites. Use the leaderboard context and this model's provider route to decide whether the winning margin is meaningful for your workload.
BenchmarkScoreVersionSource
MMMU Pro38.5LLM-Stats aggregatorhttps://llm-stats.com/benchmarks/mmmu-pro

Migration checks

No linked migration route is available for this model yet.

Compare Phi 4 Multimodal Instruct with other models

Frequently asked questions

What is the context window of Phi 4 Multimodal Instruct?

Phi 4 Multimodal Instruct has a context window of 128k tokens.

How much does Phi 4 Multimodal Instruct cost?

Phi 4 Multimodal Instruct is available at $0.9/1M input tokens through Fireworks AI.

When was Phi 4 Multimodal Instruct released?

Phi 4 Multimodal Instruct was released on 2025-01-01.

Which providers offer Phi 4 Multimodal Instruct?

Phi 4 Multimodal Instruct is available from 3 providers: Fireworks AI, NVIDIA NIM, Microsoft Foundry.

What benchmarks has Phi 4 Multimodal Instruct been tested on?

Phi 4 Multimodal Instruct has been evaluated on 1 benchmark, including MMMU Pro.