LLM Reference

Phi 4 Multimodal Instruct

phi-4-multimodal-instruct

Researched 137d ago

Last refreshed 2026-05-16. Next refresh: weekly.

Open Source, Multimodal, Long context, Vision

Phi 4 Multimodal Instruct is worth evaluating for long context and vision when its provider route and context window match the workload.

Decision context: Long context task fit, 3 tracked provider routes, and research from 2026-01-01.

Use it for

  • Teams evaluating long context and vision
  • Workloads that can use a 128K context window
  • Buyers comparing 3 tracked provider routes

Do not use it for

  • Strict JSON or tool-calling flows

Cheapest output

$0.900

Fireworks AI per 1M tokens

Provider routes

3

Tracked API hosts

Quality / dollar

Unknown

No task benchmark coverage yet

Freshness

2026-01-01

Researched 137d ago

stale

Top use-case fit

Long context

Included by capability and metadata signals in the decision map.

Vision

Included by capability and metadata signals in the decision map.

Provider price ladder

Provider            Input / 1M   Output / 1M   Route
Fireworks AI        $0.900       $0.900        Serverless
Microsoft Foundry   -            -             Serverless (Partial)
NVIDIA NIM          -            -             Serverless (Partial)

Benchmark peer bars for Long context

No task-mapped benchmark peers are available for this model yet.

Migration checks

No linked migration route is available for this model yet.

About

Phi 4 Multimodal Instruct has a 128K-token context window.
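A quick way to check whether a workload fits the 128K window is to add the prompt size to the output budget and compare the sum against the limit. The sketch below is illustrative only: `fits_context` and `CONTEXT_WINDOW` are hypothetical names, "128K" is read here as 128,000 tokens (some hosts may count 131,072), and it assumes input and output tokens share one window, which may not hold on every provider route.

```python
# Assumption: "128K" is read as 128,000 tokens; adjust if a host documents 131,072.
CONTEXT_WINDOW = 128_000


def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the reserved output budget fits the window.

    Assumes input and output tokens share a single window.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_window


if __name__ == "__main__":
    print(fits_context(100_000, 4_000))
    print(fits_context(127_000, 2_000))
```

Token counts should come from the tokenizer the chosen provider actually uses, since character-based estimates can be off by a wide margin for images and non-English text.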

Phi 4 Multimodal Instruct is priced at $0.90 per 1M input tokens and $0.90 per 1M output tokens on Fireworks AI, the only tracked route with listed pricing.
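To turn the listed Fireworks AI rate of $0.900 per 1M tokens (input and output) into a per-request estimate, multiply each token count by its rate and divide by one million. The sketch below is a rough calculator, not a billing tool: `estimate_cost_usd` is a hypothetical helper, the prices are copied from this page and may be out of date, and it assumes image inputs are billed as ordinary tokens, which this page does not confirm.

```python
from decimal import Decimal

# Prices as listed on this page for the Fireworks AI route (USD per 1M tokens).
INPUT_PRICE_PER_1M = Decimal("0.90")
OUTPUT_PRICE_PER_1M = Decimal("0.90")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in USD from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_1M
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_1M)
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # 100,000 input tokens and 2,000 output tokens
    print(estimate_cost_usd(100_000, 2_000))  # 0.0918
```

Using `Decimal` rather than floats keeps the arithmetic exact when many small per-request costs are summed.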

Capabilities

Vision, Multimodal

Specifications

Family: Phi-4
Released: 2025-01-01
Context: 128K
Architecture: Decoder Only
Specialization: multimodal
Training: pretrained
Fine-tuning: instruct

Created by

Advancing the state-of-the-art in AI and computing.

Redmond, Washington, United States
Founded 1991