LLM Reference

Phi-3 Vision

phi-3-vision

Researched 144d ago

Last refreshed 2026-05-16. Next refresh: weekly.

Open Source | Long context | Vision

Phi-3 Vision is worth evaluating for long-context and vision tasks, provided one of its provider routes and its 128K context window fit your workload.

Decision context: Long context task fit, 3 tracked provider routes, and research from 2026-01-01.

Use it for

  • Teams evaluating long-context and vision workloads
  • Workloads that can use a 128K context window
  • Buyers comparing 3 tracked provider routes

Do not use it for

  • Strict JSON or tool-calling flows

Cheapest output: $0.200 per 1M tokens (Fireworks AI)

Provider routes: 3 tracked API hosts

Quality / dollar: Unknown (no task benchmark coverage yet)

Freshness: 2026-01-01, researched 144d ago (stale)

Top use-case fit

  • Long context: included by capability and metadata signals in the decision map.
  • Vision: included by capability and metadata signals in the decision map.

Provider price ladder (per 1M tokens)

Provider            Input / 1M   Output / 1M   Route
Fireworks AI        $0.200       $0.200        Serverless
Microsoft Foundry   $0.280       $0.840        Provisioned
NVIDIA NIM          -            -             Provisioned (partial)
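
The price ladder turns into a per-request estimate with simple arithmetic: cost = (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below assumes the listed per-1M-token prices apply directly to your usage (provisioned routes may be billed differently) and that your token counts already include any tokens produced from images. `PRICES_PER_1M` and `request_cost` are hypothetical names, and NVIDIA NIM is left out because no prices are listed for it.

```python
# Prices in USD per 1M tokens, copied from the provider price ladder above.
# Assumption: listed prices apply directly; provisioned billing may differ.
# NVIDIA NIM is omitted because its prices are not listed.
PRICES_PER_1M = {
    "Fireworks AI": (0.200, 0.200),       # (input, output)
    "Microsoft Foundry": (0.280, 0.840),  # (input, output)
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request: tokens / 1M * price per 1M."""
    input_price, output_price = PRICES_PER_1M[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100,000 input tokens and 2,000 output tokens.
    for provider in PRICES_PER_1M:
        print(f"{provider}: ${request_cost(provider, 100_000, 2_000):.6f}")
```

Under those assumptions, a request with 100,000 input and 2,000 output tokens comes to about $0.0204 on Fireworks AI and about $0.0297 on Microsoft Foundry.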

Benchmark peer bars for Long context

No task-mapped benchmark peers are available for this model yet.

Migration checks

No linked migration route is available for this model yet.

About

Phi-3 Vision is a multimodal model from Microsoft that combines language and vision capabilities. Unlike text-only language models, it accepts both text and images as input and can perform tasks such as optical character recognition, chart analysis, and image interpretation. Its architecture consists of an image encoder, a text-image connector, a projector that maps image features into the language model's input space, and the Phi-3 Mini language model. At 4.2 billion parameters it is small enough for devices with limited compute, yet it is reported to compete with larger models. Its 128K-token context window supports multimodal reasoning over long inputs. It was trained on high-quality and synthetic data and incorporates safety measures.
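
The encoder, connector/projector, and language-model pipeline described above follows a common pattern for vision-language models: image features from the encoder are mapped to the language model's embedding width and placed in the same sequence as the text embeddings. The toy sketch below illustrates only that general pattern, not Phi-3 Vision's actual implementation. It assumes a single linear layer as the projector and a single insertion point for the image; all names (`project`, `build_input_sequence`) and sizes are hypothetical.

```python
# Toy illustration of the encoder -> projector -> language-model pattern.
# Hypothetical: a single linear projector and tiny, made-up vector sizes.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(features: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Linear projector: maps one image feature vector (encoder width)
    to one embedding in the language model's hidden width.
    `weights` has one row per hidden dimension."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def build_input_sequence(
    text_embeddings: List[Vector],
    image_features: List[Vector],
    weights: Matrix,
    bias: Vector,
    image_slot: int,
) -> List[Vector]:
    """Splice projected image embeddings into the text embedding sequence
    at position `image_slot`, giving one sequence for the decoder."""
    image_embeddings = [project(f, weights, bias) for f in image_features]
    return text_embeddings[:image_slot] + image_embeddings + text_embeddings[image_slot:]


if __name__ == "__main__":
    weights = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # hidden width 2, encoder width 3
    bias = [0.0, 0.5]
    text = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]   # three text-token embeddings
    patches = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # two image-patch features
    seq = build_input_sequence(text, patches, weights, bias, image_slot=1)
    print(len(seq), seq[1], seq[2])
```

A real projector is learned during training and works on much wider vectors; the point here is only that, after projection, image and text tokens share one embedding space and one sequence.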

Phi-3 Vision has a 128K-token context window.
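
Because the model is decoder-only, the prompt (text plus any image tokens) and the generated output share the same 128K-token window. A minimal budget check, assuming "128K" is read conservatively as 128,000 tokens (a given provider route may enforce a different limit), could look like this; `fits_context` and `max_output_budget` are hypothetical helpers.

```python
# Assumption: "128K" read conservatively as 128,000 tokens. A provider route
# may enforce a different limit, so check that route's documentation.
CONTEXT_WINDOW = 128_000


def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt (text plus image tokens) and the reserved output
    budget together fit inside the context window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_window


def max_output_budget(prompt_tokens: int,
                      context_window: int = CONTEXT_WINDOW) -> int:
    """Largest output budget that still fits, or 0 if the prompt alone overflows."""
    return max(context_window - prompt_tokens, 0)


if __name__ == "__main__":
    print(fits_context(120_000, 4_000))  # prompt plus output within the window
    print(max_output_budget(120_000))    # room left for generated tokens
```

For example, a 120,000-token prompt leaves room for at most 8,000 output tokens under this reading of the limit.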

On its cheapest tracked route, Fireworks AI, Phi-3 Vision costs $0.200 per 1M input tokens and $0.200 per 1M output tokens.

Capabilities

Vision

Specifications

Family: Phi-3
Released: 2024-05-21
Parameters: 4.2B
Context: 128K tokens
Architecture: Decoder-only
Knowledge cutoff: 2023-10
Specialization: General
Training: Fine-tuned

Created by

Advancing the state-of-the-art in AI and computing.

Redmond, Washington, United States
Founded 1991