Phi 3.5 Vision Instruct

Name: Phi 3.5 Vision Instruct
Author: Microsoft Research

Released: 2024-08-20
Last refreshed: 2026-05-19
Status: Researched 60d ago

Open source, Commercial use: permitted, Multimodal, Long context, Vision

Phi 3.5 Vision Instruct is a released open-source model with vision input and a 128k context window; evaluate it while provider pricing coverage matures.

Use it for

Teams evaluating long-context and vision workloads
Workloads that can use a 128k context window

Do not use it for

Cost-sensitive launches that need sourced token pricing
Strict JSON or tool-calling flows
Teams that need a tracked hosted API route today

Specifications

Family: Phi-3
Released: 2024-08-20
Context: 128k
Parameters: 4.1B
Architecture: Decoder Only
Knowledge cutoff: 2023-10
Specialization: general
Openness: Open source
License: MIT (OSI-approved), commercial use permitted
Weights: Unknown
Code: Unknown
Training: Fine-tuned

Created by

Microsoft Research

Advancing the state-of-the-art in AI and computing.

Redmond, Washington, United States

Founded 1991

Pricing

No tracked provider token pricing is available yet.

About

Phi 3.5 Vision Instruct is Microsoft Research's Phi-3 model with multimodal text and image input. It offers a 128K-token context window with weights openly available for self-hosting and scores 43 on MMMU.

Phi 3.5 Vision Instruct is an open-source model in the Phi-3 family. The structured metadata tracks a 128k-token context window and multimodal input. The one tracked headline benchmark is Massive Multi-discipline Multimodal Understanding (MMMU), at 43.0.

Top use-case fit

Long context

Included by capability and metadata signals in the decision map.

Vision

1 relevant benchmark in the decision map.

Provider price ladder

No tracked provider token pricing is available for this model yet.

Capabilities

Vision, Multimodal

Benchmark peer bars for Vision

Massive Multi-discipline Multimodal Understanding (rank 45 of 46)

Qwen3.6-Plus: 86.0
ByteDance Doubao Seed 2.0 Pro: 85.4
Qwen3.5-397B-A17B: 85.0
Gemini 3.5 Flash: 83.6
Phi 3.5 Vision Instruct (current): 43.0

Benchmark scores (1)

Scores are benchmark-specific and direction-aware (higher is better on some suites, lower on others), so the same numeric gap can mean very different outcomes across suites. Use the leaderboard context and this model's provider route to decide whether the winning margin is meaningful for your workload.
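
To put the margin in concrete terms, here is a minimal Python sketch that compares this model's MMMU score with the leader among the peer bars listed on this page. The scores are copied from the page. The helper name gap_to_leader is hypothetical, and the sketch assumes that higher is better on this suite, which holds for MMMU because it reports accuracy.

```python
# Scores copied from the peer bars on this page (MMMU, higher is better).
mmmu_scores = {
    "Qwen3.6-Plus": 86.0,
    "ByteDance Doubao Seed 2.0 Pro": 85.4,
    "Qwen3.5-397B-A17B": 85.0,
    "Gemini 3.5 Flash": 83.6,
    "Phi 3.5 Vision Instruct": 43.0,
}


def gap_to_leader(scores, model):
    """Return (leader name, absolute gap, model score as a fraction of the leader).

    Hypothetical helper for illustration; assumes higher scores are better.
    """
    leader = max(scores, key=scores.get)
    gap = round(scores[leader] - scores[model], 1)
    ratio = round(scores[model] / scores[leader], 3)
    return leader, gap, ratio


leader, gap, ratio = gap_to_leader(mmmu_scores, "Phi 3.5 Vision Instruct")
print(f"Leader: {leader}; gap: {gap} points; fraction of leader score: {ratio}")
```

Whether a gap of this size matters still depends on the workload, so treat the numbers as a starting point for your own evaluation and not as a verdict.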