Phi 3.5 Vision Instruct
Phi 3.5 Vision Instruct has tracked model metadata but no tracked provider pricing, which keeps it from being a default production pick.
Use it for
- Teams evaluating long context and vision
- Workloads that can use a 128k context window
Do not use it for
- Cost-sensitive launches that need sourced token pricing
- Strict JSON or tool-calling flows
- Teams that need a tracked hosted API route today
- Family: Phi-3
- Released: 2024-08-20
- Context: 128k
- Parameters: 4.1B
- Architecture: Decoder-only
- Knowledge cutoff: 2023-10
- Specialization: general
- Training: finetuned
About
Phi 3.5 Vision Instruct is Microsoft Research's multimodal model in the Phi-3 family, accepting text and image input. It offers a 128k-token context window, and its weights are openly available for self-hosting. Its one tracked benchmark is Massive Multi-discipline Multimodal Understanding (MMMU), where it scores 43.0.
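When evaluating the model for long-context work, it can help to check a request against the 128k-token window before sending it. The following is a minimal sketch, not part of any official tooling. The function name `fits_context`, the default output reserve, and the reading of "128k" as 128 × 1024 tokens are all assumptions made for illustration. Token counts for text and images are expected to come from whatever tokenizer and image processor you actually use.

```python
# Assumption: "128k" is read as 128 * 1024 = 131,072 tokens.
# Some listings use 128,000 instead; adjust to match your deployment.
CONTEXT_WINDOW_TOKENS = 128 * 1024


def fits_context(
    prompt_tokens: int,
    image_tokens: int = 0,
    reserved_output_tokens: int = 1024,  # hypothetical default reserve for the reply
    window: int = CONTEXT_WINDOW_TOKENS,
) -> bool:
    """Return True if text, image, and reserved output tokens fit in the window."""
    if min(prompt_tokens, image_tokens, reserved_output_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + image_tokens + reserved_output_tokens <= window


if __name__ == "__main__":
    # Token counts here are made-up inputs, not measurements of the model.
    print(fits_context(prompt_tokens=100_000, image_tokens=20_000))
    print(fits_context(prompt_tokens=130_000, image_tokens=2_000))
```

Reserving output tokens up front keeps a long prompt from leaving no room for the response. Images also consume part of the window, so they are budgeted separately here.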
Top use-case fit
Long context
Included by capability and metadata signals in the decision map.
Vision
1 relevant benchmark in the decision map.
Provider price ladder
No tracked provider token pricing is available for this model yet.
Capabilities
Benchmark peer bars for Vision
Benchmark scores (1)
| Benchmark | Score | Version | Source |
|---|---|---|---|
| Massive Multi-discipline Multimodal Understanding | 43.0 | — | https://mmmu-benchmark.github.io/ |
Migration checks
No linked migration route is available for this model yet.