InternLM XComposer2 VL 7B
InternLM XComposer2 VL 7B has tracked model metadata but no tracked provider pricing, which keeps it from being a default production pick.
Use it for
- Teams evaluating vision workloads
Do not use it for
- Cost-sensitive launches that need sourced token pricing
- Vision or document-understanding workloads
- Strict JSON or tool-calling flows
- Family: InternLM-XComposer2
- Released: 2024-04-09
- Parameters: 7B
- Architecture: Decoder Only
- Knowledge cutoff: 2023-08
- Specialization: general
- Training: finetuned
About
InternLM-XComposer2-VL-7B is a vision-language large model (VLLM) built on the InternLM2 architecture and designed for text-image comprehension and composition. It uses Partial LoRA (P-LoRA) to align the embedding spaces of a pre-trained Vision Transformer (ViT) and the language model, which improves multimodal understanding. The model is first pretrained to refine general semantics and improve visual capabilities using datasets such as COCO and TextCaps, then goes through supervised fine-tuning on a range of vision-language tasks. It is strongest at image captioning, visual question answering, and creative text-image composition, and it can handle high-resolution images and fine-grained details. The InternLM-XComposer2-VL-7B family includes a 4-bit quantized version for reduced VRAM usage, along with other variants for high-resolution understanding and long-context inputs.
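To make the Partial LoRA idea more concrete, here is a minimal pure-Python sketch. It assumes, based on our reading of the InternLM-XComposer2 paper rather than anything stated on this page, that the low-rank update is added only for image tokens while text tokens pass through the frozen weights unchanged. The function names, the tiny matrices, and the `is_image` mask are all hypothetical and chosen for illustration; this is not the model's actual implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def partial_lora_layer(
    tokens: List[Vector],
    is_image: List[bool],
    w0: Matrix,
    lora_a: Matrix,
    lora_b: Matrix,
) -> List[Vector]:
    """Apply a frozen linear layer w0 to every token.

    Assumption: the low-rank correction lora_b @ (lora_a @ x) is added
    only where is_image is True; text tokens use w0 alone.
    """
    outputs = []
    for x, image_token in zip(tokens, is_image):
        y = matvec(w0, x)
        if image_token:
            delta = matvec(lora_b, matvec(lora_a, x))
            y = [base + d for base, d in zip(y, delta)]
        outputs.append(y)
    return outputs


# Toy example: hidden size 2, LoRA rank 1 (all values are made up).
w0 = [[1.0, 0.0], [0.0, 1.0]]   # frozen weight (identity here)
lora_a = [[1.0, 1.0]]           # down-projection, shape 1 x 2
lora_b = [[0.5], [0.0]]         # up-projection, shape 2 x 1

same_vector = [1.0, 2.0]
out = partial_lora_layer(
    tokens=[same_vector, same_vector],
    is_image=[True, False],     # first token is an image token, second is text
    w0=w0,
    lora_a=lora_a,
    lora_b=lora_b,
)
image_out, text_out = out
```

In this toy run the same input vector produces different outputs depending on its modality flag: only the image token receives the low-rank correction, which is the property the sketch is meant to show.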
InternLM XComposer2 VL 7B is a model in the InternLM-XComposer2 family. Its one tracked headline benchmark is GAOKAO-MM, with a zero-shot score of 33.2.
Top use-case fit
Vision
1 relevant benchmark in the decision map.
Provider price ladder
No tracked provider token pricing is available for this model yet.
Capabilities
No model capability flags are currently sourced.
Benchmark peer bars for Vision
Benchmark scores (1)
| Benchmark | Score | Version | Source |
|---|---|---|---|
| GAOKAO-MM | 33.2 | zero-shot | https://github.com/OpenMOSS/GAOKAO-MM |
Migration checks
No linked migration route is available for this model yet.