Qwen VL
About
A multimodal vision-language model that processes images and text together for visual understanding tasks.
Capabilities
Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution
Providers (1)
| Provider | Input (per 1M) | Output (per 1M) | Type |
|---|---|---|---|
| Replicate API | $0.05 | $0.25 | Serverless |
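
The listed rates can be turned into a rough per-request cost estimate. The sketch below is illustrative only: it assumes the "per 1M" unit means tokens and that the prices are in USD, neither of which the table states. The function name `estimate_cost` is hypothetical, and the source does not say how image inputs are metered.

```python
# Rates copied from the provider table above.
# Assumption: "per 1M" means per one million tokens, priced in USD.
INPUT_RATE_PER_MILLION = 0.05
OUTPUT_RATE_PER_MILLION = 0.25


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated cost for one request at the listed rates."""
    total = (input_tokens * INPUT_RATE_PER_MILLION
             + output_tokens * OUTPUT_RATE_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: 2M input tokens and 400k output tokens.
    cost = estimate_cost(input_tokens=2_000_000, output_tokens=400_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Because output is priced at five times the input rate here, responses with long generated text can dominate the bill even when the prompt is much larger.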