GLM-4V-9B
About
GLM-4V-9B is an open-source multimodal large language model developed by THUDM at Tsinghua University. Built on the GLM-4 series, it combines autoregressive blank infilling with hybrid pretraining objectives, strengthening its capabilities in both text and image processing. The model handles multi-round conversation in English and Chinese, image understanding, and high-resolution inputs up to 1120 x 1120 pixels. It is reported to outperform leading proprietary models such as GPT-4 on several multimodal benchmarks, and its 8K-token context window allows it to work with longer inputs. Its open-source release makes the model broadly accessible for community use and collaboration.
Capabilities
Multimodal, Function Calling, Tool Use, JSON Mode
Providers (1)
| Provider | Input (per 1M) | Output (per 1M) | Type |
|---|---|---|---|
| Replicate API | — | — | Serverless |
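As a minimal sketch of how a client might prepare a request for this model, the helper below clamps image dimensions to the 1120 x 1120 resolution limit stated above and assembles a request payload. The payload field names and the Replicate model slug in the commented-out call are assumptions for illustration, not confirmed parts of the provider's API.

```python
# Hedged sketch: preparing a GLM-4V-9B request for a serverless provider.
# The 1120 x 1120 limit comes from the model description; the payload field
# names and model slug below are hypothetical and should be checked against
# the provider's model page.

def fit_within_limit(width: int, height: int, limit: int = 1120) -> tuple[int, int]:
    """Downscale dimensions so neither side exceeds the model's pixel limit,
    preserving aspect ratio; images already within the limit are untouched."""
    scale = min(1.0, limit / max(width, height))
    return round(width * scale), round(height * scale)

def build_payload(prompt: str, image_url: str, max_new_tokens: int = 1024) -> dict:
    """Assemble a request payload (field names here are assumed)."""
    return {
        "prompt": prompt,
        "image": image_url,
        "max_new_tokens": max_new_tokens,
    }

payload = build_payload("Describe this image.", "https://example.com/photo.jpg")

# To send the request for real (assumed slug; requires REPLICATE_API_TOKEN):
# import replicate
# output = replicate.run("thudm/glm-4v-9b", input=payload)
# print("".join(output))
```

A 2240 x 1120 image, for example, would be scaled to 1120 x 560 before upload, keeping it inside the model's supported resolution.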