LLM Reference

GLM-4V 9B

About

GLM-4V-9B is an open-source multimodal large language model developed by THUDM at Tsinghua University. Building on the GLM-4 series, it combines autoregressive blank infilling with hybrid pretraining objectives spanning both text and images. The model handles multi-round conversations in English and Chinese, image understanding, and high-resolution inputs up to 1120 x 1120 pixels. It surpasses other leading models such as GPT-4 on various multimodal benchmarks and supports a context window of up to 8K tokens, allowing it to reason over longer inputs. Its open-source release makes the model broadly accessible to the community.

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode
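As an illustration of the function-calling and JSON-mode capabilities listed above, a tool definition can be expressed as a JSON schema. The OpenAI-style layout below is an assumption about the format a given provider accepts, and `get_weather` is a hypothetical function, not one documented on this page:

```python
import json

# Hypothetical tool definition in an OpenAI-style schema, commonly used
# for function calling with GLM-4-family models. The exact field names a
# provider expects are an assumption, not taken from this page.
def make_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a function description in a tool-schema dict."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }

# Hypothetical example function the model could be asked to call.
weather_tool = make_tool(
    "get_weather",
    "Look up the current weather for a city",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

print(json.dumps(weather_tool, indent=2))
```

The model would then return a JSON object naming the tool to invoke and its arguments, which the calling application executes.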

Providers (1)

Provider        Input (per 1M)    Output (per 1M)    Type
Replicate API   -                 -                  Serverless

Specifications

Family: GLM-4
Released: 2024-06-05
Architecture: Decoder Only
Specialization: general