GLM-5V-Turbo
ProprietaryMultimodal
About
First native multimodal variant of GLM-5 with CogViT visual encoder. Specialized for design-to-code tasks—converts mockups, screenshots, Figma exports, and hand-drawn sketches into HTML, CSS, and JavaScript. Trained with reinforcement learning across 30+ task types with INT8 quantization. Achieved 94.8 on Design2Code benchmark (vs Claude Opus 4.6: 77.3). Supports image, video, and text inputs natively.
Capabilities
VisionMultimodalReasoningFunction CallingTool UseJSON ModeCode Execution