LLM Reference

GLM-4-Flash

About

GLM-4-Flash, developed by Zhipu AI, is a large language model optimized for efficient, cost-effective vertical tasks. It reaches an inference speed of 72.14 tokens per second through techniques such as adaptive weight quantization, parallel processing, batching strategies, and speculative sampling. Pre-trained on 10 terabytes of high-quality multilingual data spanning 26 languages, it supports multi-turn dialogue, web browsing, function execution, and long-text reasoning within a 128K context window. Users can fine-tune the model for specific applications, and access is freely available via its API.
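A minimal sketch of calling the model over HTTP, assuming an OpenAI-style chat-completions endpoint; the URL, model identifier, and parameter names below are assumptions for illustration, not verified against Zhipu AI's official documentation.

```python
import json
import os
import urllib.request

# Assumed endpoint and model id; check Zhipu AI's docs before relying on these.
API_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a chat-completions request for GLM-4-Flash (hypothetical schema)."""
    payload = {
        "model": "glm-4-flash",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# Construct the request; sending it requires a real API key.
req = build_request(
    "Summarize GLM-4-Flash in one sentence.",
    os.environ.get("ZHIPUAI_API_KEY", "demo-key"),
)
print(req.get_full_url())
```

Sending the request with `urllib.request.urlopen(req)` would return a JSON body whose shape depends on the actual API; only the request construction is sketched here.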

Capabilities

Multimodal
Function Calling
Tool Use
JSON Mode
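For the function-calling and JSON-mode capabilities, a request typically carries a tool schema the model can choose to invoke. The payload below is a hypothetical sketch following the common OpenAI-style `tools` format; the tool name and schema shape are illustrative assumptions, not Zhipu AI's documented interface.

```python
import json

# Illustrative tool definition; "get_weather" is a made-up example function.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Assumed request shape: the model may respond with a tool call instead of text.
payload = {
    "model": "glm-4-flash",
    "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
    "tools": tools,
    "tool_choice": "auto",
}
print(json.dumps(payload, indent=2))
```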

Specifications

Family: GLM-4
Released: 2024-06-05
Architecture: Decoder Only
Specialization: General