LLM Reference

GLM-4-Flash

Released: 2024-06-05
Last refreshed: 2026-05-19
Status: Researched 16d ago
Long context

GLM-4-Flash has tracked model metadata, but no tracked provider pricing, which keeps it from being a default production pick.

Use it for

  • Teams evaluating long context
  • Workloads that can use a 128k context window

Do not use it for

  • Cost-sensitive launches that need sourced token pricing
  • Vision or document-understanding workloads
  • Strict JSON or tool-calling flows

Specifications

Family: GLM-4
Released: 2024-06-05
Context: 128k
Architecture: Decoder Only
Specialization: general
Training: finetuned

Created by

Zhipu AI
Leading China's LLM innovation surge

Beijing, China
Founded 2018

Pricing

No tracked provider token pricing is available yet.

About

GLM-4-Flash, developed by Zhipu AI, is a large language model optimized for efficient, cost-effective vertical tasks. It reaches an inference speed of 72.14 tokens per second through adaptive weight quantization, parallel processing, batching strategies, and speculative sampling. Pre-trained on 10 terabytes of high-quality multilingual data covering 26 languages, it supports multi-turn dialogue, web browsing, function execution, and long-text reasoning within a 128K context length. Users can fine-tune the model for specific applications, and access is free through its API [4][5][6].

GLM-4-Flash is a model in the GLM-4 family. The structured metadata tracks a 128k-token context window. No headline benchmark score is tracked for GLM-4-Flash yet.
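
As a rough planning aid, the two figures quoted above (72.14 tokens per second and a 128k-token context window) can be turned into back-of-the-envelope estimates. The sketch below is illustrative only: the function names are hypothetical, "128k" is assumed to mean 128,000 tokens (it may mean 131,072), and the throughput figure is treated as a steady decode rate, which ignores time to first token, provider load, and prompt length.

```python
# Figures quoted in the text above; everything else here is an assumption.
TOKENS_PER_SECOND = 72.14   # quoted inference speed
CONTEXT_WINDOW = 128_000    # assumption: "128k" read as 128,000 tokens


def generation_seconds(output_tokens: int,
                       tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Estimate seconds needed to generate output_tokens at a steady rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def fits_context(prompt_tokens: int,
                 max_output_tokens: int,
                 context_window: int = CONTEXT_WINDOW) -> bool:
    """Check that the prompt plus the reserved output fits in the window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_window


if __name__ == "__main__":
    # A 1,000-token reply at the quoted rate takes roughly 14 seconds.
    print(f"{generation_seconds(1_000):.1f} s for 1,000 output tokens")
    # A 120,000-token prompt leaves room for at most 8,000 output tokens.
    print(fits_context(120_000, 8_000))
```

Token counts depend on the tokenizer, so prompt sizes should be measured with the provider's own tooling before relying on a check like `fits_context`.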

Top use-case fit

Long context

Included by capability and metadata signals in the decision map.

Provider price ladder

No tracked provider token pricing is available for this model yet.

Capabilities

No model capability flags are currently sourced.

Benchmark peer bars for Long context

No task-mapped benchmark peers are available for this model yet.

Migration checks

No linked migration route is available for this model yet.

Rankings & picks (5)