Z-Image Models by Zhipu AI
1 model, 2025
About
Zhipu AI's (Z.ai) efficient text-to-image model family, built on a Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture. It is optimized for fast few-step sampling and for bilingual Chinese-English text rendering.
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| Z-Image | Use when the workload needs image generation, 6B parameters, and multimodal inputs. | 2025-11 | image generation, 6B parameters, multimodal inputs | Current |
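The "Use when" sentence in the table reads as if it were assembled from the model's signal tags. The following is a minimal sketch of how such a sentence could be derived, not the catalog's actual implementation; the function name `use_when` and the list-of-tags input are hypothetical, and the context, release, and replacement fields mentioned in the guidance note are not modeled here.

```python
def use_when(signals: list[str]) -> str:
    """Join signal tags into a 'Use when ...' sentence (hypothetical helper).

    Three or more tags are joined with commas and a final ", and";
    an empty tag list yields an empty string.
    """
    if not signals:
        return ""
    if len(signals) == 1:
        needs = signals[0]
    elif len(signals) == 2:
        needs = f"{signals[0]} and {signals[1]}"
    else:
        needs = ", ".join(signals[:-1]) + f", and {signals[-1]}"
    return f"Use when the workload needs {needs}."


# Signal tags copied from the Z-Image row of the table.
z_image_signals = ["image generation", "6B parameters", "multimodal inputs"]
print(use_when(z_image_signals))
```

With the three tags from the Z-Image row, this reproduces the sentence shown in the "Use when" column.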
Release Timeline
1 release group (2025-11), 1 current
- Z-Image (Current): image generation, 6B parameters, multimodal inputs
Specifications (1 model)
| Model | Released | Parameters | Multimodal |
|---|---|---|---|
| Z-Image | 2025-11 | 6B | Yes |
Frequently Asked Questions
- What is Z-Image used for?
- Z-Image is used for image generation, along with vision and multimodal work. The family description and the listed model capabilities point to those workloads as the best fit.
- How does Z-Image compare to GLM-5?
- Z-Image by Zhipu AI is strongest where you need image generation, while GLM-5, also by Zhipu AI, is the closest related family to check for vision and multimodal work. Z-Image has 1 listed variant, and GLM-5 reaches up to 262K context. Compare the specs and pricing tables before choosing a production model.
- Which Z-Image model should I use?
- If price is the main constraint, check the pricing table first, but note that Z-Image does not have complete provider pricing in the local data. Z-Image is the only listed variant, so it is also the most capable and latest local choice; evaluate it with multimodal inputs.



