Using GLM-5 on Together AI

Implementation guide · GLM-5 · Zhipu AI

Serverless · Open Source

Quick Start

1. Create an account at Together AI and generate an API key.
2. Use the Together AI SDK or REST API to call glm-5; see the documentation for the request format.
3. You are billed $1.00 per 1M input tokens and $3.20 per 1M output tokens. See full pricing.
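For the REST route mentioned in step 2, the following is a minimal sketch using only the Python standard library. It assumes Together's OpenAI-compatible chat completions endpoint at https://api.together.xyz/v1/chat/completions and Bearer-token authentication; the helper names build_chat_request and send are illustrative, and "glm-5" stands in for whatever exact ID the model catalog lists.

```python
import json
import os
import urllib.request

# Assumed endpoint: Together's OpenAI-compatible chat completions route.
API_URL = "https://api.together.xyz/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single-turn chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body (needs network access)."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)


# Building the request needs no network access; "glm-5" is a placeholder ID.
api_key = os.environ.get("TOGETHER_API_KEY", "missing-key")
req = build_chat_request(api_key, "glm-5", "Hello")
print(req.get_method(), req.full_url)
```

Calling send(req) with a valid key would perform the actual request. Keeping construction and sending separate makes the payload easy to inspect before any tokens are billed.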

Code Examples

Install

pip install together

API key

TOGETHER_API_KEY

Model ID

glm-5

Together uses the "organization/model-name" format for model IDs, e.g. "meta-llama/Llama-4-Scout-17B-16E-Instruct" or "Qwen/QwQ-32B". The bare "glm-5" shown here may therefore need an organization prefix; see the Together model catalog for the exact ID.
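As a rough illustration of the ID format described above, the sketch below checks whether a string has the "organization/model-name" shape before it is sent in a request. The function name and the regular expression are assumptions made for this example, not part of the Together SDK, and a passing check does not mean the model exists in the catalog.

```python
import re

# Hypothetical shape check: one organization segment, a slash, one model-name segment.
_ID_PATTERN = re.compile(r"^[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*$")


def looks_like_catalog_id(model_id: str) -> bool:
    """Return True if model_id has the "organization/model-name" shape."""
    return bool(_ID_PATTERN.match(model_id))


for candidate in (
    "glm-5",
    "Qwen/QwQ-32B",
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
):
    print(candidate, looks_like_catalog_id(candidate))
```

A check like this only catches a missing organization prefix early; the catalog remains the source of truth for the exact ID.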

from together import Together

# The client reads TOGETHER_API_KEY from the environment.
client = Together()

# "glm-5" is the ID as listed here; substitute the exact catalog ID if it differs.
response = client.chat.completions.create(
    model="glm-5",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

About Together AI

Together AI is a platform for running open-source and proprietary LLMs with fast serverless and dedicated endpoints at competitive inference pricing.

Pricing on Together AI

Type	Price (USD per 1M tokens)
Input tokens	$1.00
Output tokens	$3.20
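To turn the listed rates into a per-request figure, here is a small sketch that estimates cost from token counts. The prices come from the table above; the function name and the sample token counts are illustrative, and any caching discounts or other pricing rules are ignored.

```python
from decimal import Decimal

# Listed Together AI rates for GLM-5, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("1.00")
OUTPUT_PRICE_PER_M = Decimal("3.20")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M


# Example: a 12,000-token prompt that produces a 3,000-token reply.
print(f"${estimate_cost(12_000, 3_000):.4f}")
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift when summed over many calls.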

Capabilities

Reasoning · Function Calling · Tool Use · Structured Outputs · Prompt Caching
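To make the function calling and tool use capabilities more concrete, here is a hedged sketch of a request body and a local dispatcher. It assumes the endpoint accepts the OpenAI-style tools field, which should be confirmed in Together's documentation; get_weather, REGISTRY, build_tool_request, and run_tool_call are hypothetical names invented for this example.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would query a weather service."""
    return f"No live data for {city} in this sketch."


# Tool schema in the OpenAI-style shape (assumed to be accepted by the endpoint).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def build_tool_request(model: str, prompt: str) -> dict:
    """Build a chat request body that advertises the tools to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": TOOLS,
    }


def run_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch one tool call; arguments arrive as a JSON-encoded string."""
    func = REGISTRY[name]
    return func(**json.loads(arguments_json))


body = build_tool_request("glm-5", "What is the weather in Paris?")
print(json.dumps(body, indent=2))
print(run_tool_call("get_weather", '{"city": "Paris"}'))
```

In a full loop, the tool result would be sent back to the model as a follow-up message so it can produce the final answer; that step is omitted here.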

About GLM-5

Flagship open-weight foundation model from Zhipu AI with 744B parameters (40B active per token) in a Mixture of Experts architecture. Trained on 28.5T tokens using DeepSeek Sparse Attention on Huawei Ascend hardware. Reported to achieve state-of-the-art results on coding and agentic benchmarks, including 77.8% on SWE-bench Verified. Supports autonomous planning, multi-step tool use, and self-correction.
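The parameter counts above imply that only a small share of the model runs for each token. The sketch below works through that arithmetic; the 2 bytes per parameter used for the weight-size estimate is an assumption (16-bit weights) that the source does not state, and the helper name weight_size_gb is illustrative.

```python
# Parameter counts as listed for GLM-5.
TOTAL_PARAMS = 744 * 10**9
ACTIVE_PARAMS = 40 * 10**9

# Share of parameters used per token in the Mixture of Experts forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS


def weight_size_gb(params: int, bytes_per_param: int) -> float:
    """Raw weight size in GB (10**9 bytes), excluding activations and caches."""
    return params * bytes_per_param / 10**9


print(f"Active share per token: {active_fraction:.1%}")
# Assumption: 2 bytes per parameter, i.e. 16-bit weights.
print(f"All weights at 2 bytes/param: {weight_size_gb(TOTAL_PARAMS, 2):.0f} GB")
```

Roughly one parameter in nineteen is active per token, which is why per-token compute is far lower than the total size suggests, even though all weights still have to be held in memory to serve the model.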

Model Specs

Released: 2026-02-11

Parameters: 744B total, 40B active

Context: 200k tokens

Architecture: Mixture of Experts

Knowledge cutoff: 2025-11

Also available on (6)

OpenRouter: $0.60/1M
Fireworks AI: $1.00/1M
GCP Vertex AI: $1.00/1M

Provider

Together AI

San Francisco, California, United States