Pricing
| Type | Price (USD per 1M tokens) |
|---|---|
| Input tokens | $0.05 |
| Output tokens | $0.10 |
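
To make the per-token rates concrete, the following is a minimal sketch of a cost estimate for a single request. The two rates come from the table above; the helper name `estimate_cost` and the example token counts are hypothetical and chosen only for illustration.

```python
# Prices taken from the pricing table, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.05
OUTPUT_PRICE_PER_M = 0.10


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: a 20,000-token prompt with a 1,000-token reply.
    print(f"${estimate_cost(20_000, 1_000):.6f}")  # prints $0.001100
```

Because output tokens cost twice as much as input tokens, long generations dominate the bill sooner than long prompts do.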
Capabilities
Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution
About Granite 4.1 8B
IBM Granite 4.1 8B is a dense, decoder-only transformer instruct model with 40 layers, an embedding size of 4096, and grouped-query attention (GQA) with 32 attention heads and 8 KV heads. It supports multilingual dialog in 12 languages, code generation with fill-in-the-middle (FIM), tool calling and function calling, retrieval-augmented generation (RAG), and summarization. The model was trained on an NVIDIA GB200 NVL72 cluster and is released under the Apache 2.0 license. Reported benchmarks: MMLU 73.84, HumanEval 85.37, GSM8K 92.49, BFCL v3 68.27.
Model Specs
| Spec | Value |
|---|---|
| Released | 2026-04-29 |
| Parameters | 8B |
| Context | 131K tokens |
| Architecture | Dense decoder-only transformer: 40 layers, embedding size 4096, 32 attention heads, 8 KV heads, SwiGLU, RoPE, RMSNorm |
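
The architecture figures above determine how much memory the key/value cache needs, which is where the 8 KV heads pay off. The sketch below works through that arithmetic. It rests on three assumptions not stated in the specs: the per-head dimension equals the embedding size divided by the number of attention heads (4096 / 32 = 128), the cache stores 16-bit values (2 bytes each), and "131K" means 131,072 tokens. The helper name `kv_cache_bytes_per_token` is hypothetical.

```python
# Architecture values from the model specs.
NUM_LAYERS = 40
EMBED_DIM = 4096
NUM_ATTN_HEADS = 32
NUM_KV_HEADS = 8

# Assumptions (not stated in the specs):
HEAD_DIM = EMBED_DIM // NUM_ATTN_HEADS  # assumed per-head dimension: 128
BYTES_PER_VALUE = 2                     # assumed 16-bit cache entries
CONTEXT_TOKENS = 131_072                # assumed meaning of "131K"


def kv_cache_bytes_per_token(num_kv_heads: int) -> int:
    """Bytes of KV cache one token occupies across all layers (hypothetical helper)."""
    # The leading factor of 2 accounts for storing both keys and values.
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE


gqa_bytes = kv_cache_bytes_per_token(NUM_KV_HEADS)    # grouped-query attention
mha_bytes = kv_cache_bytes_per_token(NUM_ATTN_HEADS)  # if every head had its own K/V

print(gqa_bytes / 1024)                     # 160.0 (KiB per token)
print(mha_bytes // gqa_bytes)               # 4 (times smaller with GQA)
print(gqa_bytes * CONTEXT_TOKENS / 2**30)   # 20.0 (GiB at full context)
```

Under these assumptions, a single full-length sequence needs about 20 GiB of cache on top of the model weights; without GQA the same sequence would need four times as much.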