Gemma 2 9B Instruct
Gemma 2 9B Instruct is worth evaluating for classification and json / tool use when its provider route and context window match the workload.
Use it for
- Teams evaluating classification and json / tool use
- Workloads that can use a 8k context window
- Buyers comparing 4 tracked provider routes
Do not use it for
- Vision or document-understanding workloads
Cheapest of 5 routes · Replicate API
About
Gemma 2 9B Instruct, developed by Google, is a state-of-the-art large language model based on the advanced Gemini framework. It is a decoder-only transformer model with 9 billion parameters, offering a balance between size and performance. The model is trained on an expansive dataset comprising 8 trillion tokens, including web documents, code, and mathematical text, a notable 30% increase from its predecessor, Gemma 1.1. This allows it to adeptly handle diverse tasks such as question answering, creative writing, coding, and mathematical problem-solving. However, it shares common limitations of large language models, such as potential biases and the risk of generating inaccuracies or outdated information. Notably, Gemma 2 9B Instruct incorporates Grouped-Query Attention (GQA) and uses the GeGLU activation function, and is specifically fine-tuned to follow instructions and participate effectively in multi-turn dialogues.
Gemma 2 9B Instruct is an open-weight model in the Gemma 2 family. The structured metadata tracks a 8k-token context window and structured outputs. This page tracks provider routes through Fireworks AI, NVIDIA NIM, OpenRouter, and 2 more, with the cheapest tracked route listed at $0.1 input and $0.1 output per 1M tokens. Headline tracked benchmarks include Instruction-Following Evaluation 65.5.
Top use-case fit
Classification
Included by capability and metadata signals in the decision map.
JSON / Tool use
Included by capability and metadata signals in the decision map.
Provider price ladder
Compare all 5Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.
| Provider | Input / 1M | Output / 1M | Route |
|---|---|---|---|
| Replicate API | $0.100 | $0.100 | Serverless |
| Fireworks AI | $0.200 | $0.200 | Serverless |
| Chutes AI | $0.100 | $0.300 | Serverless |
| NVIDIA NIM | - | - | ProvisionedPartial |
Available via routers & gateways(2)
NVIDIA LLM Router Blueprint
RouterNVIDIA's open-source AI blueprint for LLM routing that selects the optimal model per prompt via intent classification or neural auto-routing; being deprecated 2026-06-20.
OpenRouter
HybridUnified hybrid gateway to 400+ models from 60+ providers via a single OpenAI-compatible API, with optional auto-routing that selects the best model per prompt.
Capabilities
Benchmark peer barsfor Classification
No task-mapped benchmark peers are available for this model yet.
Benchmark scores(1)
| Benchmark | Score | Version | Source |
|---|---|---|---|
| Instruction-Following Evaluation | 65.5 | v2 | https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard |
Migration checks
No linked migration route is available for this model yet.
Cheapest of 5 routes · Replicate API