LLM Reference

Gemma Models by Google DeepMind

Google DeepMind · Gemma · Open weights · Open Source · Highlight
12 models · 2024 · Up to 8k context · From $0.04/1M input tokens

Details

Researcher: Google DeepMind
License: Gemma
Commercial use: Commercial use with conditions
Models: 12
Released: 2024
Max context: 8k

Capabilities

Structured Outputs: 8 of 12 models

About

Gemma is a family of open large language models (LLMs) developed by Google. These lightweight models build on the research and technology behind the Gemini models and target a range of natural language processing tasks. Gemma comes in two sizes: a 2 billion parameter model suited to CPU and on-device environments, and a 7 billion parameter model intended for GPU and TPU platforms. Both sizes are available in pre-trained and instruction-tuned forms. The models support major AI frameworks and hardware platforms, and they ship with integrated safety measures and risk mitigation tools as part of Google's approach to responsible AI development [1, 5, 6].
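The split between a CPU-friendly 2B model and a GPU/TPU-oriented 7B model can be made concrete with a rough weight-memory estimate. The sketch below is illustrative only: it uses the nominal parameter counts quoted above, counts weights alone (no activations or KV cache), and the bytes-per-parameter values and the helper name `weight_memory_gb` are assumptions for this example, not figures from the Gemma documentation.

```python
# Rough weight-only memory estimate for the two nominal Gemma sizes.
# Assumption: bytes per parameter for common numeric formats.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight memory in GB (10^9 bytes).

    params_billions * 1e9 params * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for size in (2, 7):
        for dtype in BYTES_PER_PARAM:
            gb = weight_memory_gb(size, dtype)
            print(f"Gemma {size}B in {dtype}: ~{gb:.1f} GB of weights")
```

Under these assumptions the 7B model needs several times the memory of the 2B model at any given precision, which is consistent with the text's framing of the 2B size as the on-device option.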

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

11 in view · 1 retired

Gemma 7B Instruct · Current

Use when the workload needs 8k context, 7B parameters, and structured outputs.

2024-02 · 8k context · 7B parameters · structured outputs

Gemma 1.1 7B Instruct · Current

Use when the workload needs 8k context, 7B parameters, and structured outputs.

2024-02 · 8k context · 7B parameters · structured outputs

Gemma 1.1 2B Instruct · Current

Use when the workload needs 2k context and 2B parameters.

2024-02 · 2k context · 2B parameters

Gemma 7B · Current

Use when the workload needs 8k context, 7B parameters, and structured outputs.

2024-02 · 8k context · 7B parameters · structured outputs

Gemma 2B · Current

Use when the workload needs 2k context and 2B parameters.

2024-02 · 2k context · 2B parameters

Together AI Gemma-7B-it · Current

Use when the workload needs 8k context, 7B parameters, and structured outputs.

2024-02 · 8k context · 7B parameters · structured outputs

OctoML Gemma-7B-it · Current

Use when the workload needs 8k context and 7B parameters.

2024-02 · 8k context · 7B parameters

OctoML Gemma-2B-it · Current

Use when the workload needs 8k context and 2B parameters.

2024-02 · 8k context · 2B parameters

Gemma 7B on Google Vertex AI · Current

Use when the workload needs 8k context, 7B parameters, and structured outputs.

2024-02 · 8k context · 7B parameters · structured outputs

DeepInfra Google Gemma 7B · Current

Use when the workload needs 8k context, 7B parameters, and structured outputs.

2024-02 · 8k context · 7B parameters · structured outputs

DeepInfra Google Gemma 2B · Current

Use when the workload needs 8k context, 2B parameters, and structured outputs.

2024-02 · 8k context · 2B parameters · structured outputs

Release Timeline

1 release group
2024-02
11 current · 1 retired
DeepInfra Google Gemma 2B · 8k context · 2B parameters · structured outputs · Current
DeepInfra Google Gemma 7B · 8k context · 7B parameters · structured outputs · Current
Gemma 1.1 2B Instruct · 2k context · 2B parameters · Current
Gemma 1.1 7B Instruct · 8k context · 7B parameters · structured outputs · Current
Gemma 2B · 2k context · 2B parameters · Current
Gemma 2B Instruct · 2k context · 2B parameters · structured outputs · Archived
Gemma 7B · 8k context · 7B parameters · structured outputs · Current
Gemma 7B Instruct · 8k context · 7B parameters · structured outputs · Current
Gemma 7B on Google Vertex AI · 8k context · 7B parameters · structured outputs · Current
OctoML Gemma-2B-it · 8k context · 2B parameters · Current
OctoML Gemma-7B-it · 8k context · 7B parameters · Current
Together AI Gemma-7B-it · 8k context · 7B parameters · structured outputs · Current

Specifications (12 models)

Gemma model specifications comparison

| Model | Released | Context | Parameters | Structured Outputs |
|---|---|---|---|---|
| Gemma 7B Instruct | 2024-02 | 8k | 7B | Yes |
| Gemma 1.1 7B Instruct | 2024-02 | 8k | 7B | Yes |
| Gemma 1.1 2B Instruct | 2024-02 | 2k | 2B | No |
| Gemma 7B | 2024-02 | 8k | 7B | Yes |
| Gemma 2B | 2024-02 | 2k | 2B | No |
| Together AI Gemma-7B-it | 2024-02 | 8k | 7B | Yes |
| OctoML Gemma-7B-it | 2024-02 | 8k | 7B | No |
| OctoML Gemma-2B-it | 2024-02 | 8k | 2B | No |
| Gemma 7B on Google Vertex AI | 2024-02 | 8k | 7B | Yes |
| DeepInfra Google Gemma 7B | 2024-02 | 8k | 7B | Yes |
| DeepInfra Google Gemma 2B | 2024-02 | 8k | 2B | Yes |
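The context column matters mostly as a budget: the prompt and the generated tokens together have to fit inside it. The following is a minimal sketch of that check. It assumes that "2k" and "8k" mean 2048 and 8192 tokens, which the listing does not state, and the function name `fits_context` is hypothetical.

```python
# Assumption: "k" in the context column means 1024 tokens.
CONTEXT_TOKENS = {"2k": 2048, "8k": 8192}


def fits_context(context_label: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if prompt plus requested output fits in the context window."""
    return prompt_tokens + max_new_tokens <= CONTEXT_TOKENS[context_label]


if __name__ == "__main__":
    # A 1,800-token prompt with up to 512 generated tokens.
    for label in ("2k", "8k"):
        print(label, fits_context(label, prompt_tokens=1800, max_new_tokens=512))
```

With these numbers the request overflows a 2k window but fits comfortably in an 8k one, which is the practical difference between the 2k and 8k rows in the specifications.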

Pricing

Gemma model pricing by provider

| Model | Provider | Input / 1M tokens | Output / 1M tokens | Type |
|---|---|---|---|---|
| Gemma 1.1 7B Instruct | DeepInfra | $0.05 | $0.15 | Serverless |
| DeepInfra Google Gemma 7B | DeepInfra | $0.05 | $0.15 | Serverless |
| DeepInfra Google Gemma 2B | DeepInfra | $0.05 | $0.15 | Serverless |
| Gemma 7B Instruct | Replicate API | $0.05 | $0.25 | Serverless |
| Gemma 7B Instruct | Lepton AI API | $0.07 | $0.07 | Serverless |
| Gemma 7B Instruct | GCP Vertex AI | $0.10 | $0.30 | Serverless |
| OctoML Gemma-2B-it | OctoML (Deprecated) | $0.10 | $0.15 | Serverless |
| Gemma 7B | GCP Vertex AI | $0.10 | $0.30 | Serverless |
| Gemma 7B on Google Vertex AI | GCP Vertex AI | $0.125 | $0.375 | Serverless |
| Together AI Gemma-7B-it | Together AI | $0.15 | $0.15 | Serverless |
| OctoML Gemma-7B-it | OctoML (Deprecated) | $0.15 | $0.20 | Serverless |
| Gemma 7B Instruct | Fireworks AI | $0.20 | $0.20 | Provisioned |
| Gemma 7B Instruct | Together AI | $0.20 | $0.20 | Serverless |
| Gemma 7B | Fireworks AI | $0.20 | $0.20 | Serverless |
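Because input and output tokens are priced separately, the cheapest provider depends on the shape of the workload. As a rough sketch, the snippet below copies a few rows from the pricing listing and ranks them for a given token mix. The helper names `cost_usd` and `rank_offers` are illustrative, and the prices are only as current as the listing itself.

```python
# (model, provider, input USD per 1M tokens, output USD per 1M tokens)
# A subset of rows copied from the pricing listing.
PRICES = [
    ("Gemma 1.1 7B Instruct", "DeepInfra", 0.05, 0.15),
    ("Gemma 7B Instruct", "Replicate API", 0.05, 0.25),
    ("Gemma 7B Instruct", "Lepton AI API", 0.07, 0.07),
    ("Gemma 7B Instruct", "GCP Vertex AI", 0.10, 0.30),
    ("Gemma 7B Instruct", "Together AI", 0.20, 0.20),
]


def cost_usd(input_tokens, output_tokens, in_price, out_price):
    """Total cost in USD given per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price


def rank_offers(input_tokens, output_tokens, offers=PRICES):
    """Return (cost, model, provider) tuples sorted from cheapest to priciest."""
    priced = [
        (cost_usd(input_tokens, output_tokens, in_p, out_p), model, provider)
        for model, provider, in_p, out_p in offers
    ]
    return sorted(priced)


if __name__ == "__main__":
    # Input-heavy workload: 10M input tokens, 2M output tokens.
    for cost, model, provider in rank_offers(10_000_000, 2_000_000):
        print(f"${cost:.2f}  {model} via {provider}")
```

For an input-heavy mix the $0.05 input rows come out ahead, while an output-heavy mix (for example 1M input and 10M output tokens) favors the flat $0.07 rate, so it is worth running the numbers for the expected traffic rather than sorting by input price alone.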

Frequently Asked Questions

What is Gemma used for?
Gemma is used for general natural language processing tasks, and 8 of its 12 listed models support structured outputs. The family description and listed model capabilities point to those workloads as the best fit.
How does Gemma compare to Gemma 4?
Gemma is strongest where you need structured outputs, while Gemma 4, also by Google DeepMind, is the closest related family to check for multimodal workloads. Gemma has 12 listed variants and reaches up to 8k context, while Gemma 4 reaches up to 256k context, so compare the specs and pricing tables before choosing a production model.
Which Gemma model should I use?
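The lowest input price cited for the family is $0.04/1M input tokens for Gemma 2B Instruct through GCP Vertex AI, but that model is archived and has no row in the pricing table. Among the rows that are listed, input pricing starts at $0.05/1M tokens (DeepInfra and Replicate API). For a current 7B option with 8k context and structured outputs, evaluate Together AI Gemma-7B-it.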
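One way to turn that advice into a repeatable check is to filter offers by hard requirements first and sort by price second. The sketch below joins a handful of rows by hand from the specifications and pricing listings. The `Offer` type and `shortlist` function are hypothetical names for this example, and the rows are a subset, not the full catalog.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    model: str
    provider: str
    context_k: int       # context window in thousands of tokens, as listed
    structured: bool     # structured outputs supported, as listed
    input_price: float   # USD per 1M input tokens


# Subset of rows joined by hand from the specifications and pricing listings.
OFFERS = [
    Offer("DeepInfra Google Gemma 2B", "DeepInfra", 8, True, 0.05),
    Offer("Gemma 7B Instruct", "Lepton AI API", 8, True, 0.07),
    Offer("OctoML Gemma-2B-it", "OctoML (Deprecated)", 8, False, 0.10),
    Offer("Together AI Gemma-7B-it", "Together AI", 8, True, 0.15),
    Offer("OctoML Gemma-7B-it", "OctoML (Deprecated)", 8, False, 0.15),
]


def shortlist(offers, min_context_k=8, need_structured=True):
    """Keep offers that meet the requirements, cheapest input price first."""
    ok = [
        o for o in offers
        if o.context_k >= min_context_k and (o.structured or not need_structured)
    ]
    return sorted(ok, key=lambda o: o.input_price)


if __name__ == "__main__":
    for o in shortlist(OFFERS):
        print(f"{o.model} via {o.provider}: ${o.input_price:.2f}/1M input")
```

Filtering before sorting keeps a cheap but unsuitable row, such as one without structured outputs, from winning on price alone.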