Using Gemma 7B Instruct on Together AI

Implementation guide · Gemma · Google DeepMind

Serverless · Open Weights

Quick Start

1. Create an account at Together AI and generate an API key.
2. Use the Together AI SDK or REST API to call gemma-7b-it; see the documentation for the request format.
3. You'll be billed $0.20 per 1M input tokens and $0.20 per 1M output tokens. See Together AI's pricing page for full details.
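
Step 2 mentions the REST API as an alternative to the SDK. The sketch below builds such a request with only the Python standard library. It assumes Together's OpenAI-compatible chat completions endpoint at https://api.together.xyz/v1/chat/completions with Bearer-token authentication, and an organization-prefixed model ID that should be confirmed in the model catalog. `build_chat_request` and `send` are illustrative helper names, not part of any Together library.

```python
import json
import os
import urllib.request

# Assumed endpoint: Together's OpenAI-compatible chat completions route.
API_URL = "https://api.together.xyz/v1/chat/completions"


def build_chat_request(prompt, model="google/gemma-7b-it", api_key=None):
    """Build (but do not send) a chat completion request."""
    # Fall back to the TOGETHER_API_KEY environment variable.
    api_key = api_key or os.environ.get("TOGETHER_API_KEY", "")
    body = {
        "model": model,  # assumed ID; confirm in the Together model catalog
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Calling `send(build_chat_request("Hello"))` performs the network call and is billed like any other request.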


Code Examples

Install

pip install together

API key

TOGETHER_API_KEY

Model ID

google/gemma-7b-it

Together uses "organization/model-name" format, e.g. "meta-llama/Llama-4-Scout-17B-16E-Instruct" or "Qwen/QwQ-32B", so the bare name "gemma-7b-it" needs its organization prefix. See the Together model catalog for the exact ID.

from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment
response = client.chat.completions.create(
    model="google/gemma-7b-it",  # organization-prefixed ID; confirm in the catalog
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

About Together AI

Together AI is a platform for running open-source and proprietary LLMs with fast serverless and dedicated endpoints at competitive inference pricing.

Pricing on Together AI

Type	Price (per 1M tokens)
Input tokens	$0.20
Output tokens	$0.20
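
To make the per-token pricing concrete, here is a minimal cost estimator using the two rates from the table. The function and constant names are illustrative, and actual billing may differ (for example through rounding or price changes).

```python
# Prices from the table above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 0.20


def estimate_cost(input_tokens, output_tokens):
    """Return the estimated cost in USD for the given token counts."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# 3M input tokens and 1M output tokens: (3 * 0.20) + (1 * 0.20) = 0.80 USD
print(f"${estimate_cost(3_000_000, 1_000_000):.2f}")
```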

Capabilities

Structured Outputs
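
The page lists structured outputs as a capability without showing a request. The sketch below is one way such a call might be prepared. It assumes Together's JSON mode accepts a `response_format` argument with type "json_object" and a JSON schema, which should be verified against the Together documentation. `build_structured_request`, `parse_reply`, and the schema itself are hypothetical, for illustration only.

```python
import json

# Illustrative schema; the field names are made up for this example.
CITY_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "country": {"type": "string"},
    },
    "required": ["city", "country"],
}


def build_structured_request(prompt, schema, model="google/gemma-7b-it"):
    """Return keyword arguments for client.chat.completions.create()."""
    instruction = "Reply only with JSON that matches the requested schema."
    return {
        "model": model,  # assumed ID; confirm in the Together model catalog
        "messages": [{"role": "user", "content": f"{prompt}\n\n{instruction}"}],
        # Assumed shape of the JSON-mode parameter; check the Together docs.
        "response_format": {"type": "json_object", "schema": schema},
    }


def parse_reply(text, schema):
    """Parse the reply and check that the schema's required keys are present."""
    data = json.loads(text)
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"reply is missing required keys: {missing}")
    return data
```

With a client from the SDK, the request would be sent as `client.chat.completions.create(**build_structured_request(prompt, CITY_SCHEMA))` and the returned message content passed to `parse_reply`. Checking the reply locally is a defensive choice, since a malformed or incomplete reply is otherwise only caught downstream.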

About Gemma 7B Instruct

Gemma 7B Instruct is a 7-billion-parameter large language model developed by Google DeepMind. As part of the Gemma family, it draws on the research behind Google's Gemini models. It is an instruction-tuned model for text generation tasks such as question answering and summarization. It performs well on benchmarks for its size, and its compact footprint allows deployment on hardware ranging from laptops to cloud infrastructure. The weights are openly available (open weights rather than fully open source), and the model was developed with responsible AI practices such as data filtering and human feedback.

Model Specs

Released: 2024-02-21

Parameters: 7B

Context: 8k

Architecture: Decoder Only

Knowledge cutoff: 2023-04
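
The context figure limits the prompt and the generated reply combined. A rough pre-flight check might look like the sketch below. It assumes "8k" means 8,192 tokens and uses a crude four-characters-per-token heuristic instead of the Gemma tokenizer, and it ignores chat-template overhead; all names are illustrative.

```python
CONTEXT_TOKENS = 8192   # assumption: "8k" means 8,192 tokens
CHARS_PER_TOKEN = 4     # rough heuristic, not the Gemma tokenizer


def approx_tokens(text):
    """Estimate the token count of a string from its length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(messages, max_new_tokens=512):
    """Check whether the prompt plus the reply budget fits in the window."""
    prompt_tokens = sum(approx_tokens(m["content"]) for m in messages)
    return prompt_tokens + max_new_tokens <= CONTEXT_TOKENS
```

For exact counts, the model's own tokenizer would be needed; the heuristic only flags prompts that are clearly too long.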
