Using Llama 2 13B Chat on GCP Vertex AI

Implementation guide · Llama 2 · AI at Meta

Serverless · Open Weights

Quick Start

1. Create an account at GCP Vertex AI and generate an API key.
2. Use the GCP Vertex AI SDK or REST API to call llama2-13b-chat; see the documentation for the request format.
3. You'll be billed $0.16 per 1M input tokens and $0.48 per 1M output tokens.


Code Examples

Install

pip install google-cloud-aiplatform

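API key

GOOGLE_CLOUD_PROJECT (environment variable holding your project ID; the example below authenticates via Application Default Credentials)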

Model ID

llama2-13b-chat

For Google-published models use the model name directly, e.g. "gemini-2.0-flash-001". For third-party publishers (Anthropic, Meta, etc.) use the full publisher path, e.g. "publishers/anthropic/models/claude-3-5-sonnet-v2@20241022".

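import os
import vertexai
from vertexai.generative_models import GenerativeModel

# The project ID is read from the GOOGLE_CLOUD_PROJECT environment variable;
# authentication uses Application Default Credentials.
vertexai.init(project=os.environ["GOOGLE_CLOUD_PROJECT"], location="us-central1")

# Model ID as listed above. Per the note on publisher paths, a third-party
# model may need its full "publishers/..." path instead of the short name.
model = GenerativeModel("llama2-13b-chat")
response = model.generate_content("Hello")
print(response.text)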
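The note on model IDs describes a full publisher path for third-party models. A minimal sketch of assembling such a path is shown here. The helper name `publisher_model_path` is hypothetical, and the `meta` publisher segment in the second call is an assumption that this page does not confirm.

```python
from typing import Optional


def publisher_model_path(publisher: str, model: str, version: Optional[str] = None) -> str:
    """Build a 'publishers/<publisher>/models/<model>[@<version>]' string.

    Hypothetical helper: it only mirrors the path pattern quoted in the note
    on model IDs and does not check that the model exists on Vertex AI.
    """
    if not publisher or not model:
        raise ValueError("publisher and model must be non-empty")
    path = f"publishers/{publisher}/models/{model}"
    return f"{path}@{version}" if version else path


# Reproduces the Anthropic example quoted in the note on model IDs.
print(publisher_model_path("anthropic", "claude-3-5-sonnet-v2", "20241022"))

# Assumption: "meta" as the publisher segment is a guess, not confirmed by this page.
print(publisher_model_path("meta", "llama2-13b-chat"))
```

If the short model name is rejected, passing a string built this way to the model constructor may be worth trying, but check the provider documentation for the exact identifier.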

About GCP Vertex AI

Google Cloud Vertex AI is a comprehensive machine learning platform that provides end-to-end solutions for developing, deploying, and managing AI models. The platform offers a unified interface that integrates various tools and services, enabling users to efficiently handle the entire machine learning lifecycle. Key features include AutoML capabilities for building custom models with minimal coding, a managed notebook environment for prototyping, and robust MLOps tools for model monitoring and versioning. Vertex AI supports both pre-trained models and custom training, making it versatile for a wide range of applications such as natural language processing, image recognition, and predictive analytics.

The platform's design focuses on increasing productivity and accelerating time-to-market for AI solutions. By consolidating multiple AI tools into a single ecosystem, Vertex AI reduces manual effort and enhances collaboration among data scientists and engineers. Its scalable architecture allows organizations to efficiently manage large datasets and complex models, while the pay-as-you-go pricing model makes it accessible for businesses of all sizes.

Additionally, Vertex AI's integration with popular open-source frameworks like TensorFlow and PyTorch enables users to leverage existing models and tools, fostering innovation and facilitating the development of customized AI applications tailored to specific business needs.

Vertex AI is Google Cloud's managed AI platform, offering access to Gemini models and hundreds of partner models alongside tools for fine-tuning, grounding, vector search, and end-to-end MLOps pipelines.


Pricing on GCP Vertex AI

Type	Price (USD per 1M tokens)
Input tokens	$0.16
Output tokens	$0.48
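To make the rates above concrete, here is a small sketch of the per-request arithmetic. The function name `estimate_cost_usd` and the token counts in the usage line are illustrative assumptions; only the two rates come from the table.

```python
# Rates taken from the pricing table above (USD per 1,000,000 tokens).
INPUT_USD_PER_1M = 0.16
OUTPUT_USD_PER_1M = 0.48


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Illustrative helper: actual billing may round or meter differently.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_1M + output_tokens * OUTPUT_USD_PER_1M
    return total / 1_000_000


# Example with made-up token counts: 3,000 input and 1,000 output tokens.
print(f"${estimate_cost_usd(3_000, 1_000):.6f}")
```

Because output tokens cost three times as much as input tokens at these rates, long completions dominate the bill faster than long prompts do.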

Capabilities

Structured Outputs

About Llama 2 13B Chat

Llama 2 13B Chat is a 13-billion-parameter generative text model developed by Meta and optimized for dialogue. Released on July 18, 2023 as part of the Llama 2 family, it was tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to produce coherent, contextually relevant responses. The Llama 2 models were pretrained on 2 trillion tokens from publicly available sources. According to Meta, the chat models outperform many open-source chat models on the benchmarks tested and are on par with some popular closed-source models in human evaluations of helpfulness and safety. Typical uses include chatbots, virtual assistants, and customer service automation. For more details, visit the model's Hugging Face page [1].


Model Specs

Released: 2023-07-18

Parameters: 13B

Context: 4k tokens

Architecture: Decoder Only

Knowledge cutoff: 2022-09
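With a decoder-only model, the prompt and the generated reply share the same context window, so a long prompt leaves less room for output. The sketch below shows that budget calculation. It assumes "4k" means 4,096 tokens, and the helper name `max_output_tokens` is hypothetical. Counting the prompt's tokens is left to whatever tokenizer you use.

```python
# Assumption: the "4k" context listed in the specs is read as 4,096 tokens.
CONTEXT_WINDOW_TOKENS = 4096


def max_output_tokens(prompt_tokens: int, context_window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Return how many tokens remain for the reply after the prompt.

    Hypothetical helper: prompt_tokens must come from a real tokenizer count.
    Returns 0 when the prompt already fills or exceeds the window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_window - prompt_tokens, 0)


# A 1,000-token prompt leaves 3,096 tokens for the reply under this assumption.
print(max_output_tokens(1_000))
```

A result of 0 signals that the prompt needs trimming or summarizing before the request is sent.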

Google Cloud Platform (GCP)

Mountain View, California, United States