Using Llama 3 8B Instruct on GCP Vertex AI

Implementation guide · Llama 3 · AI at Meta

Serverless · Open Weights

Quick Start

1. Create an account at GCP Vertex AI and set up credentials for your project.
2. Use the GCP Vertex AI SDK or REST API to call llama3-8b-instruct; see the documentation for the request format.
3. You'll be billed $0.12 per 1M input tokens and $0.36 per 1M output tokens.


Code Examples

Install

pip install google-cloud-aiplatform

Project ID (environment variable)

GOOGLE_CLOUD_PROJECT

Model ID

llama3-8b-instruct

For Google-published models use the model name directly, e.g. "gemini-2.0-flash-001". For third-party publishers (Anthropic, Meta, etc.) use the full publisher path, e.g. "publishers/anthropic/models/claude-3-5-sonnet-v2@20241022".

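import os
import vertexai
from vertexai.generative_models import GenerativeModel

# Reads the project ID from GOOGLE_CLOUD_PROJECT; authenticates via Application Default Credentials
vertexai.init(project=os.environ["GOOGLE_CLOUD_PROJECT"], location="us-central1")

# Meta is a third-party publisher, so the full publisher path described above may be
# needed in place of the short model ID; check the documentation for the exact value.
model = GenerativeModel("llama3-8b-instruct")

# Send a single prompt and print the text of the reply
response = model.generate_content("Hello")
print(response.text)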
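
The snippet above stops with a bare KeyError when GOOGLE_CLOUD_PROJECT is unset. A slightly more defensive variant might look like the sketch below. The helper names load_project_id and ask are hypothetical, and the model ID and location are simply carried over from the example above. The vertexai import is deferred so the configuration check can run without the SDK installed. Whether the short model ID is accepted as-is should be confirmed against the documentation.

```python
import os


def load_project_id(env=None):
    """Return the project ID from GOOGLE_CLOUD_PROJECT or raise a readable error."""
    env = os.environ if env is None else env
    project = env.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not project:
        raise RuntimeError(
            "GOOGLE_CLOUD_PROJECT is not set; export it to your Google Cloud project ID"
        )
    return project


def ask(prompt, model_id="llama3-8b-instruct", location="us-central1"):
    """Send one prompt and return the reply text (hypothetical helper)."""
    # Imported here so load_project_id() can be used without the SDK installed.
    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project=load_project_id(), location=location)
    model = GenerativeModel(model_id)
    return model.generate_content(prompt).text
```

With credentials configured, calling ask("Hello") would perform the same request as the example above.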

About GCP Vertex AI

Google Cloud Vertex AI is a machine learning platform for developing, deploying, and managing AI models. It provides a unified interface over the tools and services needed for the full machine learning lifecycle. Key features include AutoML for building custom models with minimal coding, a managed notebook environment for prototyping, and MLOps tools for model monitoring and versioning. Vertex AI supports both pre-trained models and custom training, which suits applications such as natural language processing, image recognition, and predictive analytics.

By consolidating multiple AI tools into a single ecosystem, Vertex AI reduces manual effort and makes collaboration between data scientists and engineers easier. Its architecture scales to large datasets and complex models, and pricing is pay-as-you-go. Vertex AI also integrates with open-source frameworks such as TensorFlow and PyTorch, so existing models and tools can be reused when building custom applications.

Vertex AI is Google Cloud's managed AI platform, offering access to Gemini models and hundreds of partner models alongside tools for fine-tuning, grounding, vector search, and end-to-end MLOps pipelines.


Pricing on GCP Vertex AI

Type	Price (USD per 1M tokens)
Input tokens	$0.12
Output tokens	$0.36
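
As a rough illustration of how these rates translate into a bill, the sketch below multiplies token counts by the per-million prices from the table. The function name estimate_cost_usd and the example token counts are made up for illustration, and actual billing may differ (rounding, minimums, or price changes).

```python
from decimal import Decimal

# Prices from the table above, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.12")
OUTPUT_PRICE_PER_M = Decimal("0.36")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost in USD of a request or batch (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M / TOKENS_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M / TOKENS_PER_M
    return input_cost + output_cost


# Example: 2,000 prompt tokens and 500 completion tokens.
print(estimate_cost_usd(2_000, 500))
```

Decimal is used instead of float so that small per-request amounts add up without binary rounding noise.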

Capabilities

Structured Outputs
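
Structured output is most useful when the reply is checked before it is used. The sketch below shows one way to do that on the client side: parse the reply as JSON and verify the expected keys and types. The function parse_structured_reply, the example keys, and the sample reply are all hypothetical, and how structured output is requested from this model on Vertex AI is not shown here; see the documentation for that.

```python
import json


def parse_structured_reply(text, required):
    """Parse a JSON object reply and check required keys and value types.

    `required` maps key names to the Python type expected for each value.
    """
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for key, expected_type in required.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for key: {key}")
    return data


# Sample reply text, made up for illustration.
reply = '{"city": "Paris", "population_millions": 2.1}'
record = parse_structured_reply(reply, {"city": str, "population_millions": float})
print(record["city"])
```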

About Llama 3 8B Instruct

The Llama 3 8B Instruct model, released on April 18, 2024, is an instruction-following language model from Meta with 8 billion parameters. It uses an auto-regressive transformer architecture with Grouped-Query Attention for improved inference scalability. Trained on over 15 trillion tokens and fine-tuned with 10 million human-annotated examples, it is optimized for dialogue and conversational tasks. The model outperforms its predecessors on industry benchmarks, scoring 68.4 on MMLU (5-shot). Designed for commercial and research applications with an emphasis on safety and helpfulness, it is suitable for chatbots, virtual assistants, and other interactive AI applications. For more details, visit the Hugging Face page [1].


Model Specs

Released: 2024-04-18

Parameters: 8B

Context: 8k tokens

Architecture: Decoder-only

Knowledge cutoff: 2023-03
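
In a decoder-only model, the prompt and the generated reply share the same context budget. A minimal sketch of a budget check follows, assuming that 8k means 8,192 tokens (the spec above only says 8k) and that token counts come from a tokenizer elsewhere. The helper names max_output_for and fits_context are hypothetical.

```python
# Assumption: "8k" is taken to mean 8,192 tokens.
CONTEXT_WINDOW = 8192


def max_output_for(prompt_tokens, context_window=CONTEXT_WINDOW):
    """Return how many tokens remain for the reply after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_window - prompt_tokens, 0)


def fits_context(prompt_tokens, max_output_tokens, context_window=CONTEXT_WINDOW):
    """Check that prompt plus requested output stays within the window."""
    return prompt_tokens + max_output_tokens <= context_window


print(max_output_for(7000))
print(fits_context(7000, 1500))
```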

Google Cloud Platform (GCP)

Mountain View, California, United States