
Using Vicuna 7B 16K on GCP Vertex AI

Implementation guide · Vicuna · LMSYS Org

Serverless

Quick Start

  1. Create an account at GCP Vertex AI and generate an API key.
  2. Use the GCP Vertex AI SDK or REST API to call vicuna-7b-16k; see the documentation for request format.

Code Examples

Install: pip install google-cloud-aiplatform
Environment variable: GOOGLE_CLOUD_PROJECT (your project ID; authentication uses Application Default Credentials)
Model ID: vicuna-7b-16k

For Google-published models use the model name directly, e.g. "gemini-2.0-flash-001". For third-party publishers (Anthropic, Meta, etc.) use the full publisher path, e.g. "publishers/anthropic/models/claude-3-5-sonnet-v2@20241022".
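The naming rule above can be sketched as a small helper. This is a minimal illustration only: `vertex_model_id` is a hypothetical name, not part of the Vertex AI SDK, and it encodes nothing beyond the two patterns quoted above.

```python
from typing import Optional


def vertex_model_id(model: str, publisher: Optional[str] = None) -> str:
    """Hypothetical helper: build the identifier string described above.

    Google-published models use the bare model name; third-party
    models use the full publisher path.
    """
    if publisher is None or publisher == "google":
        return model
    return f"publishers/{publisher}/models/{model}"


# The two examples quoted in the text above.
print(vertex_model_id("gemini-2.0-flash-001"))
print(vertex_model_id("claude-3-5-sonnet-v2@20241022", publisher="anthropic"))
```

This page lists the bare ID vicuna-7b-16k and does not say whether a publisher path applies to it, so check the provider documentation before relying on either form.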

import os

import vertexai
from vertexai.generative_models import GenerativeModel

# GOOGLE_CLOUD_PROJECT holds the project ID; authentication uses Application Default Credentials.
vertexai.init(project=os.environ["GOOGLE_CLOUD_PROJECT"], location="us-central1")

# Model ID as listed on this page; confirm it is available in your project and region.
model = GenerativeModel("vicuna-7b-16k")

# Send a single prompt and print the text of the reply.
response = model.generate_content("Hello")
print(response.text)

About GCP Vertex AI

Google Cloud Vertex AI is a machine learning platform for developing, deploying, and managing AI models. It brings the tools for the full machine learning lifecycle into one interface. Key features include AutoML for building custom models with minimal coding, a managed notebook environment for prototyping, and MLOps tools for model monitoring and versioning. Vertex AI supports both pre-trained models and custom training, which suits applications such as natural language processing, image recognition, and predictive analytics.

Consolidating these tools into a single ecosystem reduces manual effort and makes collaboration between data scientists and engineers easier. The platform scales to large datasets and complex models, and pricing is pay-as-you-go. Vertex AI also integrates with open-source frameworks such as TensorFlow and PyTorch, so teams can reuse existing models and tools.

Vertex AI is Google Cloud's managed AI platform, offering access to Gemini models and hundreds of partner models alongside tools for fine-tuning, grounding, vector search, and end-to-end MLOps pipelines.

Pricing on GCP Vertex AI

Capabilities

Structured Outputs

About Vicuna 7B 16K

Vicuna-7B-v1.5-16k is a chat assistant large language model (LLM) developed by LMSYS. It is a transformer model fine-tuned from Llama 2, and its 16K context window is achieved with linear RoPE scaling, which lets it process much longer sequences and sustain long conversations. It was trained on approximately 125,000 conversations from ShareGPT.com and handles open-ended dialogue, question answering, and other natural language tasks. Like other LLMs, it can exhibit bias, and its performance varies across tasks and languages. At 7 billion parameters, it has substantial compute and memory requirements for inference. The model is available under the Llama 2 Community License Agreement, and various quantized versions are available.
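Linear RoPE scaling can be illustrated with a short sketch. The idea is to divide each token position by a constant factor before computing the rotary angles, so that positions in the extended window map back into the range the base model saw in training. The values below are assumptions for illustration, not taken from this page: a base context of 4,096 tokens, a head dimension of 128, and a rotary base of 10,000. `rope_angles` is a hypothetical name, not a library function.

```python
import math


def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotary angles for one token position.

    scale > 1 applies linear scaling: the position is divided by
    the factor before the per-dimension frequencies are applied.
    """
    scaled_pos = position / scale
    return [scaled_pos / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


BASE_CONTEXT = 4096     # assumed context length of the base model
TARGET_CONTEXT = 16384  # 16K window described above
HEAD_DIM = 128          # assumed attention head dimension

scale = TARGET_CONTEXT / BASE_CONTEXT

# A position near the end of the 16K window, once scaled, gets the same
# angles as a position near the end of the base window without scaling.
scaled = rope_angles(16380, HEAD_DIM, scale=scale)
unscaled = rope_angles(4095, HEAD_DIM)
same = all(math.isclose(a, b) for a, b in zip(scaled, unscaled))
print(scale, same)
```

The trade-off in this scheme is resolution: neighbouring tokens end up closer together in angle space, which is why models using it are typically fine-tuned on long sequences afterwards.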

Model Specs

Released: 2023-10-23
Parameters: 7B
Context: 16K
Architecture: Decoder Only
Knowledge cutoff: 2022

Provider

GCP Vertex AI

Google Cloud Platform (GCP)

Mountain View, California, United States