GCP Vertex AI
Researched today · Hyperscaler · Tier 1 · Google Cloud Platform (GCP)
GCP Vertex AI exposes 126 tracked models, 97 of which have output token pricing in seed data. Task coverage across this catalog includes coding, RAG, and agents; open any model detail page for benchmarks, batch tiers, and migration prompts.
Portfolio context: 7 decision-task tags, 126 catalog rows, latest research stamp 2026-06-15.
Use it for
- Teams comparing token and batch economics on this surface
- Operators routing coding, RAG, and agents workloads through this API
- Batch buyers auditing discount coverage model-by-model
Do not use it for
- Final benchmark picks without opening the relevant model detail page

Catalog rows: 126 (models linked to this provider in seed data)
Priced output routes: 97 (rows with token_out in seed data)
Cheapest output: $0.080 (Gemma 3 4B IT on this route)
Batch-ready SKUs: 3 (models with batch columns populated)
Latest catalog ship: 2026-06-09 (6d since dated release field)
Freshness: 2026-06-15 (researched today)

Routes available via routers & gateways
These routers list GCP Vertex AI as a target provider, so they can sit in front of this catalog for fallback, routing, or unified API access.
AIRouter (Router)
Commercial LLM router that analyzes incoming requests and routes to the optimal model for cost/quality/latency via a drop-in OpenAI-compatible API, with a privacy-preserving embedding mode that avoids sending prompt content.
Helicone (Gateway)
Observability-first AI gateway with routing, caching, rate limiting, and request tracing; Apache 2.0 open-source core with a managed hosted tier for logging and analytics.
Kong AI Gateway (Gateway)
Multi-LLM AI gateway built on Kong Gateway 3.x, adding semantic routing, load balancing, guardrails, and MCP traffic analytics as plugins over Kong's existing API management platform.
LiteLLM (Gateway)
Open-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.
Martian (Router)
AI-powered LLM router that analyzes each prompt in real time to select the optimal model, targeting 20–97% cost reduction while maintaining quality; San Francisco startup reportedly nearing $1.3B valuation.
Neutrino AI (Router)
Commercial LLM router that dynamically routes each query to the best-suited model with load balancing and fallback handling, charging 3% of underlying AI spend.
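As a rough illustration of the fallback behavior these routers and gateways advertise, the sketch below tries a list of routes in order and returns the first successful reply. It is not any listed vendor's API: `call_with_fallback`, the `Route` type, and the two stand-in routes are hypothetical names chosen for this example, and no network calls are made.

```python
from typing import Callable, Sequence

# Hypothetical type: a route takes a prompt and returns text, or raises on failure.
Route = Callable[[str], str]


def call_with_fallback(prompt: str, routes: Sequence[tuple[str, Route]]) -> tuple[str, str]:
    """Try each (name, route) in order; return (name, reply) from the first that succeeds."""
    errors = []
    for name, route in routes:
        try:
            return name, route(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all routes failed: " + "; ".join(errors))


# Stand-in routes for illustration only (assumptions, not real endpoints).
def primary(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def secondary(prompt: str) -> str:
    return f"echo: {prompt}"


result = call_with_fallback(
    "hello",
    [("vertex-primary", primary), ("vertex-secondary", secondary)],
)
print(result)
```

Here the first route raises, so the reply comes from the second. A production gateway would typically add timeouts, retries, and per-route health tracking on top of this ordering.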
Information
Vertex AI is Google Cloud's managed AI platform, offering access to Gemini models and hundreds of partner models alongside tools for fine-tuning, grounding, vector search, and end-to-end MLOps pipelines.
Catalog release signal
Latest ISO-dated model.release in this catalog is 2026-06-09 (6d ago).
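The "6d ago" figure is plain date arithmetic between the latest release date and the research stamp. A minimal sketch, assuming both values are ISO 8601 date strings (the function name `days_since` is made up for this example):

```python
from datetime import date


def days_since(release_iso: str, researched_iso: str) -> int:
    """Whole days between an ISO-dated release and the research stamp."""
    return (date.fromisoformat(researched_iso) - date.fromisoformat(release_iso)).days


# Dates taken from this page: latest model release and research stamp.
print(days_since("2026-06-09", "2026-06-15"))
```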
Where this host wins
- Coding: 40 tracked models with SWE-bench / HumanEval-style scores.
- RAG: 51 tracked models with RULER / needle retrieval benchmarks.
- Agentic: 47 tracked models with BFCL, tau-bench, and SWE-bench tool-use coverage.
- Long-context: 54 tracked models with context-token or InfiniteBench-class signal.
Getting started
Official entry points from seed metadata — confirm quotas and regions in vendor docs.
Compliance notes (verbatim seed excerpts)
Not yet verified from seed copy — no SOC/ISO/HIPAA-class sentences detected to quote verbatim.
Platform Overview
Google Cloud Vertex AI is a comprehensive machine learning platform that provides end-to-end solutions for developing, deploying, and managing AI models. The platform offers a unified interface that integrates various tools and services, enabling users to efficiently handle the entire machine learning lifecycle. Key features include AutoML capabilities for building custom models with minimal coding, a managed notebook environment for prototyping, and robust MLOps tools for model monitoring and versioning. Vertex AI supports both pre-trained models and custom training, making it versatile for a wide range of applications such as natural language processing, image recognition, and predictive analytics.

The platform's design focuses on increasing productivity and accelerating time-to-market for AI solutions. By consolidating multiple AI tools into a single ecosystem, Vertex AI reduces manual effort and enhances collaboration among data scientists and engineers. Its scalable architecture allows organizations to efficiently manage large datasets and complex models, while the pay-as-you-go pricing model makes it accessible for businesses of all sizes. Additionally, Vertex AI's integration with popular open-source frameworks like TensorFlow and PyTorch enables users to leverage existing models and tools, fostering innovation and facilitating the development of customized AI applications tailored to specific business needs.
Compare per-model pricing, input and output token costs, batch availability, and benchmark coverage.
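To make the standard-versus-batch comparison concrete, here is a minimal sketch of the per-1M-token arithmetic. The rates are copied from the Gemini 3.5 Flash row of the pricing table on this page; the workload size (2M input tokens, 500K output tokens) and the helper name `token_cost` are assumptions for illustration only.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float) -> float:
    """Cost in USD, with rates quoted per 1M tokens."""
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate


# Rates from the Gemini 3.5 Flash row; the workload size is hypothetical.
standard = token_cost(2_000_000, 500_000, input_rate=1.5, output_rate=9.0)
batch = token_cost(2_000_000, 500_000, input_rate=0.75, output_rate=4.5)

print(f"standard: ${standard:.2f}")
print(f"batch:    ${batch:.2f}")
print(f"saving:   {1 - batch / standard:.0%}")
```

For this workload the sketch gives $7.50 at standard rates and $3.75 at batch rates, matching the -50% discount shown in the table. Confirm current rates on the model detail page before budgeting.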
Available Models (126)
All models available as Serverless
| Model | Input (per 1M) | Output (per 1M) | Batch input (per 1M) | Batch output (per 1M) |
|---|---|---|---|---|
| Claude Fable 5 | $10 | $50 | — | — |
| Claude Mythos 5 | — | — | — | — |
| Claude Opus 4.8 | $5 | $25 | — | — |
| Gemini 3.5 Flash | $1.5 | $9 | $0.75 (-50%) | $4.5 (-50%) |
| Claude Opus 4.7 | $5 | $25 | — | — |
| Gemma 4 26B A4B IT | $0.15 | $0.60 | — | — |
| Gemma 4 31B IT | $0.15 | $0.60 | — | — |
| Gemma 4 E2B | $0 | $0 | — | — |
| Gemma 4 E2B IT | $0 | $0 | — | — |
| Gemma 4 E4B | $0 | $0 | — | — |