Gemini 3.5 Flash on Vercel AI Gateway

Name: Gemini 3.5 Flash on Vercel AI Gateway
Brand: Google DeepMind
SKU: gemini-3.5-flash-vercel-ai-gateway
Price: 1.50 USD per 1M input tokens

Gemini 3.5 · Google DeepMind

Serverless

Last refreshed 2026-06-29. Next refresh: weekly.

Why use Gemini 3.5 Flash on Vercel AI Gateway?

Vercel AI Gateway offers Gemini 3.5 Flash with pay-as-you-go pricing at $1.50 per 1M input tokens and $9.00 per 1M output tokens. The gateway is a unified AI proxy that exposes a single OpenAI-compatible API endpoint for 275+ models from 25+ providers, including Anthropic, OpenAI, Google, Meta, Mistral, DeepSeek, xAI, Alibaba, Amazon, ByteDance, Cohere, MiniMax, MoonshotAI, KwaiPilot, Black Forest Labs, Recraft, Voyage AI, NVIDIA, and more.


Input / 1M	$1.50
Output / 1M	$9.00
Cache read	$0.15
Batch	Not sourced

Setup recipe

Python + curl

Install

pip install openai

Auth

export AI_GATEWAY_API_KEY=...

Call

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AI_GATEWAY_API_KEY"],
    base_url="https://ai-gateway.vercel.sh/v1",
)

Model ID

google/gemini-3.5-flash

Request example

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AI_GATEWAY_API_KEY"],
    base_url="https://ai-gateway.vercel.sh/v1"
)
response = client.chat.completions.create(
    model="google/gemini-3.5-flash",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
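
The recipe is labelled "Python + curl", but only the SDK call is shown. As a rough sketch of the raw HTTP request the SDK presumably sends, the snippet below builds the same call with the Python standard library. It assumes the gateway follows the usual OpenAI-compatible layout (a POST to /chat/completions under the base URL, with a Bearer token); the helper names build_chat_request and send_chat_request are hypothetical and not part of any Vercel or OpenAI library.

```python
import json
import urllib.request

# Base URL taken from the request example on this page.
BASE_URL = "https://ai-gateway.vercel.sh/v1"


def build_chat_request(
    prompt: str,
    api_key: str,
    model: str = "google/gemini-3.5-flash",
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request.

    Assumption: the endpoint path and JSON body follow the OpenAI
    chat completions convention, as the gateway is described as
    OpenAI-compatible.
    """
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

Calling send_chat_request(build_chat_request("Hello", os.environ["AI_GATEWAY_API_KEY"])) should behave like the SDK example, provided the assumptions above hold. Keeping the build step separate makes it possible to inspect the URL, headers, and body without spending tokens.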

Gotchas

Use provider model ID "google/gemini-3.5-flash", not the LLMReference slug "gemini-3.5-flash".
Model IDs follow the creator/model-name format, e.g. kwaipilot/kat-coder-pro-v2.
The examples expect AI_GATEWAY_API_KEY; if you rename the variable, update your application config to read the new name.
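
The gotchas above can be guarded against in code. The following is a minimal sketch, not an official helper: gateway_model_id and gateway_api_key are hypothetical names, the default creator "google" only fits this page's model, and the regular expression is an assumed approximation of the creator/model-name format rather than a documented rule.

```python
import os
import re
from typing import Mapping, Optional

# Assumed shape of a gateway model ID: lowercase "creator/model-name".
# This pattern is an illustration, not a documented gateway rule.
_MODEL_ID = re.compile(r"^[a-z0-9][a-z0-9._-]*/[a-z0-9][a-z0-9._-]*$")


def gateway_model_id(slug: str, creator: str = "google") -> str:
    """Return a creator/model-name ID, prefixing a bare slug with the creator."""
    model_id = slug if "/" in slug else f"{creator}/{slug}"
    if not _MODEL_ID.match(model_id):
        raise ValueError(f"not a creator/model-name ID: {model_id!r}")
    return model_id


def gateway_api_key(
    env: Optional[Mapping[str, str]] = None,
    var: str = "AI_GATEWAY_API_KEY",
) -> str:
    """Read the API key from the environment and fail early if it is missing."""
    source = os.environ if env is None else env
    key = source.get(var)
    if not key:
        raise RuntimeError(f"environment variable {var} is not set")
    return key
```

Passing gateway_model_id("gemini-3.5-flash") as the model argument avoids sending the bare slug, and gateway_api_key() raises a clear error at startup instead of a KeyError deep inside a request path.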

Compare Gemini 3.5 Flash Across Providers

Provider	Input (per 1M)	Output (per 1M)
Google AI Studio	$1.50	$9.00
GCP Vertex AI	$1.50	$9.00
Vercel AI Gateway	$1.50	$9.00
OpenRouter	$1.50	$9.00

Pricing

Type	Price (per 1M)
Input tokens	$1.50
Output tokens	$9.00
Query	$14.00
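
To make the per-token prices concrete, here is a small cost estimator using the figures listed on this page. It is a sketch under two assumptions: the cache read price of $0.15 is per 1M tokens like the other rates, and cached input tokens are billed at the cache read rate instead of the input rate. The Query row is left out because its billing unit is not clear from the table. The function name estimate_cost is hypothetical.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing figures on this page.
# Assumption: the cache read price uses the same per-1M unit.
PRICE_PER_MILLION = {
    "input": Decimal("1.50"),
    "output": Decimal("9.00"),
    "cache_read": Decimal("0.15"),
}


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
) -> Decimal:
    """Estimate the USD cost of one request.

    cached_input_tokens is treated as a subset of input_tokens that is
    billed at the cache read rate rather than the input rate (assumption).
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * PRICE_PER_MILLION["input"]
        + cached_input_tokens * PRICE_PER_MILLION["cache_read"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / Decimal(1_000_000)
```

For example, a request with 10,000 input tokens and 2,000 output tokens comes to $0.033; if 8,000 of those input tokens are cache reads, the estimate drops to $0.0222.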

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution, Prompt Caching, Batch API, Audio

About Gemini 3.5 Flash

Gemini 3.5 Flash is Google DeepMind's generally available Flash model for sustained frontier-level performance on agentic and coding tasks. It supports multimodal inputs, native thinking, tool and function calling, structured outputs, code execution, search grounding, batch processing, and long contexts up to 1M tokens.