Gemini 3.5 Flash

Name: Gemini 3.5 Flash
Author: Google DeepMind

Released: 2026-05-19
Last refreshed: 2026-06-29
Status: Researched 21d ago

Proprietary · Commercial use: conditional · Multimodal · Coding · RAG · Agents · Long context · Vision · JSON / Tool use

Gemini 3.5 Flash is worth evaluating for coding, RAG, and agents when a tracked provider route and the model's context window fit the workload.

Use it for

Teams evaluating coding, RAG, and agents
Workloads that can use a 1.05m-token context window
Buyers comparing 4 tracked provider routes

Do not use it for

Workloads where another current model has stronger sourced task evidence

Specifications

Family: Gemini 3.5
Released: 2026-05-19
Context: 1.05m
Max output: 65,536
Architecture: Decoder Only
Knowledge cutoff: 2025-01
Specialization: general
Openness: Proprietary
License: Proprietary (commercial use: conditional)
Weights: Not released
Code: Unknown
Training: Pretrained
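
The context and max-output figures above can be turned into a quick prompt budget check. This is a minimal sketch, not a statement of how any provider enforces limits: it assumes "1.05m" means 1,050,000 tokens (the exact limit may differ, for example 1,048,576) and that prompt and output tokens share the same window. The names max_prompt_tokens and fits are hypothetical helpers introduced for illustration.

```python
# Figures from the Specifications list above.
# Assumption: "1.05m" is read literally as 1,050,000 tokens.
CONTEXT_WINDOW = 1_050_000
MAX_OUTPUT = 65_536


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves room for the reserved output.

    Assumption: prompt and output tokens count against one shared window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Return True if the prompt fits alongside the reserved output."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)
```

With the full 65,536-token output reserved, this leaves 984,464 tokens for the prompt under the stated assumptions. Token counts depend on the provider's tokenizer, so treat the result as a planning estimate.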

Created by

Google DeepMind

Pioneering artificial intelligence research.

London, United Kingdom

Founded 2014

Website

Pricing

Output / 1M: $9.00
Input / 1M: $1.50

Cheapest of 4 routes · GCP Vertex AI · cache read $0.150

Providers (4)

Google AI Studio, GCP Vertex AI, Vercel AI Gateway, OpenRouter

About

Gemini 3.5 Flash is Google DeepMind's generally available Flash model for sustained frontier-level performance on agentic and coding tasks. It supports multimodal inputs, native thinking, tool and function calling, structured outputs, code execution, search grounding, batch processing, and long contexts up to 1M tokens.

Gemini 3.5 Flash is a proprietary model in the Gemini 3.5 family. The structured metadata tracks a 1.05m-token context window, multimodal input, audio, reasoning, function calling, tool use, structured outputs, and code execution. This page tracks provider routes through Google AI Studio, GCP Vertex AI, Vercel AI Gateway, and OpenRouter, with the cheapest tracked route listed at $1.50 input and $9.00 output per 1M tokens. Headline tracked benchmarks include MMMU Pro 88.3, SWE-bench Pro 55.1, and Terminal-Bench 76.2.

Top use-case fit: coding, agents, and build tasks

Coding

Q/$ D

3 relevant benchmarks in the decision map.

RAG

Included by capability and metadata signals in the decision map.

Agents

Q/$ D

1 relevant benchmark in the decision map.

Provider price ladder

Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.

Provider	Input / 1M	Output / 1M	Batch in / out	Cache	Route
GCP Vertex AI	$1.50	$9.00	$0.750 / $4.50	read $0.150	Serverless
Google AI Studio	$1.50	$9.00	$0.750 / $4.50	read $0.150	Serverless
OpenRouter	$1.50	$9.00	-	-	Serverless
Vercel AI Gateway	$1.50	$9.00	-	read $0.150	Serverless
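
The rates in the table above can be combined into a rough per-request cost estimate. The sketch below uses only the listed prices; the function name estimate_cost and its parameters are hypothetical, and it assumes cached tokens are the part of the input billed at the cache-read rate. Combining batch and cache discounts is not modeled because the table does not say how they interact.

```python
# Per-1M-token rates (USD) copied from the price ladder above.
INPUT_PER_M = 1.50
OUTPUT_PER_M = 9.00
CACHE_READ_PER_M = 0.150
BATCH_INPUT_PER_M = 0.750
BATCH_OUTPUT_PER_M = 4.50


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0, batch: bool = False) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the portion of input_tokens billed at
    the cache-read rate instead of the standard input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if batch:
        if cached_tokens:
            raise ValueError("batch plus cache pricing is not modeled here")
        total = (input_tokens * BATCH_INPUT_PER_M
                 + output_tokens * BATCH_OUTPUT_PER_M)
    else:
        fresh_tokens = input_tokens - cached_tokens
        total = (fresh_tokens * INPUT_PER_M
                 + cached_tokens * CACHE_READ_PER_M
                 + output_tokens * OUTPUT_PER_M)
    return total / 1_000_000
```

For example, a request with 1,000,000 input tokens and 100,000 output tokens comes to about $2.40 at standard rates and about $1.20 at batch rates. Not every route lists batch or cache pricing (OpenRouter lists neither), so check the row for the provider you plan to use.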

Available via routers & gateways (13)

LiteLLM

Gateway

Open-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.

Free OSS · Google AI Studio · GCP Vertex AI

OpenRouter

Hybrid

Unified hybrid gateway to 400+ models from 60+ providers via a single OpenAI-compatible API, with optional auto-routing that selects the best model per prompt.

Passthrough · Google AI Studio · GCP Vertex AI

Portkey

Gateway

Production AI gateway routing to 1,600+ LLMs with failover, load balancing, semantic caching, and guardrails; Apache 2.0 core is fully self-hostable with the complete feature set.

Subscription · Google AI Studio · GCP Vertex AI

AIRouter

Router

Commercial LLM router that analyzes incoming requests and routes to the optimal model for cost/quality/latency via a drop-in OpenAI-compatible API, with a privacy-preserving embedding mode that avoids sending prompt content.

Passthrough + fee · Google AI Studio · GCP Vertex AI

Helicone

Gateway

Observability-first AI gateway with routing, caching, rate limiting, and request tracing; Apache 2.0 open-source core with a managed hosted tier for logging and analytics.

Subscription · Google AI Studio · GCP Vertex AI

Kong AI Gateway

Gateway

Multi-LLM AI gateway built on Kong Gateway 3.x, adding semantic routing, load balancing, guardrails, and MCP traffic analytics as plugins over Kong's existing API management platform.

Subscription · Google AI Studio · GCP Vertex AI

Capabilities

Vision · Multimodal · Reasoning · Function Calling · Tool Use · Structured Outputs · Code Execution · Prompt Caching · Batch API · Audio

Benchmark peer bars for Coding

SWE-bench Pro (rank 25 of 41)

Unnamed peer: 80.3
Unnamed peer: 73.7
Unnamed peer: 69.2
Unnamed peer: 64.3
Gemini 3.5 Flash (current): 55.1

SWE-bench Verified (rank 24 of 80)

Claude Fable 5: 96.0
Claude Mythos Preview: 93.9
Claude Opus 4.8: 88.6
Claude Opus 4.7: 87.6
Gemini 3.5 Flash (current): 78.0

HumanEval (rank 12 of 97)

Claude Sonnet 4.6: 98.0
Unnamed peer: 96.7
Claude Opus 4.6: 95.0
Grok-3: 94.5
Gemini 3.5 Flash (current): 92.0
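
Because the three leaderboards above have different sizes, a raw rank is easier to compare once it is expressed as a fraction of the field. The sketch below does that arithmetic with the ranks listed above; top_fraction is a hypothetical helper, and rank divided by field size is only one of several reasonable ways to normalize a rank.

```python
def top_fraction(rank: int, total: int) -> float:
    """Return the rank as a fraction of the field (smaller is better)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return rank / total


# Ranks copied from the peer bars above: (rank, number of ranked models).
coding_ranks = {
    "SWE-bench Pro": (25, 41),
    "SWE-bench Verified": (24, 80),
    "HumanEval": (12, 97),
}

for name, (rank, total) in coding_ranks.items():
    print(f"{name}: top {top_fraction(rank, total):.0%} of {total} models")
```

Read this way, the model sits in roughly the top 61% on SWE-bench Pro, the top 30% on SWE-bench Verified, and the top 12% on HumanEval, which shows how much the picture depends on the suite.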

Benchmark scores (13)

Scores are benchmark-specific and direction-aware, so the same numeric gap can mean very different things across suites. Use each benchmark's leaderboard context, together with the provider route you plan to use, to judge whether a margin is meaningful for your workload.

Benchmark	Score	Version	Source
MMMU Pro	88.3	Vals.ai standardized CoT harness, 4-option, Pass@1, temp=0	https://www.vals.ai/benchmarks/mmmu
SWE-bench Pro	55.1	SWE-bench Pro Public (pass@1)	https://deepmind.google/models/model-cards/gemini-3-5-flash/
Terminal-Bench	76.2	—	https://deepmind.google/technologies/gemini/flash/
Massive Multi-discipline Multimodal Understanding	83.6	—	https://deepmind.google/technologies/gemini/flash/
HumanEval	92.0	—	https://o-mega.ai/articles/gemini-3-5-flash-benchmarks-cost-and-guide
ARC-AGI-2	72.1	ARC-AGI-2 (accuracy%)	https://benchlm.ai/benchmarks/arcAgi2
Google-Proof Q&A	92.2	GPQA Diamond (accuracy)	https://www.nxcode.io/resources/news/gemini-3-5-flash-complete-guide-benchmarks-pricing-api-2026
Humanity's Last Exam	40.2	HLE (accuracy)	https://deepmind.google/models/model-cards/gemini-3-5-flash/
MCP-Atlas	83.6	MCP-Atlas (accuracy%)	https://benchlm.ai/benchmarks/mcpAtlas
SWE-bench Verified	78.0	SWE-bench Verified (pass@1)	https://techjacksolutions.com/ai-brief/gemini-35-flash-launches-at-io-2026-what-googles-agentic-cod/
Terminal-Bench 2.0	76.2	Terminal-Bench 2.0 (accuracy%)	https://benchlm.ai/benchmarks/terminalBench2
CursorBench	49.8	CursorBench 3.1	https://cursor.com/evals
GeneBench-Pro	8.1	high	https://cdn.openai.com/pdf/21938268-21af-442f-af93-3b2249afb241/genebench-pro.pdf

Migration checks

No linked migration route is available for this model yet.

API versions

gemini-3.5-flash

Rankings & picks (6)

Best LLMs for RAG (Listed)
Best AI Agent Models 2026: SWE-bench Ranked (Listed)
Best Multimodal / Vision LLMs (Listed)
Best LLMs for Reasoning & Math (Listed)
Best Long Context LLMs (Listed)
Best Mainstream LLM APIs, Ranked (Listed)

Compare Gemini 3.5 Flash with other models

Comparison and alternatives

Gemini 3.5 Flash vs Qwen3.7-Plus
Gemini 3.5 Flash vs Composer 2.5
Gemini 3.5 Flash vs Gemini 3.5 Pro
Gemini 3.5 Flash vs GPT-5.5
Gemini 3.5 Flash vs Claude Sonnet 4.6
Gemini 3.5 Flash vs Claude Opus 4.7
Gemini 3.5 Flash vs Gemini 2.5 Pro
Gemini 3.5 Flash vs Gemini 2.5 Flash
Gemini 3.5 Flash vs DeepSeek V4 Pro
Gemini 3.5 Flash vs DeepSeek V4 Flash
Gemini 3.5 Flash vs Grok 4
Gemini 3.5 Flash vs Claude Opus 4.6
Gemini 3.5 Flash vs Composer 2
Gemini 3.5 Flash vs MiniMax M3

Frequently asked questions

What is the context window of Gemini 3.5 Flash?

Gemini 3.5 Flash has a context window of 1.05m tokens.

What is the max output of Gemini 3.5 Flash?

Gemini 3.5 Flash can generate up to 65,536 output tokens.

How much does Gemini 3.5 Flash cost?

Gemini 3.5 Flash is listed at $1.50 per 1M input tokens and $9.00 per 1M output tokens on all 4 tracked provider routes, including Google AI Studio and GCP Vertex AI.

When was Gemini 3.5 Flash released?

Gemini 3.5 Flash was released on 2026-05-19.

Which providers offer Gemini 3.5 Flash?

Gemini 3.5 Flash is available from 4 providers: Google AI Studio, GCP Vertex AI, Vercel AI Gateway, OpenRouter.

What benchmarks has Gemini 3.5 Flash been tested on?

Gemini 3.5 Flash has been evaluated on 13 benchmarks, including MMMU Pro, SWE-bench Pro, Terminal-Bench, Massive Multi-discipline Multimodal Understanding (MMMU), and HumanEval.
