Gemini 3.1 Flash-Lite
gemini-3.1-flash-lite
Proprietary, Multimodal
About
Gemini 3.1 Flash-Lite is Google's generally available low-latency Gemini 3.1 model, launched May 7, 2026. It is optimized for high-volume, cost-sensitive workloads with text, image, and video inputs, a 1M token context window, and a 66K token maximum output. The GA model uses the stable API ID gemini-3.1-flash-lite and replaces gemini-3.1-flash-lite-preview, which is scheduled to shut down on May 25, 2026. Pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens.
Gemini 3.1 Flash-Lite has a 1M-token context window.
Gemini 3.1 Flash-Lite input tokens cost $0.25 per 1M, and output tokens cost $1.50 per 1M.
Capabilities
Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution, Prompt Caching, Batch API, Audio, Fine-tuning
Providers (2)
| Provider | Input (per 1M tokens) | Output (per 1M tokens) | Type |
|---|---|---|---|
| Google AI Studio | $0.25 | $1.50 | Serverless |
| OpenRouter | $0.25 | $1.50 | Serverless |
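The listed per-token rates translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost` and the constant names are hypothetical, and it assumes the flat rates shown in the table apply with no caching, batch, or tiered discounts.

```python
from decimal import Decimal

# Listed prices from the provider table, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.25")
OUTPUT_PRICE_PER_M = Decimal("1.50")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the flat listed rates.

    Assumption: no caching, batch, or tiered discounts are applied.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Scale each token count to millions, then multiply by its rate.
    input_cost = Decimal(input_tokens) / TOKENS_PER_M * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 200,000-token prompt with a 4,000-token reply.
    print(estimate_cost(200_000, 4_000))
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift when summed over many requests.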
Benchmark Scores (2)
| Benchmark | Score | Version | Source |
|---|---|---|---|
| GPQA (Google-Proof Q&A) | 86.9 | Diamond | https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-lite/ |
| Chatbot Arena | 1432.0 | Arena Elo | https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-lite/ |
API Versions
gemini-3.1-flash-lite
Rankings
Best LLMs for Code Generation
Best AI Agents & Agentic Models
Best Multimodal / Vision LLMs
Best LLMs for Reasoning & Math
Best LLMs for Function Calling & Tool Use
Cheapest LLM APIs
Best Long Context LLMs
Best LLM APIs
Best LLMs for Enterprise
Best LLMs for Writing
Best LLMs for Marketing
Best LLMs for Customer Support
Specifications
Family: Gemini 3.1
Released: 2026-05-07
Context: 1M tokens
Max output: 66,000 tokens
Architecture: Decoder Only
Knowledge cutoff: 2025-01
Specialization: general
License: Proprietary
Training: pretrained
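The context and output limits above can be checked before a request is sent. This is a minimal sketch, not part of any official SDK: the `check_request` helper is hypothetical, "1M" is assumed to mean exactly 1,000,000 tokens, and the two limits are checked independently because the listing does not say whether output tokens count against the context window.

```python
# Limits as listed in the specifications. Treating "1M" as exactly
# 1,000,000 tokens is an assumption; the precise limit may differ.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 66_000


def check_request(input_tokens: int, requested_output_tokens: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if input_tokens > CONTEXT_WINDOW_TOKENS:
        problems.append(
            f"input of {input_tokens} tokens exceeds the "
            f"{CONTEXT_WINDOW_TOKENS}-token context window"
        )
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"requested output of {requested_output_tokens} tokens exceeds "
            f"the {MAX_OUTPUT_TOKENS}-token maximum"
        )
    return problems


if __name__ == "__main__":
    # A request within both limits yields no problems.
    print(check_request(500_000, 8_000))
    # A request over both limits yields two problem messages.
    for problem in check_request(1_200_000, 70_000):
        print(problem)
```

Token counts would normally come from the provider's own tokenizer or token-counting endpoint; the integers here are placeholders.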