LLM ReferenceLLM Reference

Gemini 3.1 Flash-Lite

gemini-3.1-flash-lite

ProprietaryMultimodal

About

Gemini 3.1 Flash-Lite is Google's generally available low-latency Gemini 3.1 model, launched May 7, 2026. It is optimized for high-volume, cost-sensitive workloads with text, image, and video inputs, a 1M token context window, and a 66K token maximum output. The GA model uses the stable API ID gemini-3.1-flash-lite and replaces gemini-3.1-flash-lite-preview, which is scheduled to shut down on May 25, 2026. Pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens.

Gemini 3.1 Flash-Lite has a 1M-token context window.

Gemini 3.1 Flash-Lite input tokens at $0.25/1M, output at $1.5/1M.

Capabilities

VisionMultimodalReasoningFunction CallingTool UseStructured OutputsCode ExecutionPrompt CachingBatch APIAudioFine-tuning

Providers(2)

Compare all →
ProviderInput (per 1M)Output (per 1M)Type
Google AI Studio$0.25$1.50Serverless
OpenRouter$0.25$1.50Serverless

Benchmark Scores(2)

BenchmarkScoreVersionSource
Google-Proof Q&A86.9diamondhttps://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-lite/
Chatbot Arena1432.0Arena Elohttps://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-lite/

API Versions

gemini-3.1-flash-lite

Rankings

Specifications

Released2026-05-07
Context1M
Max output66,000
ArchitectureDecoder Only
Knowledge cutoff2025-01
Specializationgeneral
LicenseProprietary
Trainingpretrained

Created by

Pioneering artificial intelligence research.

London, United Kingdom
Founded 2014
Website