LLM Reference

Context window

Also known as: context length, context size, token window

See matching models with benchmark scores and pricing.

1,333 matching active models

69 tracked providers

745 models with routes

model.context

Definition

The context window is the maximum number of tokens a large language model can consider at once for input and output during inference, limiting the amount of information it can process in a single pass. Larger windows enable handling longer conversations or documents but increase computational demands.
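The definition above implies a simple budget: whatever is reserved for the reply is no longer available for the prompt. A minimal sketch of that arithmetic follows, assuming the provider counts input and output against one shared limit (some providers publish separate input and output limits instead). The class name `ContextBudget` and the example numbers are illustrative, not taken from any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBudget:
    """Hypothetical helper: splits a shared context window into input and output."""
    context_window: int     # total tokens the model can consider (input + output)
    max_output_tokens: int  # tokens reserved for the reply

    @property
    def max_input_tokens(self) -> int:
        # Whatever is not reserved for output is left for the prompt.
        return self.context_window - self.max_output_tokens

    def fits(self, prompt_tokens: int) -> bool:
        # True when the prompt leaves room for the reserved output.
        return 0 <= prompt_tokens <= self.max_input_tokens


# Illustrative numbers: a 1,000,000-token window with 64,000 tokens reserved.
budget = ContextBudget(context_window=1_000_000, max_output_tokens=64_000)
print(budget.max_input_tokens)  # 936000
print(budget.fits(950_000))     # False
```

In practice the prompt token count should come from the provider's own tokenizer, since counts differ between models.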

Models With Context window

Showing the first 80 matches, sorted by decision relevance, with tracked capability and provider-route evidence.

1,333 matches
RWKV-7 Goose 0.1B

RWKV-7 Goose 0.1B (approximately 190M parameters) is the smallest model in the RWKV-7 Goose series. Suitable for ultra-low-resource deployment with constant-memory inference. Uses RWKV-7 architecture with the Generalized Delta Rule. Trained on the World v2.8 corpus. Apache 2.0 licensed.

2025-03-18

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-6 Finch 1.6B

RWKV-6 Finch 1.6B is the smallest model in the RWKV-6 Finch series, ideal for lightweight deployments requiring constant-memory inference. Apache 2.0 licensed.

2024-04-09

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-7 Goose 0.4B

RWKV-7 Goose 0.4B (approximately 450M parameters) is a lightweight model from the RWKV-7 Goose series. Designed for edge deployment and resource-constrained environments where constant-memory O(1) inference is critical. Uses RWKV-7 architecture with the Generalized Delta Rule. Trained on the World v2.9 corpus. Apache 2.0 licensed.

2025-03-18

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-6 Finch 3B

RWKV-6 Finch 3B is a mid-range model in the RWKV-6 Finch series, offering a balance between capability and deployment efficiency. Apache 2.0 licensed. Constant-memory inference with no KV cache.

2024-04-09

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-7 Goose 1.5B

RWKV-7 Goose 1.5B is a mid-range model from the RWKV-7 Goose World3 series. Uses the seventh-generation RWKV architecture with Generalized Delta Rule for expressive dynamic state evolution. Trained on 3.1 trillion tokens from the multilingual World v3 corpus. Constant-memory inference with no KV cache. Apache 2.0 licensed.

2025-03-18

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-6 Finch 7B

RWKV-6 Finch 7B is a flagship mid-size model from the RWKV-6 architecture series. Introduced alongside the Eagle and Finch paper (arXiv 2404.05892, April 2024). The Finch 14B model was subsequently derived by stacking two Finch 7B weights. Uses multi-headed matrix-valued states for improved language comprehension. Constant-memory inference. Apache 2.0 licensed.

2024-04-09

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-7 Goose 2.9B

RWKV-7 Goose 2.9B is the largest released model in the RWKV-7 Goose World3 series. Built on the seventh-generation RWKV architecture with the Generalized Delta Rule and dynamic state evolution, it achieves competitive benchmark performance against transformer models of equivalent scale. Trained on 3.1 trillion tokens from the World v3 multilingual corpus (100+ languages, BF16). As a pure recurrent architecture, it requires constant O(1) memory during inference (no KV cache) and processes sequences in linear O(n) time. Licensed Apache 2.0.

2025-03-18

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route

RWKV-6 Finch 14B

RWKV-6 Finch 14B is the largest model in the RWKV-6 Finch series, created by stacking two 7B Finch models. Released September 3, 2024. Achieves strong performance on MMLU (56.05%), ARC, HellaSwag (57.69%), and Winogrande (74.43%). The RWKV-6 architecture uses matrix-valued states and dynamic data-driven recurrence, improving comprehension and in-context reasoning compared to RWKV-5 (Eagle). Constant-memory O(1) inference with no KV cache. Apache 2.0 licensed.

2024-09-03

Researched 41d ago

Infinite

No fixed token cap

Infinite context

No tracked provider route
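The "constant-memory, no KV cache" claim in the RWKV entries above can be made concrete with a rough size comparison. This is a sketch under stated assumptions: the layer and head counts below are hypothetical and are not the configuration of any listed model, and the recurrent-state formula is a simplification (one head_dim x head_dim matrix per head per layer).

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Approximate transformer KV-cache size; grows linearly with seq_len."""
    # Factor 2: each layer stores one key and one value vector per token per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len


def recurrent_state_bytes(n_layers: int, n_heads: int, head_dim: int,
                          bytes_per_value: int = 2) -> int:
    """Approximate recurrent-state size; independent of seq_len."""
    # Simplifying assumption: one head_dim x head_dim state matrix per head per layer.
    return n_layers * n_heads * head_dim * head_dim * bytes_per_value


# Hypothetical configuration, chosen only for illustration.
for seq_len in (1_000, 100_000, 10_000_000):
    kv = kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128)
    state = recurrent_state_bytes(n_layers=32, n_heads=32, head_dim=64)
    print(f"{seq_len:>10,} tokens: KV cache {kv / 2**20:>12,.1f} MiB, "
          f"recurrent state {state / 2**20:,.1f} MiB")
```

The KV-cache figure scales with the number of tokens held in context, while the recurrent state stays the same size however long the sequence runs, which is why these entries list no fixed token cap.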

LTM-2-mini

LTM-2-mini is Magic's research prototype supporting a 100 million token context window, announced August 29, 2024. Uses a novel sequence-dimension algorithm approximately 1,000× more memory-efficient than transformer attention at this scale — requiring only a fraction of a single H100's HBM versus 638 H100s for Llama 3.1 405B at the same context length. Not publicly released for API access or self-hosting; Magic stated they were separately training a full LTM-2 model. Specialization: coding/software development. Source: https://magic.dev/blog/100m-token-context-windows

2024-08-29

Researched 47d ago

100m

100,000,000 tokens

100m context

No tracked provider route

Llama 4 Scout 17B-16E Instruct

Meta's Llama 4 Scout is a mixture-of-experts model with 17 billion active parameters and 16 experts. It is optimized for efficient inference in edge and cloud environments and has strong multi-turn conversation capabilities. Available on Cloudflare Workers AI.

2025-04-05

Researched 28d ago

10m

10,000,000 tokens

10m context · Vision · Multimodal · JSON
AWS Bedrock

$0.170 in / $0.220 out / 1M tokens

12 routes

Provider docs
LTM-1

LTM-1 (Long-Term Memory 1) is Magic's first model with a 5 million token context window, announced June 6, 2023. Designed to process entire codebases in context for AI-assisted software development. Architecture and parameter count not publicly disclosed. Not available as a public API; Magic used it in an early-access coding product. Source: https://magic.dev/blog/ltm-1

2023-06-06

Researched 47d ago

5m

5,000,000 tokens

5m context

No tracked provider route

MiniMax-01

MiniMax-01 combines MiniMax-Text-01 and MiniMax-VL-01, pairing a 456B-total-parameter MoE language model with multimodal understanding for long-context text generation and vision-language tasks.

2025-01-14

Researched 13d ago

4m

4,000,000 tokens

4m context · Vision · Multimodal
OpenRouter

$0.200 in / $1.10 out / 1M tokens

1 route

Provider docs
Gemini 1.5 Pro

Gemini 1.5 Pro is Google DeepMind's multimodal large language model for processing and analyzing text, images, audio, and video. Its context window extends to 2 million tokens, which lets it stay coherent over long documents and lengthy interactions. It targets nuanced language processing, coding assistance, and advanced reasoning, and is available through Google platforms such as Vertex AI.

2024-02-15

Researched 77d ago

2m

2,000,000 tokens

2m context · JSON
GCP Vertex AI

$1.25 in / $5.00 out / 1M tokens

2 routes

Provider docs
Gemini 1.5 Pro 002

Stable Gemini 1.5 Pro release (version 002, September 2024) optimized for complex reasoning and high-quality multimodal analysis. Supports 2M context for extended document and video processing.

2024-09-24

Researched 47d ago

2m

2,000,000 tokens

2m context

No tracked provider route

Gemini 1.5 Pro Experimental 0827

Updated Pro experimental variant with refinements to reasoning depth and creative task performance.

2024-08-27

Researched 47d ago

2m

2,000,000 tokens

2m context

No tracked provider route

Gemini 1.5 Pro Experimental 0801

Experimental Pro variant with enhanced reasoning and multimodal understanding for complex problem-solving tasks.

2024-08-01

Researched 47d ago

2m

2,000,000 tokens

2m context

No tracked provider route

Grok 4.20 Multi-Agent

Grok 4.20 Multi-Agent is the extended-context xAI API variant launched around March 10, 2026 as grok-4.20-multi-agent-0309. Its reasoning.effort parameter controls how many collaborating agents are used, and the variant carries a 2M token context window.

2026-03-10

Researched 46d ago

2m

2,000,000 tokens

2m context · Reasoning · Vision · Multimodal · Tool use · Functions
Vercel AI Gateway

$1.25 in / $2.50 out / 1M tokens

3 routes · 1 cache

Provider docs
Gemini 3.5 Pro

Google's most capable Gemini 3.5 model, announced at Google I/O on May 19, 2026. Features a 2M-token context window, Deep Think reasoning mode, and frontier multimodal capabilities. In limited Vertex AI enterprise preview as of June 2026; general availability expected by end of June 2026 per Google CEO Sundar Pichai. Pricing and full API documentation pending GA release.

2026-05-19

Researched 9d ago

2m

2,000,000 tokens

2m context · Reasoning · Vision · Multimodal

No tracked provider route

GPT-5.4 Pro

Premium extended-reasoning GPT-5.4 variant producing smarter and more precise responses. Replacement for o3-deep-research and o4-mini-deep-research. No prompt caching discount.

2026-03-01

Researched 56d ago

1.05m

1,050,000 tokens

1.05m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$30.00 in / $180.00 out / 1M tokens

3 routes · 1 batch

Provider docs
GPT-5.5 Pro

GPT-5.5 Pro is OpenAI's premium extra-compute deployment of GPT-5.5, released April 23, 2026. It uses the same underlying weights as GPT-5.5 standard with additional parallel test-time compute for harder tasks. Supports text and image inputs, reasoning effort control, tool use, structured outputs, code execution, a 1,050,000-token context window, and 128K max output. Key datapack rows: Terminal-Bench 2.1 78.2%, SWE-bench Pro 58.6%, GPQA Diamond 93.6%, ARC-AGI-2 high effort 83.3%, BrowseComp Pro compute 90.1%, and FrontierMath Tier 4 39.6%. Official pricing is $30/M input, $180/M output, $10/M batch input, and $45/M batch output; native cached input discount is not listed.

2026-04-23

Researched 9d ago

1.05m

1,050,000 tokens

1.05m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$30.00 in / $180.00 out / 1M tokens

3 routes · 1 batch

Provider docs
GPT-5.5

GPT-5.5 is OpenAI's fully retrained agentic model, released April 23, 2026. Optimised for agentic coding, computer use, knowledge work, and early scientific research. Achieves 82.7% on Terminal-Bench 2.0 (Codex CLI scaffold), 84.9% on GDPval, 58.6% on SWE-Bench Pro, 93.6% on GPQA Diamond, and 82.6% on SWE-Bench Verified (Vals.ai independent harness). Knowledge cutoff December 2025. Supports reasoning effort levels (none/low/medium/high/xhigh). Context window 1,050,000 tokens with a long-context surcharge above 272K tokens. Model ID: gpt-5.5.

2026-04-23

Researched 22d ago

1.05m

1,050,000 tokens

1.05m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$5.00 in / $30.00 out / 1M tokens

4 routes · 1 batch · 2 cache

Provider docs
GPT-5.4

GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.

2026-03-05

Researched 22d ago

1.05m

1,050,000 tokens

1.05m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$2.50 in / $15.00 out / 1M tokens

4 routes · 1 batch · 3 cache

Provider docs
Xiaomi MiMo-V2.5

Xiaomi MiMo-V2.5 is the lower-cost native omnimodal sibling in the MiMo-V2.5 series. OpenRouter describes it as supporting text, image, audio, and video inputs with text output, Pro-level agentic performance at roughly half the inference cost, and improved multimodal perception over MiMo-V2-Omni. Xiaomi's official April 22 release page highlights MiMo-V2.5 alongside MiMo-V2.5-Pro in benchmark data and says the V2.5 series will be open-sourced soon; no public weights/license were verified at research time.

2026-04-22

Researched 40d ago

1.05m

1,048,576 tokens

1.05m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenRouter

$0.140 in / $0.280 out / 1M tokens

2 routes · 1 cache

Provider docs
Xiaomi MiMo-V2.5-Pro

Xiaomi's April 22, 2026 public-beta flagship in the MiMo-V2.5 series. The official Xiaomi MiMo page describes MiMo-V2.5-Pro as its most capable model to date, focused on general agentic capability, complex software engineering, long-horizon tasks, and ultra-long-context instruction following. OpenRouter lists it as text-to-text with 1,048,576 token context, 131,072 max completion tokens, reasoning controls, tool use, and response_format support. Xiaomi says the V2.5 series will be open-sourced soon, but no public weights/license were verified at research time.

2026-04-22

Researched 40d ago

1.05m

1,048,576 tokens

1.05m context · Tool use · Functions · JSON · Prompt cache
OpenRouter

$0.435 in / $0.870 out / 1M tokens

3 routes · 2 cache

Provider docs
Nemotron 3 Super-120B-A12B

NVIDIA Nemotron 3 Super-120B-A12B is a 120B total / 12B active hybrid Latent MoE model with interleaved Mamba-2 and MoE layers for agentic, reasoning, and conversational tasks. Fireworks lists the NVFP4 variant for on-demand deployment with 262k context.

2026-03-11

Researched 34d ago

1.05m

1,048,576 tokens

1.05m context · JSON · Prompt cache
OpenRouter

$0.090 in / $0.450 out / 1M tokens

6 routes · 1 cache

Provider docs
Nemotron-Cascade-2-30B-A3B

A 30B MoE model with 3B active parameters, focused on reasoning, with reported gold-medal-level performance on IMO 2025 and IOI 2025.

2026-03-19

Researched 52d ago

1.05m

1,048,576 tokens

1.05m context

No tracked provider route

Gemini 3.5 Flash

Gemini 3.5 Flash is Google DeepMind's generally available Flash model for sustained frontier-level performance on agentic and coding tasks. It supports multimodal inputs, native thinking, tool and function calling, structured outputs, code execution, search grounding, batch processing, and long contexts up to 1M tokens.

2026-05-19

Researched 23d ago

1.05m

1,048,576 tokens

1.05m context · Reasoning · Vision · Multimodal · Audio · Tool use
GCP Vertex AI

$1.50 in / $9.00 out / 1M tokens

4 routes · 2 batch · 3 cache

Provider docs
Gemini 3.1 Flash-Lite

Gemini 3.1 Flash-Lite is Google's generally available low-latency Gemini 3.1 model, launched May 7, 2026. It is optimized for high-volume, cost-sensitive workloads with text, image, and video inputs, a 1M token context window, and a 66K token maximum output. The GA model uses the stable API ID gemini-3.1-flash-lite and replaces gemini-3.1-flash-lite-preview, which is scheduled to shut down on May 25, 2026. Pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens.

2026-05-07

Researched 16d ago

1.05m

1,048,576 tokens

1.05m context · Vision · Multimodal · Tool use · Functions · JSON
Google AI Studio

$0.250 in / $1.50 out / 1M tokens

3 routes · 1 cache

Provider docs
Antigravity Agent

Antigravity Agent is Google DeepMind's preview managed agent for autonomous coding and browsing workflows. Powered by Gemini 3.5 Flash, it plans, reasons, runs code, manages files, and browses the web inside a secure Google-hosted Linux sandbox through the Interactions API. It accepts text and image input, has a 1,048,576-token input context window that compacts at about 135K tokens, and supports a 65,536-token output limit. Environment compute is not billed during preview; Google describes pricing as pay-as-you-go based on underlying Gemini model tokens and tool use.

2026-05-19

Researched 41d ago

1.05m

1,048,576 tokens

1.05m context · Reasoning · Vision · Multimodal · Tool use · Code exec
Google AI Studio

Pricing not tracked

1 route

Provider docs
MiMo-V2-Pro

Xiaomi MiMo-V2-Pro language model. The larger, higher-capability model in the MiMo V2 series with an extended 1M token context window.

2026-03-18

Researched 62d ago

1.05m

1,048,576 tokens

1.05m context · Prompt cache
OpenRouter

$1.00 in / $3.00 out / 1M tokens

2 routes · 1 cache

Provider docs
Gemini 2.5 Pro Computer Use Preview

Specialized for browser control agents. Pricing per 1M tokens (input/output): $1.25/$10.00 for prompts up to 200K tokens, $2.50/$15.00 for prompts above 200K tokens. Available on AI Studio and Vertex AI; no free tier.

2025-10-01

Researched 68d ago

1.05m

1,048,576 tokens

1.05m context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$1.25 in / $10.00 out / 1M tokens

2 routes

Provider docs
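Tiered long-context pricing like the entry above changes the cost of a request once the prompt crosses a threshold. The sketch below works through that arithmetic using the two price points listed for Gemini 2.5 Pro Computer Use Preview. It assumes the tier is selected by prompt size and then applies to the whole request; the listing does not spell this out, so check the provider's pricing page. The names `Tier` and `request_cost_usd` are illustrative.

```python
from typing import NamedTuple, Optional, Sequence


class Tier(NamedTuple):
    max_prompt_tokens: Optional[int]  # None means no upper bound
    input_per_million: float          # USD per 1M input tokens
    output_per_million: float         # USD per 1M output tokens


# Price points copied from the listing above; tier semantics are an assumption.
TIERS = (
    Tier(200_000, 1.25, 10.00),
    Tier(None, 2.50, 15.00),
)


def request_cost_usd(prompt_tokens: int, output_tokens: int,
                     tiers: Sequence[Tier] = TIERS) -> float:
    """Return the cost of one request, picking the first tier that covers the prompt."""
    for tier in tiers:
        if tier.max_prompt_tokens is None or prompt_tokens <= tier.max_prompt_tokens:
            return (prompt_tokens * tier.input_per_million
                    + output_tokens * tier.output_per_million) / 1_000_000
    raise ValueError("no pricing tier covers this prompt size")


print(f"${request_cost_usd(100_000, 2_000):.3f}")  # short prompt, lower tier
print(f"${request_cost_usd(300_000, 2_000):.3f}")  # long prompt, higher tier
```

The same function covers flat pricing: pass a single tier with `max_prompt_tokens=None`.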
Llama 3 70B Gradient 1048K

Llama 3 70B Gradient 1048K is Gradient's long-context Llama 3 model. It offers a 1048K-token context window.

2024-04-18

Researched 47d ago

1.05m

1,048,000 tokens

1.05m context

No tracked provider route

Llama 3 8B Gradient 1048K

Llama 3 8B Gradient 1048K is Gradient's long-context Llama 3 model. It offers a 1048K-token context window.

2024-04-18

Researched 47d ago

1.05m

1,048,000 tokens

1.05m context

No tracked provider route

Llama 3.1 8B Gradient 1048K

Llama 3.1 8B Gradient 1048K is Gradient's long-context Llama 3 model. It offers a 1048K-token context window.

2024-04-18

Researched 47d ago

1.05m

1,048,000 tokens

1.05m context

No tracked provider route

GPT-4.1

OpenAI's GPT-4.1 model released April 2025, excelling at coding tasks, precise instruction following, and web development. Outperforms GPT-4o in these areas with a 1 million token context window. Available via API and in ChatGPT for Plus, Pro, Team, Enterprise, and Edu users.

2025-04-01

Researched 56d ago

1.05m

1,047,576 tokens

1.05m context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$2.00 in / $8.00 out / 1M tokens

4 routes · 1 batch · 2 cache

Provider docs
GPT-4.1 Mini

Fast and efficient small model from OpenAI replacing GPT-4o mini. Released April 2025 alongside GPT-4.1. Shows improvements in instruction-following, coding, and intelligence with a 1 million token context window. Available in ChatGPT for paid users.

2025-04-01

Researched 56d ago

1.05m

1,047,576 tokens

1.05m context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$0.400 in / $1.60 out / 1M tokens

4 routes · 2 cache

Provider docs
GLM-5.2

GLM-5.2 is Z.ai's coding-first successor to GLM-5.1 in the GLM-5 family, released June 13 2026. 753B parameters (40B active) in IndexShare MoE architecture; the IndexShare innovation reuses the same attention indexer across every four sparse layers, cutting per-token FLOPs by 2.9x at 1M context length. Trained on 28.5T tokens. Supports a 1M-token context window via the glm-5.2[1m] model ID, with 131,072-token maximum output and High/Max thinking-effort levels designed for extended agentic coding sessions. MIT license; open weights available on Hugging Face (zai-org/GLM-5.2 and zai-org/GLM-5.2-FP8). Self-reported HF card benchmarks: SWE-bench Pro 62.1, Terminal-Bench 2.1 82.7, MCP-Atlas 76.8, Tool-Decathlon 48.2, GPQA Diamond 91.2, AIME 2026 99.2, HLE 40.5. Available to GLM Coding Plan subscribers (Lite/Pro/Max/Team) directly, and via OpenRouter token API ($1.40/$4.40 per 1M tokens).

2026-06-13

Researched 11d ago

1m

1,000,000 tokens

1m context · Reasoning · Tool use · Functions · JSON · Code exec
OpenRouter

$1.40 in / $4.40 out / 1M tokens

1 route

Provider docs
Amazon Nova Premier

Amazon Nova Premier is Amazon's most capable standard Bedrock Nova understanding model for complex reasoning, agentic workflows, and model distillation. It supports a 1M-token context window, text/image/video inputs, text output, reasoning, tool calling, and prompt caching; use it as the standard Bedrock Nova frontier pick instead of Nova 2 Omni early-access Forge checkpoints.

2025-03-17

Researched 11d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
AWS Bedrock

$2.50 in / $12.50 out / 1M tokens

2 routes · 1 batch · 1 cache

Provider docs
DeepSeek V4 Pro

DeepSeek V4 Pro is DeepSeek's flagship open-weights model, released April 24 2026 under the MIT license. Architecture: 1.6T total / 49B active parameters, MoE with Compressed Sparse Attention (CSA) + Heavily Compressed Attention (HCA) hybrid — requiring only 27% of inference FLOPs vs standard 1M-context transformers — plus Manifold-Constrained Hyper-Connections (mHC) and Muon Optimizer. Context window: 1,000,000 tokens; max output: 384,000 tokens (Think Max mode requires >=384K context). Text-only (no vision/image input). Supports three reasoning modes: Non-Think, Think High, Think Max. Function calling, tool use, and structured outputs supported. Key benchmarks: SWE-bench Verified 80.6%, SWE-bench Pro 55.4%, LiveCodeBench 93.5%, GPQA Diamond 90.1%, MMLU-Pro 87.5%, Terminal-Bench 2.0 59.1% on BenchLM's independent June 2026 harness, and Chatbot Arena 1456 (2026-06-16). Current API pricing: $0.435/$0.87 per 1M input/output tokens; DeepSeek made the former 75% promotional rate permanent in May 2026.

2026-04-24

Researched 11d ago

1m

1,000,000 tokens

1m context · Reasoning · Tool use · Functions · JSON · Prompt cache
DeepSeek Platform

$0.435 in / $0.870 out / 1M tokens

5 routes · 3 cache

Provider docs
Qwen3.7-Max

Alibaba's closed-weight flagship language model, announced at the 2026 Alibaba Cloud Summit (May 20). Scored 56.6 on Artificial Analysis Intelligence Index at launch—highest-ranked Chinese model. 1M-token context with prompt caching (up to 90% discount). Pricing: $2.50/$7.50 per 1M tokens in/out.

2026-05-19

Researched 8d ago

1m

1,000,000 tokens

1m context · Reasoning · Tool use · Functions · JSON · Code exec
Novita AI

$1.25 in / $3.75 out / 1M tokens

4 routes · 3 cache

Provider docs
Qwen3.6-Plus

Qwen3.6-Plus is Alibaba Cloud's GA Qwen3.6 flagship for long-context reasoning, coding, tool use, and multimodal workflows. DashScope lists it with a 1M-token context window, structured output support, and standard public token pricing.

2026-04-01

Researched 46d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · JSON
Alibaba Cloud PAI-EAS

$0.325 in / $1.95 out / 1M tokens

3 routes · 2 cache

Provider docs
Gemini 1.5 Flash

Gemini 1.5 Flash is a large language model by Google, built for speed and efficiency in high-volume scenarios. As a lightweight model, it is optimized for fast processing and cost-effectiveness, which suits real-time applications and high-frequency tasks. It is multimodal and processes and reasons across text, images, audio, video, and PDFs. Although smaller than Gemini 1.5 Pro, it performs well at summarization, chat applications, and data extraction from lengthy documents, and it uses knowledge distillation to transfer essential knowledge from larger models. It also has a context window of up to 1 million tokens, which lets it handle large volumes of information.

2024-05-14

Researched 77d ago

1m

1,000,000 tokens

1m context · JSON
GCP Vertex AI

$0.075 in / $0.300 out / 1M tokens

2 routes

Provider docs
Nemotron 3 Ultra

NVIDIA's open frontier-reasoning model (550B total / 55B active MoE, hybrid Transformer-Mamba). Highest Artificial Analysis Intelligence Index for any US open model (score: 48). 300+ tokens/second. 1M-token context. Announced at Computex 2026. Pricing: ~$0.60/$2.60 per 1M tokens (provider median); free tier on some providers.

2026-06-04

Researched 24d ago

1m

1,000,000 tokens

1m contextReasoning
OpenRouter

$0.500 in / $2.20 out / 1M tokens

1 route

Provider docs
Gemini 1.5 Flash 8B

Lightweight 8B variant of Gemini 1.5 Flash optimized for speed and cost-efficiency. Supports 1M token context with fast inference for real-time applications.

2024-10-03

Researched 47d ago

1m

1,000,000 tokens

1m context
GCP Vertex AI

$0.0375 in / $0.150 out / 1M tokens

1 route

Provider docs
Gemini 1.5 Flash on Google Vertex AI

Gemini 1.5 Flash on Google Vertex AI is Google DeepMind's Gemini 1.5 model with multimodal text and image input. It offers a 1M-token context window.

2024-02-15

Researched 47d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · JSON
GCP Vertex AI

$0.035 in / $0.105 out / 1M tokens

1 route

Provider docs
Gemini 1.5 Pro on Google Vertex AI

Gemini 1.5 Pro on Google Vertex AI is Google DeepMind's Gemini 1.5 model with multimodal text and image input. It offers a 1M-token context window.

2024-02-15

Researched 47d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · JSON
GCP Vertex AI

$0.125 in / $0.375 out / 1M tokens

1 route

Provider docs
Gemini 1.0 Ultra

Google's Gemini 1.0 Ultra is a leading large language model designed for tackling highly complex tasks with advanced analytical capabilities. As the largest model in the Gemini 1.0 family, it excels in coding, mathematical reasoning, and multimodal reasoning. Its strength lies in its ability to seamlessly understand and process diverse data types, including text, code, audio, images, and video. Gemini Ultra surpasses human experts on the MMLU benchmark with a 90% score, although it has limitations in image generation and some multimodal tasks. The model features a 32,000-token context window, less than some competitors, and access is primarily through a paid subscription or via Google Cloud for developers.

2023-12-13

Researched 185d ago

1m

1,000,000 tokens

1m context
GCP Vertex AI

$1.00 in / $3.00 out / 1M tokens

1 route

Provider docs
Gemini 2.0 Flash-Lite (Preview 02-05)

Gemini 2.0 Flash Lite Preview (02-05). Retiring June 1, 2026. Migrate to Gemini 2.5 or Gemini 3 series.

2025-02-05

Researched 31d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal

No tracked provider route

Gemini 2.0 Pro (Experimental 02-05)

Gemini 2.0 Pro (Experimental 02-05) is Google DeepMind's Gemini 2.0 model. Its knowledge cutoff is 2024-08.

2025-02-05

Researched 31d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · JSON

No tracked provider route

Gemini 2.0 Flash Experimental

Google Gemini 2.0 Flash experimental model with 1M context for long-form understanding.

2024-12-11

Researched 185d ago

1m

1,000,000 tokens

1m context

No tracked provider route

LearnLM 1.5 Pro Experimental

Google LearnLM experimental model optimized for educational and tutoring applications.

2024-11-19

Researched 47d ago

1m

1,000,000 tokens

1m context

No tracked provider route

Gemini 1.5 Flash 002

Stable Gemini 1.5 Flash release (version 002, September 2024) optimized for high-speed processing and cost efficiency. Supports 1M context with fast token generation for real-time use.

2024-09-24

Researched 47d ago

1m

1,000,000 tokens

1m context

No tracked provider route

Gemini 1.5 Flash 8B Experimental 0924

Updated experimental 8B Flash with improvements to latency and multimodal understanding capabilities.

2024-09-24

Researched 47d ago

1m

1,000,000 tokens

1m context

No tracked provider route

Gemini 1.5 Flash 8B Experimental 0827

Experimental 8B Flash variant with optimizations for edge deployment and ultra-fast multimodal inference.

2024-08-27

Researched 47d ago

1m

1,000,000 tokens

1m context

No tracked provider route

Gemini 1.5 Flash Experimental 0827

Experimental Flash variant with enhancements to multimodal capabilities and inference speed.

2024-08-27

Researched 47d ago

1m

1,000,000 tokens

1m context

No tracked provider route

Gemini 3 Flash

Gemini 3 Flash is Google's speed-optimized Gemini 3 model, available in public preview via the Gemini API and Vertex AI. It supports text, image, audio, and video inputs with a 1M token context window and is priced at $0.50 per 1M input tokens and $3.00 per 1M output tokens.

2025-12-17

Researched 49d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Audio · Tool use · Functions
GCP Vertex AI

$0.500 in / $3.00 out / 1M tokens

4 routes · 1 cache

Provider docs
Gemini 3 Pro

Google DeepMind's most advanced reasoning Gemini model. Part of the Gemini 3 series with frontier-class intelligence, multimodal understanding, and 1M token context window.

2025-12-11

Researched 185d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · Code exec
GCP Vertex AI

$1.25 in / $5.00 out / 1M tokens

2 routes

Provider docs
Gemini 3 Flash Preview

Frontier-class performance rivaling larger models at a fraction of the cost. Most intelligent Gemini model built for speed, combining frontier intelligence with superior search and grounding. $0.50 input / $3.00 output per 1M tokens.

2025-12-17

Researched 77d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$0.500 in / $3.00 out / 1M tokens

3 routes

Provider docs
Amazon Nova 2 Pro

Amazon Nova 2 Pro is a preview text-generation model in the Amazon Nova 2 family, announced for Amazon Bedrock and made available to Amazon Nova Forge customers. It targets higher-capability general reasoning and generation workloads than Nova 2 Lite, supports a 1M-token context window, and currently has restricted preview access rather than standard public token pricing.

2025-12-02

Researched 11d ago

1m

1,000,000 tokens

1m context
AWS Bedrock

Pricing not tracked

1 route

Provider docs
Amazon Nova 2 Lite

Amazon Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that processes text, images, and videos at 1M token context with improved reasoning over Nova Lite v1.

2025-12-02

Researched 11d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Vercel AI Gateway

$0.300 in / $2.50 out / 1M tokens

1 route · 1 cache

Provider docs
Llama 4 Maverick 17B Instruct FP8

Meta's Llama 4 Maverick 17B with 128 experts, FP8-optimized for cost-efficient inference. Supports native Model Router integration on Microsoft Foundry.

2025-04-05

Researched 28d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · JSON · Batch
DeepInfra

$0.150 in / $0.600 out / 1M tokens

11 routes · 1 batch

Provider docs
Qwen3.7-Plus

Alibaba's multimodal agentic model with text, image, and video input. Combines vision-language understanding with full agentic capabilities: deep reasoning, self-programming, tool invocation, and autonomous iteration. GUI grounding: 79.0 on ScreenSpot Pro. Max output 66K tokens. Pricing: $0.40/$1.60 per 1M tokens in/out.

2026-06-03

Researched 26d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenRouter

$0.320 in / $1.28 out / 1M tokens

2 routes · 1 cache

Provider docs
Claude Mythos Preview

Anthropic's cybersecurity-focused frontier model, offered as an invitation-only research preview under Project Glasswing. Succeeded by Claude Mythos 5 (API ID: claude-mythos-5) as of June 9, 2026. Anthropic has indicated that Claude Mythos Preview will be retired after Claude Mythos 5 becomes available; no formal retirement date was published as of 2026-06-09. For current access and the migration path, see the Anthropic migration guide.

2026-05-01

Researched 26d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$25.00 in / $125.00 out / 1M tokens

2 routes

Provider docs
Palmyra X5

Palmyra X5 is Writer's most advanced model, purpose-built for enterprise AI agents. It delivers high capability at 1M token context for large-scale document processing and complex multi-step agent workflows.

2026-02-01

Researched 69d ago

1m

1,000,000 tokens

1m context · Tool use · Functions · JSON
AWS Bedrock

$0.600 in / $6.00 out / 1M tokens

2 routes

Provider docs
Claude Sonnet 4.6

Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens.

2026-02-17

Researched 23d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$3.00 in / $15.00 out / 1M tokens

6 routes · 1 batch · 3 cache

Provider docs
Claude Opus 4.7

Claude Opus 4.7 is Anthropic's generally available flagship model with 1M context, 128K max output, adaptive thinking, and a new tokenizer with roughly 555K words per 1M tokens.

2026-04-16

Researched 8d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 3 cache

Provider docs
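The Claude Opus 4.7 description above quotes roughly 555K words per 1M tokens. As a rough sketch of what that ratio implies for document sizing, the helper below converts a word count into an estimated token count and checks it against the listed 1M-token window and 128K max output. The function names are hypothetical, and the ratio is only an approximation that will vary with language and content; a real tokenizer count should be preferred where available.

```python
# Rough sizing sketch based on figures quoted in the listing above.
# Assumption: the "555K words per 1M tokens" ratio holds on average.
WORDS_PER_MILLION_TOKENS = 555_000
CONTEXT_WINDOW_TOKENS = 1_000_000  # listed context window
MAX_OUTPUT_TOKENS = 128_000        # listed max output


def estimate_tokens(word_count: int) -> int:
    """Estimate tokens from a word count, rounding up (integer arithmetic)."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    numerator = word_count * 1_000_000
    return (numerator + WORDS_PER_MILLION_TOKENS - 1) // WORDS_PER_MILLION_TOKENS


def fits_in_context(word_count: int,
                    reserved_output_tokens: int = MAX_OUTPUT_TOKENS) -> bool:
    """Check that the estimated input plus a reserved output budget fits the window."""
    return estimate_tokens(word_count) + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    for words in (100_000, 400_000, 500_000):
        print(words, estimate_tokens(words), fits_in_context(words))
```

Reserving the full output budget up front is a conservative choice, since the context window covers input and output together; a smaller reservation leaves more room for input.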
Claude Opus 4.6

Claude Opus 4.6 is Anthropic's Claude 4.6 model with multimodal text and image input and an optional reasoning mode. It offers a 1M-token context window and scores 80.8 on SWE-bench Verified.

2026-02-05

Researched 47d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 4 cache

Provider docs
Claude Opus 4.8

Claude Opus 4.8 is Anthropic's flagship Claude 4.8 model, released May 28, 2026 for agentic coding, long-horizon reasoning, computer use, and professional knowledge work. It supports text and image inputs, adaptive reasoning, tool use, structured outputs, computer-use tools, prompt caching, Batch API, Dynamic Workflows parallel subagents, a 1M-token context window on Anthropic API/Bedrock/Vertex, and 128K max output. Reported benchmark results: SWE-bench Pro 69.2%, SWE-bench Verified 88.6%, Terminal-Bench 2.1 74.6%, HLE with tools 57.9%, OSWorld-Verified 83.4%, GDPval-AA 1890 Elo, and MCP-Atlas 82.2%. Standard Anthropic API pricing is $5/1M input and $25/1M output tokens.

2026-05-28

Researched 11d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 1 cache

Provider docs
Claude Sonnet 5

Claude Sonnet 5 is Anthropic's next-generation Sonnet model for agentic coding, tool use, computer use, and professional work. It is a proprietary decoder-only model with a 1M-token context window, 128K max output, multimodal vision, adaptive thinking, function calling, structured outputs, prompt caching, and Batch API support. It is available through the Claude API, AWS Bedrock, Google Cloud Vertex AI, Microsoft Foundry preview, and OpenRouter. Anthropic lists standard pricing at $3/1M input and $15/1M output tokens, with introductory pricing of $2/1M input and $10/1M output through 2026-08-31.

2026-06-30

Researched 5d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenRouter

$2.00 in / $10.00 out / 1M tokens

5 routes · 2 batch · 3 cache

Provider docs
DeepSeek V4 Flash

DeepSeek V4 Flash is a 284B parameter (13B activated) Mixture-of-Experts language model with 1M-token context. Features a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for efficient long-context inference. Supports thinking and non-thinking modes. Legacy API aliases deepseek-chat and deepseek-reasoner map to this model's non-thinking and thinking modes respectively. Pricing: $0.14/1M input, $0.28/1M output (cache hit: $0.0028/1M input). MIT licensed.

2026-04-24

Researched 2d ago

1m

1,000,000 tokens

1m context · Reasoning · Tool use · Functions · JSON · Prompt cache
OpenRouter

$0.090 in / $0.180 out / 1M tokens

5 routes · 4 cache

Provider docs
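To illustrate how the cache-hit rate in the DeepSeek V4 Flash description changes a bill, here is a minimal cost sketch using the list prices quoted there ($0.14/1M input, $0.0028/1M cached input, $0.28/1M output). The function name and the billing model (cached input tokens charged at the cache-hit rate, the rest at the standard input rate) are assumptions for illustration; the tracked OpenRouter route above lists different prices, so substitute the rates for the route actually used.

```python
from decimal import Decimal

# List prices in USD per 1M tokens, taken from the description above.
INPUT_PRICE = Decimal("0.14")
CACHED_INPUT_PRICE = Decimal("0.0028")
OUTPUT_PRICE = Decimal("0.28")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> Decimal:
    """Estimate request cost in USD (hypothetical helper).

    Assumes cached_tokens is the part of input_tokens served from cache.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * INPUT_PRICE
             + cached_tokens * CACHED_INPUT_PRICE
             + output_tokens * OUTPUT_PRICE)
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 800K input tokens (600K of them cached) and 10K output tokens.
    print(estimate_cost(800_000, 10_000, cached_tokens=600_000))
    # Same request with no cache hits, for comparison.
    print(estimate_cost(800_000, 10_000))
```

`Decimal` is used instead of floats so that small per-token prices do not pick up binary rounding error when summed.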
Gemini 2.5 Flash

Google's Gemini 2.5 Flash, available via OpenRouter. Pricing: $0.30/1M input, $2.50/1M output.

2025-06-17

Researched 77d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$0.300 in / $2.50 out / 1M tokens

6 routes · 1 cache

Provider docs
Gemini 3.1 Pro Preview

Google's Gemini 3.1 Pro Preview, available via OpenRouter. Pricing: $2/1M input, $12/1M output.

2026-02-19

Researched 16d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$2.00 in / $12.00 out / 1M tokens

5 routes · 1 cache

Provider docs
Gemini 2.5 Flash Lite

Google's Gemini 2.5 Flash Lite, available via OpenRouter. Pricing: $0.10/1M input, $0.40/1M output.

2025-07-22

Researched 77d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$0.100 in / $0.400 out / 1M tokens

4 routes · 1 cache

Provider docs
Gemini 2.5 Pro

Google DeepMind's most capable Gemini 2.5 model with native thinking/reasoning support. Features a 1M-token context window, multimodal inputs (text, image, audio, video), function calling, and strong performance across coding, mathematics, and scientific reasoning tasks.

2025-06-17

Researched 30d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
GCP Vertex AI

$1.25 in / $10.00 out / 1M tokens

4 routes · 2 batch · 3 cache

Provider docs
Gemini 2.5 Pro Preview 05-06

Google's Gemini 2.5 Pro Preview 05-06, available via OpenRouter. Pricing: $1.25/1M input, $10/1M output.

2025-05-06

Researched 31d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · JSON
OpenRouter

$1.25 in / $10.00 out / 1M tokens

1 route

Provider docs
Gemini 3.1 Flash-Lite

GA release of Google's most cost-efficient Gemini 3.1 model, optimized for speed, scale, and cost efficiency. Supersedes gemini-3.1-flash-lite-preview. API model ID: gemini-3.1-flash-lite. Pricing: $0.25/$1.50 per 1M tokens in/out.

2026-05-07

Researched 26d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · JSON

No tracked provider route