
Context window

Also known as: context length, context size, token window

1,244 matching active models

55 tracked providers

662 models with routes

model.context

Definition

The context window is the maximum number of tokens a large language model can attend to in a single inference pass, counting the input prompt and the generated output together. It caps how much conversation history or document text the model can use at once. Larger windows accommodate longer conversations and documents but raise memory and compute requirements.
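Because the window covers input and output together, a request fits only if the prompt plus the reserved output stays within the limit. The following is a minimal sketch of that check; the function names are illustrative and not part of any provider API, and it assumes the provider counts both sides against one shared limit (some providers also cap output separately).

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Return True when the prompt plus the reserved output fits inside the window."""
    if min(prompt_tokens, max_output_tokens, context_window) < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_window


def remaining_output_budget(prompt_tokens: int, context_window: int) -> int:
    """Tokens left for generation once the prompt is counted (never negative)."""
    return max(context_window - prompt_tokens, 0)


if __name__ == "__main__":
    # A 1,000,000-token prompt against a 1,048,576-token window.
    print(fits_in_context(1_000_000, 65_536, 1_048_576))
    print(remaining_output_budget(1_000_000, 1_048_576))
```

In practice the prompt token count would come from the provider's own tokenizer, since counts differ between model families.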

Models With Context window

Showing the first 80 decision-sorted matches, with model flags and provider-route evidence from seed data.

1,244 matches
LTM-2-mini

LTM-2-mini is Magic's research prototype supporting a 100 million token context window, announced August 29, 2024. Magic reports that its sequence-dimension algorithm is roughly 1,000× cheaper per decoded token than the attention mechanism in Llama 3.1 405B at this context length, and that it needs only a small fraction of a single H100's HBM, versus 638 H100s to hold the KV cache for Llama 3.1 405B at the same length. Not publicly released for API access or self-hosting; Magic stated it was separately training a full LTM-2 model. Specialization: coding and software development. Source: https://magic.dev/blog/100m-token-context-windows

2024-08-29

Researched 2d ago

100M

100,000,000 tokens

100M context

No tracked provider route

Llama 4 Scout 17B

Multimodal Llama 4 model with 17B active parameters and 16 experts; supports a 10M-token context window for long-document processing.

2025-10-01

Researched 32d ago

10M

10,000,000 tokens

10M context · Multimodal · JSON · Batch
AWS Bedrock

$0.170 in / $0.660 out / 1M tokens

1 route · 1 batch

Provider docs
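Route prices in this listing follow the pattern `$X in / $Y out / 1M tokens`, with `Pricing not tracked / 1M tokens` where no price is recorded. As a rough sketch, such a line can be parsed and turned into a per-request cost estimate; the helper names below are hypothetical, and the regular expression assumes the exact line format shown on this page.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Matches lines such as "$0.170 in / $0.660 out / 1M tokens".
PRICE_LINE = re.compile(r"\$(?P<inp>[\d.]+) in / \$(?P<out>[\d.]+) out / 1M tokens")


def parse_price_line(line: str) -> Optional[Tuple[Decimal, Decimal]]:
    """Return (input_price, output_price) in USD per 1M tokens, or None if untracked."""
    match = PRICE_LINE.search(line)
    if match is None:
        return None
    return Decimal(match["inp"]), Decimal(match["out"])


def request_cost_usd(line: str, input_tokens: int, output_tokens: int) -> Optional[Decimal]:
    """Estimate the cost of one request from a listing price line."""
    prices = parse_price_line(line)
    if prices is None:
        return None
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) / Decimal(1_000_000)


if __name__ == "__main__":
    print(request_cost_usd("$0.170 in / $0.660 out / 1M tokens", 1_000_000, 100_000))
```

`Decimal` is used so that small per-token prices do not pick up binary floating-point rounding.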
Llama 4 Scout 17B Instruct

Llama 4 Scout 17B Instruct is Meta's Llama 4 model with multimodal text and image input. It scores 1295 on the Chatbot Arena benchmark.

2025-04-05

Researched 2d ago

10M

10,000,000 tokens

10M context · Multimodal · JSON · Batch
AWS Bedrock

$0.170 in / $0.660 out / 1M tokens

1 route · 1 batch

Provider docs
LTM-1

LTM-1 (Long-Term Memory 1) is Magic's first model with a 5 million token context window, announced June 6, 2023. Designed to process entire codebases in context for AI-assisted software development. Architecture and parameter count not publicly disclosed. Not available as a public API; Magic used it in an early-access coding product. Source: https://magic.dev/blog/ltm-1

2023-06-06

Researched 2d ago

5M

5,000,000 tokens

5M context

No tracked provider route

Gemini 1.5 Pro

Gemini 1.5 Pro is Google DeepMind's multimodal large language model for processing text, images, audio, and video. It offers a context window of up to 2 million tokens, which helps it stay coherent over long interactions and large inputs. The listing cites a parameter count of over 200 billion, a figure Google has not publicly confirmed. The model targets nuanced language processing, coding assistance, and advanced reasoning, and is available through Google platforms such as Vertex AI.

2024-02-15

Researched 32d ago

2M

2,000,000 tokens

2M context · JSON
GCP Vertex AI

$1.25 in / $5.00 out / 1M tokens

2 routes

Provider docs
Gemini 1.5 Pro 002

Stable Gemini 1.5 Pro release (version 002, September 2024) optimized for complex reasoning and high-quality multimodal analysis. Supports a 2M-token context for extended document and video processing.

2024-09-24

Researched 2d ago

2M

2,000,000 tokens

2M context

No tracked provider route

Gemini 1.5 Pro Experimental 0827

Updated Pro experimental variant with refinements to reasoning depth and creative task performance.

2024-08-27

Researched 2d ago

2M

2,000,000 tokens

2M context

No tracked provider route

Gemini 1.5 Pro Experimental 0801

Experimental Pro variant with enhanced reasoning and multimodal understanding for complex problem-solving tasks.

2024-08-01

Researched 2d ago

2M

2,000,000 tokens

2M context

No tracked provider route

Grok 4.20 Multi-Agent

Grok 4.20 Multi-Agent is the extended-context xAI API variant launched around March 10, 2026 as grok-4.20-multi-agent-0309. Its reasoning.effort parameter controls how many collaborating agents are used, and the variant carries a 2M token context window.

2026-03-10

Researched 1d ago

2M

2,000,000 tokens

2M context · Reasoning · Vision · Multimodal · Tool use · Functions
xAI Console

$1.25 in / $2.50 out / 1M tokens

2 routes

Provider docs
GPT-5.4 Pro

Premium extended-reasoning GPT-5.4 variant producing smarter and more precise responses. Replacement for o3-deep-research and o4-mini-deep-research. No prompt caching discount.

2026-03-01

Researched 11d ago

1.1M

1,050,000 tokens

1.1M context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$30.00 in / $180.00 out / 1M tokens

2 routes · 1 batch

Provider docs
GPT-5.5 Pro

GPT-5.5 Pro is OpenAI's premium variant of GPT-5.5, released April 23, 2026. Targets large quality gains for business, legal, education, and data science use cases. Scores 39.6% on FrontierMath Tier 4 (postdoctoral-level math problems), compared to 22.9% for Claude Opus 4.7. Priced at 6× the standard GPT-5.5 API rate. Available to ChatGPT subscribers and via API.

2026-04-23

Researched 2d ago

1.1M

1,050,000 tokens

1.1M context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$30.00 in / $180.00 out / 1M tokens

2 routes · 1 batch

Provider docs
GPT-5.5

GPT-5.5 is OpenAI's fully retrained agentic model, released April 23, 2026. Optimized for agentic coding, computer use, knowledge work, and early scientific research. Achieves 82.7% on Terminal-Bench 2.0, 84.9% on GDPval, and 58.6% on SWE-Bench Pro. Individual factual claims are 23% more likely to be correct versus GPT-5.4, with factual errors 3% less frequent. Uses fewer tokens than GPT-5.4 for equivalent tasks. Supports text and image inputs. Available to ChatGPT Plus, Business, and Enterprise subscribers; API access coming soon. Model ID: gpt-5.5.

2026-04-23

Researched 2d ago

1.1M

1,050,000 tokens

1.1M context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$5.00 in / $30.00 out / 1M tokens

2 routes · 1 batch · 1 cache

Provider docs
GPT-5.4

GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.

2026-03-05

Researched 11d ago

1.1M

1,050,000 tokens

1.1M context · Reasoning · Multimodal · Tool use · Functions · JSON
OpenAI API

$2.50 in / $15.00 out / 1M tokens

2 routes · 1 batch · 1 cache

Provider docs
Xiaomi MiMo-V2.5

Xiaomi MiMo-V2.5 is the lower-cost native omnimodal sibling in the MiMo-V2.5 series. OpenRouter describes it as supporting text, image, audio, and video inputs with text output, Pro-level agentic performance at roughly half the inference cost, and improved multimodal perception over MiMo-V2-Omni. Xiaomi's official April 22 release page highlights MiMo-V2.5 alongside MiMo-V2.5-Pro in benchmark data and says the V2.5 series will be open-sourced soon; no public weights/license were verified at research time.

2026-04-22

Researched 28d ago

1M

1,048,576 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenRouter

$0.400 in / $2.00 out / 1M tokens

1 route

Provider docs
Xiaomi MiMo-V2.5-Pro

Xiaomi's April 22, 2026 public-beta flagship in the MiMo-V2.5 series. The official Xiaomi MiMo page describes MiMo-V2.5-Pro as its most capable model to date, focused on general agentic capability, complex software engineering, long-horizon tasks, and ultra-long-context instruction following. OpenRouter lists it as text-to-text with 1,048,576 token context, 131,072 max completion tokens, reasoning controls, tool use, and response_format support. Xiaomi says the V2.5 series will be open-sourced soon, but no public weights/license were verified at research time.

2026-04-22

Researched 28d ago

1M

1,048,576 tokens

1M context · Tool use · Functions · JSON
OpenRouter

$1.00 in / $3.00 out / 1M tokens

1 route

Provider docs
Nemotron 3 Super-120B-A12B

NVIDIA Nemotron 3 Super-120B-A12B is a 120B total / 12B active hybrid Latent MoE model with interleaved Mamba-2 and MoE layers for agentic, reasoning, and conversational tasks. Fireworks lists the NVFP4 variant for on-demand deployment with 262k context.

2026-03-11

Researched 5d ago

1M

1,048,576 tokens

1M context · JSON
OpenRouter

$0.090 in / $0.450 out / 1M tokens

4 routes

Provider docs
Nemotron-Cascade-2-30B-A3B

30B MoE model with 3B active parameters, reported to reach gold-medal-level performance on IMO and IOI 2025 problems.

2026-03-19

Researched 7d ago

1M

1,048,576 tokens

1M context

No tracked provider route

Gemini 3.5 Flash

Gemini 3.5 Flash is Google DeepMind's generally available Flash model for sustained frontier-level performance on agentic and coding tasks. It supports multimodal inputs, native thinking, tool and function calling, structured outputs, code execution, search grounding, batch processing, and long contexts up to 1M tokens.

2026-05-19

Researched 2d ago

1M

1,048,576 tokens

1M context · Reasoning · Vision · Multimodal · Audio · Tool use
GCP Vertex AI

$1.50 in / $9.00 out / 1M tokens

2 routes · 2 batch · 2 cache

Provider docs
Gemini 3.1 Flash-Lite

Gemini 3.1 Flash-Lite is Google's generally available low-latency Gemini 3.1 model, launched May 7, 2026. It is optimized for high-volume, cost-sensitive workloads with text, image, and video inputs, a 1M token context window, and a 66K token maximum output. The GA model uses the stable API ID gemini-3.1-flash-lite and replaces gemini-3.1-flash-lite-preview, which is scheduled to shut down on May 25, 2026. Pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens.

2026-05-07

Researched 13d ago

1M

1,048,576 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
Google AI Studio

$0.250 in / $1.50 out / 1M tokens

2 routes

Provider docs
Gemini 2.5 Pro Computer Use Preview

Specialized for browser control agents. $1.25/$10.00 (<=200K), $2.50/$15.00 (>200K). Available on AI Studio and Vertex AI; no free tier.

2025-10-01

Researched 23d ago

1M

1,048,576 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$1.25 in / $10.00 out / 1M tokens

2 routes

Provider docs
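The Gemini 2.5 Pro Computer Use Preview entry above lists two price tiers split at 200K tokens. A minimal sketch of how such tiered pricing could be applied follows. It assumes the tier is chosen by prompt length and then applies to the whole request, which this listing does not spell out, and the function name is illustrative.

```python
from decimal import Decimal

TIER_THRESHOLD = 200_000                            # prompt tokens, from the entry above
LOW_TIER = (Decimal("1.25"), Decimal("10.00"))      # USD per 1M input / output tokens
HIGH_TIER = (Decimal("2.50"), Decimal("15.00"))
PER_MILLION = Decimal(1_000_000)


def tiered_cost_usd(prompt_tokens: int, output_tokens: int) -> Decimal:
    """Estimate request cost; the tier is selected by prompt size (an assumption)."""
    if prompt_tokens <= TIER_THRESHOLD:
        input_rate, output_rate = LOW_TIER
    else:
        input_rate, output_rate = HIGH_TIER
    return (prompt_tokens * input_rate + output_tokens * output_rate) / PER_MILLION


if __name__ == "__main__":
    print(tiered_cost_usd(200_000, 10_000))   # at the threshold, low tier
    print(tiered_cost_usd(200_001, 10_000))   # one token over, high tier
```

Under this assumption, crossing the threshold by a single token re-prices the entire request, so long-context workloads can cost roughly twice as much per input token.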
MiMo-V2-Pro

Xiaomi MiMo-V2-Pro language model. The larger, higher-capability model in the MiMo V2 series with an extended 1M token context window.

2026-03-18

Researched 17d ago

1M

1,048,576 tokens

1M context
OpenRouter

$1.00 in / $3.00 out / 1M tokens

1 route

Provider docs
Llama 3 70B Gradient 1048K

Llama 3 70B Gradient 1048K is Gradient's Llama 3 model. It offers a 1048K-token context window.

2024-04-18

Researched 2d ago

1048K

1,048,000 tokens

1048K context

No tracked provider route

Llama 3 8B Gradient 1048K

Llama 3 8B Gradient 1048K is Gradient's Llama 3 model. It offers a 1048K-token context window.

2024-04-18

Researched 2d ago

1048K

1,048,000 tokens

1048K context

No tracked provider route

Llama 3.1 8B Gradient 1048K

Llama 3.1 8B Gradient 1048K is Gradient's Llama 3 model. It offers a 1048K-token context window.

2024-04-18

Researched 2d ago

1048K

1,048,000 tokens

1048K context

No tracked provider route

GPT-4.1

OpenAI's GPT-4.1 model released April 2025, excelling at coding tasks, precise instruction following, and web development. Outperforms GPT-4o in these areas with a 1 million token context window. Available via API and in ChatGPT for Plus, Pro, Team, Enterprise, and Edu users.

2025-04-01

Researched 11d ago

1M

1,047,576 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$2.00 in / $8.00 out / 1M tokens

3 routes · 1 batch · 1 cache

Provider docs
GPT-4.1 Mini

Fast and efficient small model from OpenAI replacing GPT-4o mini. Released April 2025 alongside GPT-4.1. Shows improvements in instruction-following, coding, and intelligence with a 1 million token context window. Available in ChatGPT for paid users.

2025-04-01

Researched 11d ago

1M

1,047,576 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$0.400 in / $1.60 out / 1M tokens

3 routes · 1 cache

Provider docs
Amazon Nova Premier

Amazon Nova Premier is Amazon's most capable standard Bedrock Nova understanding model for complex reasoning, agentic workflows, and model distillation. It supports a 1M-token context window, text/image/video inputs, text output, reasoning, tool calling, and prompt caching; use it as the standard Bedrock Nova frontier pick instead of Nova 2 Omni early-access Forge checkpoints.

2025-03-17

Researched 1d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
AWS Bedrock

$2.50 in / $12.50 out / 1M tokens

2 routes

Provider docs
DeepSeek V4 Pro

DeepSeek V4 Pro is the flagship 1.6T parameter (49B activated) Mixture-of-Experts language model with 1M-token context. Features hybrid attention (CSA+HCA) requiring only 27% of inference FLOPs vs DeepSeek-V3.2 at 1M context, Manifold-Constrained Hyper-Connections (mHC), and Muon Optimizer for training stability. Achieves 93.5% on LiveCodeBench, 89.8% on IMOAnswerBench, and 90.1% on MMLU. Supports Non-Think, Think High, and Think Max reasoning modes. Pricing: $1.74/1M input, $3.48/1M output (cache hit: $0.145/1M input). MIT licensed. Pricing note: DeepSeek API docs state that deepseek-v4-pro is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC.

2026-04-24

Researched 7d ago

1M

1,000,000 tokens

1M context · Reasoning · Tool use · Functions · JSON · Prompt cache
DeepSeek Platform

$0.435 in / $0.870 out / 1M tokens

3 routes · 1 cache

Provider docs
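The DeepSeek V4 Pro entry above quotes list prices of $1.74 and $3.48 per 1M tokens alongside a 75% discount, while its route shows $0.435 in / $0.870 out. A short sketch confirms that the route price is the list price after that discount; the helper name is hypothetical.

```python
from decimal import Decimal


def discounted(list_price: Decimal, discount_pct: int) -> Decimal:
    """Apply a percentage discount to a per-1M-token list price."""
    return list_price * (Decimal(100) - discount_pct) / Decimal(100)


if __name__ == "__main__":
    print(discounted(Decimal("1.74"), 75))   # input price after discount
    print(discounted(Decimal("3.48"), 75))   # output price after discount
```

Both results match the route prices shown in the listing, so the tracked route appears to reflect the discounted rate rather than the list rate.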
Qwen3.6-Plus

Qwen3.6-Plus is Alibaba Cloud's GA Qwen3.6 flagship for long-context reasoning, coding, tool use, and multimodal workflows. DashScope lists it with a 1M-token context window, structured output support, and standard public token pricing.

2026-04-01

Researched 1d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
Alibaba Cloud PAI-EAS

$0.325 in / $1.95 out / 1M tokens

2 routes · 1 cache

Provider docs
Gemini 1.5 Flash

Gemini 1.5 Flash is a large language model by Google, built for speed and efficiency in high-volume scenarios. As a lightweight model, it is optimized for fast processing and cost-effectiveness, making it well suited to real-time applications and high-frequency tasks. Its multimodal capabilities let it process and reason across text, images, audio, video, and PDFs. Despite being smaller than Gemini 1.5 Pro, it performs well at summarization, chat applications, and data extraction from lengthy documents, using knowledge distillation to transfer knowledge from larger models. It also features a context window of up to 1 million tokens.

2024-05-14

Researched 32d ago

1M

1,000,000 tokens

1M context · JSON
GCP Vertex AI

$0.075 in / $0.300 out / 1M tokens

2 routes

Provider docs
Gemini 1.5 Flash 8B

Lightweight 8B variant of Gemini 1.5 Flash optimized for speed and cost-efficiency. Supports 1M token context with fast inference for real-time applications.

2024-10-03

Researched 2d ago

1M

1,000,000 tokens

1M context
GCP Vertex AI

$0.0375 in / $0.150 out / 1M tokens

1 route

Provider docs
Gemini 1.5 Flash on Google Vertex AI

Gemini 1.5 Flash on Google Vertex AI is Google DeepMind's Gemini 1.5 model with multimodal text and image input. It offers a 1M-token context window.

2024-02-15

Researched 2d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · JSON
GCP Vertex AI

$0.035 in / $0.105 out / 1M tokens

1 route

Provider docs
Gemini 1.5 Pro on Google Vertex AI

Gemini 1.5 Pro on Google Vertex AI is Google DeepMind's Gemini 1.5 model with multimodal text and image input. It offers a 1M-token context window.

2024-02-15

Researched 2d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · JSON
GCP Vertex AI

$0.125 in / $0.375 out / 1M tokens

1 route

Provider docs
Gemini 1.0 Ultra

Google's Gemini 1.0 Ultra is a leading large language model designed for tackling highly complex tasks with advanced analytical capabilities. As the largest model in the Gemini 1.0 family, it excels in coding, mathematical reasoning, and multimodal reasoning. Its strength lies in its ability to seamlessly understand and process diverse data types, including text, code, audio, images, and video. Gemini Ultra surpasses human experts on the MMLU benchmark with a 90% score, although it has limitations in image generation and some multimodal tasks. The model features a 32,000-token context window, less than some competitors, and access is primarily through a paid subscription or via Google Cloud for developers.

2023-12-13

Researched 140d ago

32K

32,000 tokens

32K context
GCP Vertex AI

$1.00 in / $3.00 out / 1M tokens

1 route

Provider docs
MiniMax M1

MiniMax-M1 is a large-scale open-weight reasoning model from MiniMax with 456B total parameters and a 1M token context window, designed for extended reasoning and high-efficiency inference.

2025-09-01

Researched 24d ago

1M

1,000,000 tokens

1M context · Reasoning · Tool use · Functions · JSON

No tracked provider route

Gemini 2.0 Flash-Lite (Preview 02-05)

Gemini 2.0 Flash Lite Preview (02-05). Retiring June 1, 2026. Migrate to Gemini 2.5 or Gemini 3 series.

2025-02-05

Researched 2d ago

1M

1,000,000 tokens

1M context

No tracked provider route

Gemini 2.0 Pro (Experimental 02-05)

Gemini 2.0 Pro (Experimental 02-05) is Google DeepMind's Gemini 2.0 model. Its knowledge cutoff is 2024-08.

2025-02-05

Researched 2d ago

1M

1,000,000 tokens

1M context · JSON

No tracked provider route

Gemini 2.0 Flash Experimental

Google Gemini 2.0 Flash experimental model with 1M context for long-form understanding.

2024-12-11

Researched 140d ago

1M

1,000,000 tokens

1M context

No tracked provider route

LearnLM 1.5 Pro Experimental

Google LearnLM experimental model optimized for educational and tutoring applications.

2024-11-19

Researched 2d ago

1M

1,000,000 tokens

1M context

No tracked provider route

Gemini 1.5 Flash 002

Stable Gemini 1.5 Flash release (version 002, September 2024) optimized for high-speed processing and cost efficiency. Supports a 1M-token context with fast token generation for real-time use.

2024-09-24

Researched 2d ago

1M

1,000,000 tokens

1M context

No tracked provider route

Gemini 1.5 Flash 8B Experimental 0924

Updated experimental 8B Flash with improvements to latency and multimodal understanding capabilities.

2024-09-24

Researched 2d ago

1M

1,000,000 tokens

1M context

No tracked provider route

Gemini 1.5 Flash 8B Experimental 0827

Experimental 8B Flash variant with optimizations for edge deployment and ultra-fast multimodal inference.

2024-08-27

Researched 2d ago

1M

1,000,000 tokens

1M context

No tracked provider route

Gemini 1.5 Flash Experimental 0827

Experimental Flash variant with enhancements to multimodal capabilities and inference speed.

2024-08-27

Researched 2d ago

1M

1,000,000 tokens

1M context

No tracked provider route

Gemini 3 Flash

Gemini 3 Flash is Google's speed-optimized Gemini 3 model, available in public preview via the Gemini API and Vertex AI. It supports text, image, audio, and video inputs with a 1M token context window and is priced at $0.50 per 1M input tokens and $3.00 per 1M output tokens.

2025-12-17

Researched 4d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · Audio · Tool use · Functions
GCP Vertex AI

$0.500 in / $3.00 out / 1M tokens

3 routes

Provider docs
Gemini 3 Pro

Google DeepMind's most advanced reasoning Gemini model. Part of the Gemini 3 series with frontier-class intelligence, multimodal understanding, and 1M token context window.

2025-12-11

Researched 140d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · Tool use · Functions · Code exec
GCP Vertex AI

$1.25 in / $5.00 out / 1M tokens

2 routes

Provider docs
Gemini 3 Flash Preview

Frontier-class performance rivaling larger models at a fraction of the cost. Most intelligent Gemini model built for speed, combining frontier intelligence with superior search and grounding. $0.50 input / $3.00 output per 1M tokens.

2025-12-17

Researched 32d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$0.500 in / $3.00 out / 1M tokens

3 routes

Provider docs
Amazon Nova 2 Lite

Amazon Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that processes text, images, and videos at 1M token context with improved reasoning over Nova Lite v1.

2026-03-01

Researched 24d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions

No tracked provider route

Llama 4 Maverick 17B Instruct FP8

Meta's Llama 4 Maverick 17B with 128 experts, FP8-optimized for cost-efficient inference. Supports native Model Router integration on Microsoft Foundry.

2025-04-05

Researched 32d ago

1M

1,000,000 tokens

1M context · JSON
DeepInfra

$0.150 in / $0.600 out / 1M tokens

7 routes

Provider docs
Claude Mythos Preview

Claude Mythos Preview is Anthropic's frontier research model, positioned above the public Claude 4 family and released exclusively via invitation-only Project Glasswing to roughly 12 launch partners and over 40 organizations working on critical infrastructure. No public API or self-serve access. Specializes in defensive cybersecurity — autonomously identified zero-day vulnerabilities including a 27-year-old OpenBSD TCP SACK remote code execution bug and a 17-year-old FreeBSD NFS RCE. Codenamed Capybara internally. Scores 93.9% on SWE-bench Verified, 82.0% on Terminal-Bench 2.0, and 97.6% on USAMO 2026. Partner pricing: $25/$125 per million tokens (input/output). Max output: 128K tokens. Knowledge cutoff: December 2025.

2026-04-07

Researched 20d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$25.00 in / $125.00 out / 1M tokens

2 routes

Provider docs
Palmyra X5

Palmyra X5 is Writer's most advanced model, purpose-built for enterprise AI agents. It delivers high capability at 1M token context for large-scale document processing and complex multi-step agent workflows.

2026-02-01

Researched 24d ago

1M

1,000,000 tokens

1M context · Tool use · Functions · JSON

No tracked provider route

Claude Sonnet 4.6

Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens.

2026-02-17

Researched 13d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$3.00 in / $15.00 out / 1M tokens

5 routes · 1 batch · 2 cache

Provider docs
Claude Opus 4.7

Claude Opus 4.7 is Anthropic's generally available flagship model with 1M context, 128K max output, adaptive thinking, and a new tokenizer with roughly 555K words per 1M tokens.

2026-04-16

Researched 7d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 3 cache

Provider docs
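The Claude Opus 4.7 entry above cites roughly 555K words per 1M tokens. As a back-of-the-envelope sketch, that ratio can be used to estimate how many tokens a document of a given word count would occupy. The ratio is only the figure quoted in this listing and will vary with language and content; the function name is illustrative.

```python
WORDS_PER_MILLION_TOKENS = 555_000   # figure quoted in the entry above
TOKENS_PER_MILLION = 1_000_000


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count for a word count, rounding up to a whole token."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    # Ceiling division using integers only, to avoid float rounding.
    return -(-word_count * TOKENS_PER_MILLION // WORDS_PER_MILLION_TOKENS)


if __name__ == "__main__":
    print(estimate_tokens(100_000))   # a 100,000-word manuscript
    print(estimate_tokens(555_000))   # fills the 1M-token window
```

For a real budget check, the provider's token-counting endpoint or tokenizer gives an exact number; this estimate is only useful for early sizing.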
Claude Opus 4.6

Claude Opus 4.6 is Anthropic's Claude 4.6 model with multimodal text and image input and an optional reasoning mode. It offers a 1M-token context window and scores 80.8 on SWE-bench Verified.

2026-02-05

Researched 2d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 3 cache

Provider docs
DeepSeek V4 Flash

DeepSeek V4 Flash is a 284B parameter (13B activated) Mixture-of-Experts language model with 1M-token context. Features a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for efficient long-context inference. Supports thinking and non-thinking modes. Legacy API aliases deepseek-chat and deepseek-reasoner map to this model's non-thinking and thinking modes respectively. Pricing: $0.14/1M input, $0.28/1M output (cache hit: $0.0028/1M input). MIT licensed.

2026-04-24

Researched 1d ago

1M

1,000,000 tokens

1M context · Reasoning · Tool use · Functions · JSON · Prompt cache
OpenRouter

$0.112 in / $0.224 out / 1M tokens

3 routes · 1 cache

Provider docs
Gemini 3.1 Pro Preview

Google's Gemini 3.1 Pro Preview, available via OpenRouter. Pricing: $2.00/1M input, $12.00/1M output.

2026-02-19

Researched 32d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$2.00 in / $12.00 out / 1M tokens

4 routes

Provider docs
Gemini 2.5 Flash

Google's Gemini 2.5 Flash, available via OpenRouter. Pricing: $0.30/1M input, $2.50/1M output.

2025-06-17

Researched 32d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$0.300 in / $2.50 out / 1M tokens

4 routes

Provider docs
Gemini 2.5 Flash Lite

Google's Gemini 2.5 Flash Lite, available via OpenRouter. Pricing: $0.10/1M input, $0.40/1M output.

2025-07-22

Researched 32d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$0.100 in / $0.400 out / 1M tokens

3 routes

Provider docs
Gemini 2.5 Pro

Google's Gemini 2.5 Pro, available via OpenRouter. Pricing: $1.25/1M input, $10.00/1M output.

2025-06-17

Researched 32d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$1.25 in / $10.00 out / 1M tokens

3 routes

Provider docs
Gemini 2.0 Flash Lite

Google's Gemini 2.0 Flash Lite, available via OpenRouter. Pricing: $0.075/1M input, $0.30/1M output.

2025-02-12

Researched 32d ago

1M

1,000,000 tokens

1M context · JSON
OpenRouter

$0.075 in / $0.300 out / 1M tokens

1 route

Provider docs
SubQ 1M-Preview

SubQ 1M-Preview is Subquadratic's first large language model, built on a fully sub-quadratic sparse-attention architecture that scales compute linearly with context length (O(n) vs. traditional O(n²)). Supports a production context window of 1M tokens (architecture tested to 12M). Achieves 81.8% on SWE-Bench Verified, 95.0% on RULER @128K, and 65.9% on MRCR v2 (8-needle, 1M). Claims 50x faster and 50x cheaper than leading frontier models at 1M context length. Available via OpenAI-compatible API with streaming and tool use support. Model is proprietary and not open-source; fine-tuning for customer-specific use cases is mentioned as a future capability.

2026-05-05

Researched 2d ago

1M

1,000,000 tokens

1M context · Reasoning · Tool use · Functions
SubQ API

Pricing not tracked / 1M tokens

1 route

Provider docs
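The SubQ 1M-Preview entry above contrasts O(n) scaling with the O(n²) scaling of conventional attention. A rough sketch of what that difference means when context grows from 128K to 1M tokens follows. It models only the asymptotic growth rate and ignores constant factors, hardware effects, and the non-attention parts of a model, so the numbers are illustrative rather than measured.

```python
def relative_cost(n_from: int, n_to: int, exponent: int) -> float:
    """Growth factor in compute when context grows from n_from to n_to tokens.

    exponent=1 models linear O(n) scaling; exponent=2 models quadratic O(n^2).
    """
    if n_from <= 0 or n_to <= 0:
        raise ValueError("context lengths must be positive")
    return (n_to / n_from) ** exponent


if __name__ == "__main__":
    linear = relative_cost(128_000, 1_000_000, exponent=1)
    quadratic = relative_cost(128_000, 1_000_000, exponent=2)
    print(f"linear growth:    {linear:.2f}x")
    print(f"quadratic growth: {quadratic:.2f}x")
```

The gap between the two factors widens as the window grows, which is the motivation sub-quadratic architectures give for targeting very long contexts.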
Grok 4.3

xAI's Grok 4.3 is the current flagship API chat model for agentic tool calling and instruction following. xAI lists text and image input, text output, configurable reasoning, a 1,000,000 token context window, cached-input pricing, function calling, and structured outputs.

2026-05-06

Researched 1d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenRouter

$1.25 in / $2.50 out / 1M tokens

3 routes · 2 cache

Provider docs
Qwen3.6-Flash

Qwen3.6-Flash is a native vision-language Flash model delivering a significant performance boost over Qwen3.5-Flash, with particular excellence in agentic coding capabilities and substantially improved spatial intelligence. Vision enhancements include notably better object localization and object detection.

2026-04-16

Researched 30d ago

1M

1,000,000 tokens

1M context · Multimodal
Alibaba Cloud PAI-EAS

$0.250 in / $1.50 out / 1M tokens

2 routes

Provider docs
Qwen3.5-Flash

Qwen3.5-Flash is a fast, cost-effective native vision-language model in the Qwen3.5 series, delivering outstanding performance comparable to the latest state-of-the-art models with significant leaps in both pure-text and multimodal capabilities compared to the Qwen3 series.

2026-02-23

Researched 30d ago

1M

1,000,000 tokens

1M context · Multimodal
OpenRouter

$0.070 in / $0.260 out / 1M tokens

2 routes

Provider docs
Grok 4.20

Grok 4.20 is xAI's February 2026 Grok 4-series model, first previewed under the informal Grok 4.2 beta label. Standard API variants launched around March 10, 2026 as grok-4.20-0309-reasoning and grok-4.20-0309-non-reasoning with a 1M context window.

2026-02-17

Researched 1d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenRouter

$1.25 in / $2.50 out / 1M tokens

2 routes

Provider docs
Qwen3.5-Plus

Qwen3.5-Plus is the flagship commercial API model of the Qwen3.5 native vision-language series, delivering outstanding performance comparable to state-of-the-art models with significant leaps in both pure-text and multimodal capabilities compared to the Qwen3 series.

2026-02-15

Researched 30d ago

1M

1,000,000 tokens

1M context · Multimodal
Alibaba Cloud PAI-EAS

$0.400 in / $2.40 out / 1M tokens

2 routes

Provider docs
Gemini Deep Research Max Preview

Maximum-comprehensiveness version of Google's Deep Research agent, built on Gemini 3.1 Pro and released April 21, 2026. Spends more compute than the standard preview to consult more sources, refine reports, and capture nuanced details. Designed for accuracy-critical long-form investigations synthesizing information from hundreds of sources. Supports MCP servers, File Search, and multi-step planning. Context: 1M tokens; max output: 65,536 tokens. Runs at Gemini 3.1 Pro rates ($2.00/$12.00 per MTok). API ID: deep-research-max-preview-04-2026.

2026-04-21

Researched 19d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · Audio · Tool use · Functions
Google AI Studio

$2.00 in / $12.00 out / 1M tokens

1 route

Provider docs
Gemini Deep Research Preview

Google's agentic deep research model built on Gemini 3.1 Pro, released April 21, 2026. Designed for speed and efficiency in autonomous multi-step research: ingests text, images, PDFs, audio, and video to produce comprehensive cited reports from public web sources and private workspace data. Supports collaborative planning, visualization, MCP servers, and File Search. Context window: 1M tokens; max output: 65,536 tokens. Runs at Gemini 3.1 Pro rates ($2.00/$12.00 per MTok). API ID: deep-research-preview-04-2026.

2026-04-21

Researched 19d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · Audio · Tool use · Functions
Google AI Studio

$2.00 in / $12.00 out / 1M tokens

1 route

Provider docs
Grok 4.20 Non-Reasoning

Grok 4.20 Non-Reasoning is the xAI API non-reasoning variant launched around March 10, 2026 as grok-4.20-0309-non-reasoning. It is the live replacement target for retired non-reasoning fast models.

2026-03-10

Researched 1d ago

1M

1,000,000 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
xAI Console

$1.25 in / $2.50 out / 1M tokens

1 route

Provider docs
Grok 4.20 Reasoning

Grok 4.20 Reasoning is the xAI API reasoning variant launched around March 10, 2026 as grok-4.20-0309-reasoning. The prior May 2026 seed date was a placeholder; this model was already available months earlier and remains active.

2026-03-10

Researched 1d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
xAI Console

$1.25 in / $2.50 out / 1M tokens

1 route

Provider docs
Qwen-Plus-Character

Qwen-Plus-Character is the Plus-tier role-playing model in the Qwen series, optimized for anthropomorphic role-playing with advanced capabilities in following predefined character instructions, advancing conversations, and demonstrating active listening and empathy. It supports deep restoration of personalized characters and is dynamically updated.

2026-01-29

Researched 30d ago

1M

1,000,000 tokens

1M context
Alibaba Cloud PAI-EAS

$0.500 in / $1.40 out / 1M tokens

1 route

Provider docs
Qwen-Flash-Character

Qwen-Flash-Character is the Flash-tier role-playing model from the Qwen series, optimized for multi-language anthropomorphic interaction with advanced character consistency, context-aware dialogue progression, and empathetic engagement. Features enhanced Japanese linguistic localization, human-like role-playing authenticity, and narrative coherence control.

2026-01-12

Researched 30d ago

1M

1,000,000 tokens

1M context
Alibaba Cloud PAI-EAS

$0.050 in / $0.400 out / 1M tokens

1 route

Provider docs
Qwen-Plus

Qwen-Plus is an enhanced commercial API endpoint in the Qwen series, supporting Chinese, English, and multiple other languages. The backbone has been upgraded to the Qwen3 architecture, achieving effective integration of thinking and non-thinking modes with seamless switching during conversations.

2025-11-30

Researched 30d ago

1M

1,000,000 tokens

1M context
Alibaba Cloud PAI-EAS

$1.20 in / $3.60 out / 1M tokens

1 route

Provider docs
Qwen3-Coder-Plus

Qwen3-Coder-Plus is a Qwen3-based code generation model with strong coding agent capabilities, excelling at tool invocation and environment interaction. It enables autonomous programming with outstanding code capability while retaining general-purpose reasoning.

2025-09-23

Researched 30d ago

1M

1,000,000 tokens

1M context
Alibaba Cloud PAI-EAS

Pricing not tracked / 1M tokens

1 route

Provider docs
Qwen-Flash

Qwen-Flash is a Qwen3 series Flash model that seamlessly integrates thinking and non-thinking modes switchable mid-dialogue, excelling at complex thinking tasks with significant improvements in instruction adherence and text understanding. It supports 1M context length with tiered pricing based on context length.

2025-08-01

Researched 30d ago

1M

1,000,000 tokens

1M context
Alibaba Cloud PAI-EAS

$0.250 in / $2.00 out / 1M tokens

1 route

Provider docs
Qwen3-Coder-Flash

Qwen3-Coder-Flash inherits the coding agent capabilities of Qwen3-Coder-Plus with support for multi-turn tool interaction, focused optimization on repository-level understanding, and enhanced tool-calling stability.

2025-07-29

Researched 30d ago

1M

1,000,000 tokens

1M context
Alibaba Cloud PAI-EAS

$1.60 in / $9.60 out / 1M tokens

1 route

Provider docs