Mistral Large 3 675B Instruct is MistralAI's Mistral Large model. It offers a 128K-token context window and scores 70.2 on τ-bench.
2025-12-01
Researched 8d ago
128k
128,000 tokens
Also known as: context caching, cache reads, cached prompts
reuse repeated prompt tokens
See matching models with benchmark scores and pricing.
116 matching active models
31 tracked providers
116 models with routes
Prompt caching lets a provider charge or execute repeated prompt prefixes differently when the same context is reused across requests. It matters for long system prompts, retrieval-heavy applications, and agent loops where stable instructions or documents are sent repeatedly.
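To make "repeated prompt prefix" concrete, the sketch below measures how much of a second request is identical, from the start, to the previous one. It is only an illustration: whitespace-split words stand in for real tokens, and the function name `shared_prefix_len` and the sample prompts are hypothetical. Providers apply their own tokenizers, minimum prefix lengths, and cache lifetimes.

```python
from typing import Sequence


def shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the leading items that are identical in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical stable instructions that are sent with every request.
system = "You are a support agent. Follow the policy document below.".split()

# Two requests that share the system prompt but differ in the user turn.
req1 = system + "User: where is my order?".split()
req2 = system + "User: cancel my subscription".split()

# Items at the start of req2 that a prefix cache could, in principle, reuse.
reusable = shared_prefix_len(req1, req2)
print(f"{reusable} of {len(req2)} items in the second request repeat the first")
```

Only the leading run counts: once the two requests diverge, everything after that point is new input, which is why stable content is usually placed at the start of the prompt.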
Showing the first 80 matches, sorted by decision relevance, with tracked capability and provider-route evidence.
Xiaomi MiMo-V2.5 is the lower-cost native omnimodal sibling in the MiMo-V2.5 series. OpenRouter describes it as supporting text, image, audio, and video inputs with text output, Pro-level agentic performance at roughly half the inference cost, and improved multimodal perception over MiMo-V2-Omni. Xiaomi's official April 22 release page highlights MiMo-V2.5 alongside MiMo-V2.5-Pro in benchmark data and says the V2.5 series will be open-sourced soon; no public weights/license were verified at research time.
2026-04-22
Researched 32d ago
1.05m
1,048,576 tokens
Xiaomi's April 22, 2026 public-beta flagship in the MiMo-V2.5 series. The official Xiaomi MiMo page describes MiMo-V2.5-Pro as its most capable model to date, focused on general agentic capability, complex software engineering, long-horizon tasks, and ultra-long-context instruction following. OpenRouter lists it as text-to-text with 1,048,576 token context, 131,072 max completion tokens, reasoning controls, tool use, and response_format support. Xiaomi says the V2.5 series will be open-sourced soon, but no public weights/license were verified at research time.
2026-04-22
Researched 32d ago
1.05m
1,048,576 tokens
Kimi K2.6 is Moonshot AI's multimodal agentic coding model, released April 20 2026 under a Modified MIT license. Built on a 1-trillion-parameter MoE architecture (32B active, 384 experts with 8 selected per token plus 1 shared expert, 61 layers), it features a 262K context window and up to 65,536 output tokens. Supports native image and video inputs (screenshots, PDFs, spreadsheets). Designed for long-horizon coding with agent swarms of up to 300 sub-agents and 4,000 coordinated steps; Moonshot AI cites 200–300 sequential tool calls without task drift. Key benchmarks: SWE-bench Verified 80.2%, SWE-bench Pro 58.6%, LiveCodeBench v6 89.6%, GPQA Diamond 90.5%, Terminal-Bench 2.0 66.7%. Chatbot Arena Elo 1454 (2026-04-28 snapshot).
2026-04-20
Researched 2d ago
262k
262,144 tokens
Kimi K2.7-Code is Moonshot AI's coding-focused multimodal model released June 12, 2026, built on Kimi K2.6. Uses the same 1-trillion-parameter MoE architecture (32B active parameters, 384 experts with 8 selected per token, 61 layers) with a 262K context window and MoonViT vision encoder (400M parameters). Reports +21.8% on Moonshot's Kimi Code Bench v2, +11.0% on Program Bench, +31.5% on MLS Bench Lite versus K2.6, with approximately 30% fewer reasoning tokens. Forces thinking mode on by default and preserves reasoning content across multi-turn interactions for agentic use. Available via Kimi platform API and HuggingFace under Modified MIT license.
2026-06-12
Researched 2d ago
262k
262,144 tokens
Post-training variant of GLM-5 from Z.ai (Zhipu AI) with enhanced agentic coding capabilities. Released April 7, 2026. 754B parameters (40B active) in Mixture of Experts architecture, 200K token context, 128K max output. Supports autonomous plan–execute–test–fix–optimize loops for up to 8 hours without human intervention. Trained entirely on Huawei Ascend hardware (no Nvidia). Key benchmarks: SWE-bench Pro 58.4 (world #1 at release, surpassing GPT-5.4 57.7 and Claude Opus 4.6 57.3), GPQA Diamond 86.2, AIME 2026 95.3, Terminal-Bench 2.0 63.5, MCP-Atlas 71.8, Chatbot Arena Elo 1475 (June 16, 2026, arena.ai). Available via Z.ai API ($1.40/$4.40 per 1M input/output tokens) and open weights on Hugging Face under MIT license.
2026-04-07
Researched 3d ago
200k
200,000 tokens
Amazon Nova Premier is Amazon's most capable standard Bedrock Nova understanding model for complex reasoning, agentic workflows, and model distillation. It supports a 1M-token context window, text/image/video inputs, text output, reasoning, tool calling, and prompt caching; use it as the standard Bedrock Nova frontier pick instead of Nova 2 Omni early-access Forge checkpoints.
2025-03-17
Researched 3d ago
1m
1,000,000 tokens
Claude 3 Sonnet by Anthropic is a versatile large language model that balances intelligence and speed for diverse enterprise use cases. It is part of the Claude 3 family, positioned between the powerful Opus and the faster Haiku models. Sonnet excels in nuanced content creation, accurate summarization, and complex scientific query handling while also showing proficiency in non-English languages and coding tasks. It also offers strong vision capabilities, including visual reasoning such as interpreting charts and graphs and transcribing text from imperfect images, which benefits industries like retail, logistics, and finance. Operating at twice the speed of Claude 3 Opus, Sonnet is efficient in context-sensitive customer support and multi-step workflows. It has achieved AI Safety Level 2 (ASL-2) and is accessible through multiple platforms, including Claude.ai, the Claude iOS app, the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.
2024-03-04
Researched 69d ago
200k
200,000 tokens
Kimi K2.5 is Moonshot AI's Kimi model focused on code generation and software engineering. It offers a 256K-token context window and scores 87.9 on GPQA.
2026-03-15
Researched 23d ago
256k
256,000 tokens
Flagship open-weight foundation model from Zhipu AI with 744B parameters (40B active per token) in Mixture of Experts architecture. Trained on 28.5T tokens using DeepSeek Sparse Attention on Huawei Ascend hardware. Achieves state-of-the-art performance on coding and agentic benchmarks (SWE-bench Verified: 77.8%). Supports autonomous planning, multi-step tool use, and self-correction.
2026-02-11
Researched 69d ago
200k
200,000 tokens
NVIDIA Nemotron 3 Super-120B-A12B is a 120B total / 12B active hybrid Latent MoE model with interleaved Mamba-2 and MoE layers for agentic, reasoning, and conversational tasks. Fireworks lists the NVFP4 variant for on-demand deployment with 262k context.
2026-03-11
Researched 26d ago
1.05m
1,048,576 tokens
DeepSeek V4 Pro is DeepSeek's flagship open-weights model, released April 24 2026 under the MIT license. Architecture: 1.6T total / 49B active parameters, MoE with Compressed Sparse Attention (CSA) + Heavily Compressed Attention (HCA) hybrid — requiring only 27% of inference FLOPs vs standard 1M-context transformers — plus Manifold-Constrained Hyper-Connections (mHC) and Muon Optimizer. Context window: 1,000,000 tokens; max output: 384,000 tokens (Think Max mode requires >=384K context). Text-only (no vision/image input). Supports three reasoning modes: Non-Think, Think High, Think Max. Function calling, tool use, and structured outputs supported. Key benchmarks: SWE-bench Verified 80.6%, SWE-bench Pro 55.4%, LiveCodeBench 93.5%, GPQA Diamond 90.1%, MMLU-Pro 87.5%, Terminal-Bench 2.0 59.1% on BenchLM's independent June 2026 harness, and Chatbot Arena 1456 (2026-06-16). Current API pricing: $0.435/$0.87 per 1M input/output tokens; DeepSeek made the former 75% promotional rate permanent in May 2026.
2026-04-24
Researched 3d ago
1m
1,000,000 tokens
SOLAR 10.7B is a robust large language model created by Upstage AI in South Korea, featuring 10.7 billion parameters. It is tailored for high efficiency and performance through its "Depth Up-Scaling" (DUS) approach, which deepens the model's layers rather than widening them, allowing for enhanced capabilities without significantly increasing computational costs. This method distinguishes it from other models that use more complex techniques like Mixture of Experts. By integrating pre-trained weights from the Mistral 7B model with the Llama 2 framework, SOLAR 10.7B achieves notable performance, outpacing even some models with up to 30 billion parameters. Available under the Apache 2.0 license, it also includes a fine-tuned instruction variant under CC-BY-NC-4.0, optimized for single-turn conversations and diverse NLP tasks, albeit with limitations in handling multi-turn dialogue and complex context. The model is grounded in the transformer architecture, widely adopted in advanced language models.
2024-06-24
Researched 26d ago
4k
4,000 tokens
Alibaba's closed-weight flagship language model, announced at the 2026 Alibaba Cloud Summit (May 20). Scored 56.6 on Artificial Analysis Intelligence Index at launch—highest-ranked Chinese model. 1M-token context with prompt caching (up to 90% discount). Pricing: $2.50/$7.50 per 1M tokens in/out.
2026-05-19
Researched today
1m
1,000,000 tokens
Qwen3.6-Plus is Alibaba Cloud's GA Qwen3.6 flagship for long-context reasoning, coding, tool use, and multimodal workflows. DashScope lists it with a 1M-token context window, structured output support, and standard public token pricing.
2026-04-01
Researched 38d ago
1m
1,000,000 tokens
OpenAI o3 reasoning model with advanced multi-step problem-solving capabilities.
2025-04-16
Researched 19d ago
200k
200,000 tokens
Ring-2.6-1T is InclusionAI's MIT-licensed trillion-parameter MoE reasoning model for agent workflows, engineering tasks, scientific analysis, and enterprise automation. It supports high and xhigh reasoning effort modes and entered OpenRouter's Programming top 10 in the 2026-05-18 audit.
2026-05-08
Researched 39d ago
262k
262,144 tokens
OpenAI's previous intelligent reasoning model with configurable reasoning effort. Released August 2025. Supports minimal, low, medium, and high reasoning levels. Succeeded by GPT-5.1 and later models.
2025-08-07
Researched 48d ago
400k
400,000 tokens
GPT-5 Chat is OpenAI's conversational variant of GPT-5 designed for advanced multimodal, context-aware enterprise conversations at 128K context.
2025-10-01
Researched 61d ago
128k
128,000 tokens
GPT-5.1 Chat is the fast, lightweight conversational member of the GPT-5.1 family, optimized for low-latency chat at 128K context.
2025-12-01
Researched 61d ago
128k
128,000 tokens
Near-frontier intelligence for cost-sensitive, low-latency, high-volume workloads. Released August 2025. Replaces o4-mini (shutting down Oct 2026).
2025-08-07
Researched 48d ago
400k
400,000 tokens
OpenAI open-weight model with 120 billion parameters. Text-only model supporting reasoning, function calling, and structured outputs. Free for self-hosting. Released August 5, 2025.
2025-08-05
Researched 69d ago
131k
131,072 tokens
Gemini 3 Flash is Google's speed-optimized Gemini 3 model, available in public preview via the Gemini API and Vertex AI. It supports text, image, audio, and video inputs with a 1M token context window and is priced at $0.50 per 1M input tokens and $3.00 per 1M output tokens.
2025-12-17
Researched 41d ago
1m
1,000,000 tokens
Fastest, cheapest GPT-5 variant for summarization and classification tasks. Also available via Realtime API.
2025-08-07
Researched 48d ago
400k
400,000 tokens
GPT OSS Safeguard 20B is OpenAI's GPT-OSS model focused on content moderation and safety classification. It offers a 128K-token context window with weights openly available for self-hosting.
2025-08-05
Researched 39d ago
131k
131,072 tokens
GPT-5.1-Codex is a coding-specialized version of GPT-5.1, optimized for software engineering and agentic coding workflows at 400K context.
2025-12-01
Researched 61d ago
400k
400,000 tokens
GPT-5 Codex is OpenAI's coding-specialized variant of GPT-5, optimized for software engineering workflows, code generation, and agentic coding tasks at 400K context.
2025-10-01
Researched 61d ago
400k
400,000 tokens
GLM-4.5V is a vision-language MoE model from Z.ai designed for multimodal agent applications, handling both image understanding and text generation at 64K context.
2026-01-01
Researched 36d ago
64k
64,000 tokens
Seed 1.6 is a general-purpose multimodal model from ByteDance Seed supporting text, image, and video inputs. It incorporates multimodal capabilities and deep thinking for complex tasks at 256K context.
2026-03-01
Researched 61d ago
256k
256,000 tokens
Amazon Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that processes text, images, and videos at 1M token context with improved reasoning over Nova Lite v1.
2025-12-02
Researched 3d ago
1m
1,000,000 tokens
o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex multi-step research tasks by synthesizing information from multiple sources at 200K context.
2025-10-10
Researched 44d ago
200k
200,000 tokens
OpenAI's GPT-4.1 model released April 2025, excelling at coding tasks, precise instruction following, and web development. Outperforms GPT-4o in these areas with a 1 million token context window. Available via API and in ChatGPT for Plus, Pro, Team, Enterprise, and Edu users.
2025-04-01
Researched 48d ago
1.05m
1,047,576 tokens
GLM-4.6V is Z.ai's large multimodal model for high-fidelity visual understanding and long-context reasoning across images, charts, and documents at 128K context.
2026-02-01
Researched 36d ago
128k
128,000 tokens
Fast and efficient small model from OpenAI replacing GPT-4o mini. Released April 2025 alongside GPT-4.1. Shows improvements in instruction-following, coding, and intelligence with a 1 million token context window. Available in ChatGPT for paid users.
2025-04-01
Researched 48d ago
1.05m
1,047,576 tokens
KAT-Coder-Pro V2 is Kwaipilot's flagship agentic coding model, achieving 79.6% on SWE-Bench Verified (March 2026). Designed for complex enterprise software engineering tasks, multi-system coordination, and SaaS integration. Uses a 'Specialize-then-Unify' training paradigm with five specialized expert domains. Context: 256K tokens. Max output: 256K tokens (on Streamlake endpoint). Available via Vercel AI Gateway and OpenRouter.
2026-03-27
Researched 36d ago
256k
256,000 tokens
Claude 3.5 Haiku is Anthropic's latest AI model, known for its speed and efficiency while maintaining high intelligence. It is optimized for applications needing rapid response, like interactive chatbots and real-time content moderation. Initially text-only, it is planned to gain image input capabilities. It excels in delivering fast, accurate code suggestions, processing and categorizing information swiftly, and handling large volumes of user interactions. Priced accessibly, it offers advanced coding, tool use, and reasoning abilities. It surpasses Claude 3 Haiku in benchmarks, and its pricing reflects that enhanced performance.
2024-10-22
Researched 26d ago
200k
200,000 tokens
Claude Sonnet 4.5 is Anthropic's Claude 4.5 model with multimodal text and image input and an optional reasoning mode. It offers a 200K-token context window and scores 86 on MMLU PRO.
2025-09-29
Researched 39d ago
200k
200,000 tokens
Enhanced reasoning and grounded retrieval model from DeepSeek with multimodal text and image understanding.
2025-08-21
Researched 36d ago
64k
64,000 tokens
Claude Opus 4.5 is Anthropic's Claude 4.5 model with multimodal text and image input and an optional reasoning mode. It offers a 200K-token context window and scores 80.7 on MMMU.
2025-11-01
Researched 39d ago
200k
200,000 tokens
Claude 3.5 Sonnet, the latest in Anthropic's line of large language models, merges state-of-the-art reasoning, coding, and natural language understanding capabilities with advanced multi-modal processing. Released in October 2024, it excels in benchmarks against previous models and competitors, thanks to its scalable attention mechanisms and massive neural network architecture. Its dynamic routing enables specialization in various tasks, supporting applications from software development and data analysis to customer support and content creation. Users benefit from its "Artifacts" feature for real-time collaborative workflows and can access the model through platforms like Claude.ai and APIs at competitive pricing rates.
2024-06-20
Researched 69d ago
200k
200,000 tokens
OpenAI GPT-4o: Flagship multimodal model with vision, function calling, and broad capability. $2.50/M input, $10/M output.
2024-05-13
Researched 48d ago
128k
128,000 tokens
Alibaba's Qwen3-Max, flagship model with improved multilingual and reasoning capabilities.
2025-04-28
Researched 69d ago
262k
262,144 tokens
Alibaba's multimodal agentic model with text, image, and video input. Combines vision-language understanding with full agentic capabilities: deep reasoning, self-programming, tool invocation, and autonomous iteration. GUI grounding: 79.0 on ScreenSpot Pro. Max output 66K tokens. Pricing: $0.40/$1.60 per 1M tokens in/out.
2026-06-03
Researched 18d ago
1m
1,000,000 tokens
Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens.
2026-02-17
Researched 15d ago
1m
1,000,000 tokens
Claude Opus 4.7 is Anthropic's generally available flagship model with 1M context, 128K max output, adaptive thinking, and a new tokenizer with roughly 555K words per 1M tokens.
2026-04-16
Researched today
1m
1,000,000 tokens
Claude Opus 4.6 is Anthropic's Claude 4.6 model with multimodal text and image input and an optional reasoning mode. It offers a 1M-token context window and scores 80.8 on SWE-bench Verified.
2026-02-05
Researched 39d ago
1m
1,000,000 tokens
Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse MoE architecture, available for preview as part of the Qwen3.6 series.
2026-04-20
Researched 46d ago
256k
256,000 tokens
Claude Opus 4.8 is Anthropic's flagship Claude 4.8 model, released May 28, 2026 for agentic coding, long-horizon reasoning, computer use, and professional knowledge work. It supports text and image inputs, adaptive reasoning, tool use, structured outputs, computer-use tools, prompt caching, Batch API, Dynamic Workflows parallel subagents, a 1M-token context window on Anthropic API/Bedrock/Vertex, and 128K max output. Key benchmarks: SWE-bench Pro 69.2%, SWE-bench Verified 88.6%, Terminal-Bench 2.1 74.6%, HLE with tools 57.9%, OSWorld-Verified 83.4%, GDPval-AA 1890 Elo, and MCP-Atlas 82.2%. Standard Anthropic API pricing is $5/M input and $25/M output.
2026-05-28
Researched 3d ago
1m
1,000,000 tokens
GLM-4.7 is Z.ai's flagship text model featuring enhanced programming capabilities and deeper reasoning at 200K context, succeeding GLM-4.6.
2026-03-01
Researched 36d ago
200k
200,000 tokens
DeepSeek V4 Flash is a 284B parameter (13B activated) Mixture-of-Experts language model with 1M-token context. Features a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for efficient long-context inference. Supports thinking and non-thinking modes. Legacy API aliases deepseek-chat and deepseek-reasoner map to this model's non-thinking and thinking modes respectively. Pricing: $0.14/1M input, $0.28/1M output (cache hit: $0.0028/1M input). MIT licensed.
2026-04-24
Researched 14d ago
1m
1,000,000 tokens
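As a rough worked example of what the cache-hit rate in the DeepSeek V4 Flash listing above can mean for a bill, the sketch below prices one request using the three per-million figures quoted there. The billing model is an assumption made for illustration: cached prompt tokens are charged at the cache-hit rate and the rest at the standard input rate. The function name `request_cost` and the token counts are hypothetical.

```python
# Per-1M-token prices in USD, copied from the DeepSeek V4 Flash listing.
PRICE_INPUT = 0.14        # uncached input tokens
PRICE_CACHE_HIT = 0.0028  # input tokens served from the cache
PRICE_OUTPUT = 0.28       # output tokens


def request_cost(prompt_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under the assumed billing model."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * PRICE_INPUT
        + cached_tokens * PRICE_CACHE_HIT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


# Hypothetical agent step: 100,000-token prompt, 1,000-token reply.
cold = request_cost(100_000, cached_tokens=0, output_tokens=1_000)
warm = request_cost(100_000, cached_tokens=90_000, output_tokens=1_000)
print(f"no cache hits: ${cold:.6f}")
print(f"90% cache hits: ${warm:.6f}")
```

Under these assumptions the output cost is unchanged by caching, so the saving depends on how much of the prompt is a stable, reusable prefix.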
Step 3.7 Flash is StepFun's open-weights multimodal Mixture-of-Experts model for agentic coding, tool use, long-context reasoning, image understanding, and video understanding. It combines a 196B-parameter language backbone with a 1.8B-parameter vision encoder, activates about 11B parameters per token, supports a 256K-token context window, and exposes low, medium, and high reasoning levels for speed/depth tradeoffs. StepFun reports leading open-model results on ClawEval-1.1, SimpleVQA with Search, and SWE-bench Pro at launch. Weights are available on Hugging Face under Apache 2.0.
2026-05-29
Researched 29d ago
256k
256,000 tokens
MAI-Code-1-Flash is Microsoft AI's lightweight agentic coding model built directly inside GitHub Copilot's production harness. It is designed for fast everyday developer workflows, adaptive thinking by task complexity, multi-turn instruction following, and token-efficient coding. Microsoft reports 51.2% on SWE-bench Pro versus 35.2% for Claude Haiku 4.5 in the same Copilot harness, plus stronger results on SWE-bench Verified, SWE-bench Multilingual, and Terminal-Bench 2.0 without publishing exact scores for those secondary benchmarks.
2026-06-02
Researched 16d ago
256k
256,000 tokens
DeepSeek V3.2 is DeepSeek's DeepSeek V3 model. It offers a 160K-token context window with weights openly available for self-hosting and scores 70 on SWE-bench Verified.
2025-12-01
Researched 38d ago
160k
160,000 tokens
DeepSeek R1 0528 is DeepSeek's DeepSeek R1 model with an optional reasoning mode. It offers a 130K-token context window with weights openly available for self-hosting and scores 81 on GPQA.
2025-05-28
Researched 37d ago
130k
130,000 tokens
Extended thinking variant of Kimi K2 with native reasoning capabilities. 256K context.
2025-01-01
Researched 23d ago
256k
256,000 tokens
Qwen3-Coder-480B-A35B-Instruct is Alibaba's flagship open-source code generation and agentic model, released July 22, 2025 under the Apache 2.0 license. The model has 480 billion total parameters with 35 billion active parameters per token, organized across 62 transformer layers with 160 specialized expert networks and 8 experts activated per token. It uses Grouped Query Attention (GQA) with 96 query heads and 8 key-value heads and supports a native context window of 262,144 tokens, extendable to 1 million tokens via YaRN position scaling. The model is purpose-built for software engineering tasks and agentic workflows: code generation, code review, test writing, multi-step debugging, and browser-based agentic task execution. On release, it achieved state-of-the-art results among open models on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use benchmarks, with performance comparable to Claude Sonnet 4 on these tasks. Available via Fireworks AI, Google Vertex AI, NVIDIA NIM, AWS Bedrock, Novita AI, and the Vercel AI Gateway.
2025-07-22
Researched 8d ago
262k
262,144 tokens
Google: Gemini 3.1 Pro Preview available via OpenRouter. Pricing: $2/1M input, $12/1M output.
2026-02-19
Researched 8d ago
1m
1,000,000 tokens
MiniMax M2 is a language model from MiniMax. It offers a 197K-token context window.
2025-10-01
Researched 36d ago
197k
197,000 tokens
Google: Gemini 2.5 Flash available via OpenRouter. Pricing: $0.3/1M input, $2.5/1M output.
2025-06-17
Researched 69d ago
1m
1,000,000 tokens
MiniMax: MiniMax M2.5 (free) available via OpenRouter. Pricing: not listed.
2025-03-01
Researched 36d ago
197k
197,000 tokens
Gemini 3.5 Flash is Google DeepMind's generally available Flash model for sustained frontier-level performance on agentic and coding tasks. It supports multimodal inputs, native thinking, tool and function calling, structured outputs, code execution, search grounding, batch processing, and long contexts up to 1M tokens.
2026-05-19
Researched 15d ago
1.05m
1,048,576 tokens
MiniMax M2.7 is MiniMax's self-improving frontier model, released March 18, 2026. It introduces native multi-agent collaboration, complex skill orchestration, and early recursive self-improvement capabilities. The model uses 10B active parameters, supports a 204,800-token context window, and was released alongside MiniMax-M2.7-highspeed, a 66% faster latency-optimized variant. Public provider listings price standard M2.7 at $0.30 per 1M input tokens and $1.20 per 1M output tokens. MiniMax M3 supersedes it for million-token, multimodal, and computer-use workflows, while M2.7 remains the lower-cost text-only route when 200K context is enough.
2026-03-18
Researched 26d ago
205k
204,800 tokens
DeepSeek: DeepSeek V3.1 Terminus available via OpenRouter. Pricing: $0.21/1M input, $0.79/1M output.
2025-09-22
Researched 37d ago
164k
163,800 tokens
Google: Gemini 2.5 Flash Lite available via OpenRouter. Pricing: $0.1/1M input, $0.4/1M output.
2025-07-22
Researched 69d ago
1m
1,000,000 tokens
Google DeepMind's most capable Gemini 2.5 model with native thinking/reasoning support. Features a 1M-token context window, multimodal inputs (text, image, audio, video), function calling, and strong performance across coding, mathematics, and scientific reasoning tasks.
2025-06-17
Researched 22d ago
1m
1,000,000 tokens
Google: Nano Banana (Gemini 2.5 Flash Image) available via OpenRouter. Pricing: $0.3/1M input, $2.5/1M output.
2025-04-01
Researched 69d ago
33k
33,000 tokens
GLM-4.5 is Tsinghua Knowledge Engineering Group (THUDM)'s GLM-4 model. It offers a 128K-token context window.
2025-01-01
Researched 36d ago
128k
128,000 tokens
GLM-4.5-Air is Tsinghua Knowledge Engineering Group (THUDM)'s GLM-4 model. It offers a 128K-token context window.
2025-01-01
Researched 36d ago
128k
128,000 tokens
GLM-4.6 is Tsinghua Knowledge Engineering Group (THUDM)'s GLM-4 model. It offers a 198K-token context window.
2025-01-01
Researched 36d ago
198k
198,000 tokens
OpenAI: GPT-4o-mini available via OpenRouter. Pricing: $0.15/1M input, $0.6/1M output.
2024-07-18
Researched 48d ago
128k
128,000 tokens
Gemini 3.1 Flash-Lite is Google's generally available low-latency Gemini 3.1 model, launched May 7, 2026. It is optimized for high-volume, cost-sensitive workloads with text, image, and video inputs, a 1M token context window, and a 66K token maximum output. The GA model uses the stable API ID gemini-3.1-flash-lite and replaces gemini-3.1-flash-lite-preview, which is scheduled to shut down on May 25, 2026. Pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens.
2026-05-07
Researched 8d ago
1.05m
1,048,576 tokens
MiniMax M2.5 Highspeed is MiniMax's inference-optimized variant of M2.5, released simultaneously in February 2026. It delivers identical intelligence and outputs to standard M2.5 through a specialized inference engine at lower latency. The model supports a 204,800-token context window, 131,072-token max output, function calling, structured output, and reasoning. API model ID: MiniMax-M2.5-highspeed. It is designed for latency-sensitive interactive applications and automated agent pipelines.
2026-02-12
Researched 36d ago
205k
204,800 tokens
MiniMax M2.7 Highspeed is the inference-optimized variant of MiniMax M2.7, released simultaneously on March 18, 2026. It reaches 100 tokens per second output speed, about 66% faster than standard M2.7, while preserving identical intelligence and outputs through engine optimization rather than weight changes. It supports a 204,800-token context window, 131,072-token max output, function calling, structured output, and reasoning. API model ID: MiniMax-M2.7-highspeed.
2026-03-18
Researched 54d ago
205k
204,800 tokens
GPT-5.4 Nano is the smallest and fastest variant in the GPT-5.4 family, optimized for edge deployment and low-latency tasks. Model ID: gpt-5.4-nano.
2026-03-05
Researched 23d ago
400k
400,000 tokens
First native multimodal variant of GLM-5 with CogViT visual encoder. Specialized for design-to-code tasks—converts mockups, screenshots, Figma exports, and hand-drawn sketches into HTML, CSS, and JavaScript. Trained with reinforcement learning across 30+ task types with INT8 quantization. Achieved 94.8 on Design2Code benchmark (vs Claude Opus 4.6: 77.3). Supports image, video, and text inputs natively.
2026-04-01
Researched 69d ago
200k
200,000 tokens
GPT Image 1.5 is OpenAI's improved image generation model, released December 16, 2025. Delivers 4× faster generation than GPT Image 1, improved text rendering, precise logo and face preservation, and better instruction following. Approximately 20% cheaper than GPT Image 1 with per-image pricing at $0.009/$0.034/$0.133 for 1024×1024 (low/medium/high). Became the default for ChatGPT Images on release. Supports landscape (1536×1024) and portrait (1024×1536) orientations.
2025-12-16
Researched 48d ago
—
No window data
OpenAI's fast, low-cost GPT-5.6 model. Luna is the affordable tier in the Sol, Terra, and Luna lineup, optimized for latency-sensitive applications such as summarization, drafting, autocomplete, and routine automation. It is available only to select trusted partners until broad API access launches.
2026-06-26
Researched 1d ago
—
No window data
GPT-5.4 Mini is a smaller, cost-efficient variant of GPT-5.4 with a 400K token context window. Designed for tasks requiring long-context processing at lower cost. Model ID: gpt-5.4-mini.
2026-03-05
Researched 14d ago
400k
400,000 tokens
Purpose-built variant of GLM-5 optimized for agent orchestration and complex automated workflows. Features native agent-friendly training emphasizing tool use, command following, and persistent task execution. Designed for OpenClaw (Lobster Agent) workflows with ~0.67% tool-call error rate. Achieves 20% higher inference performance vs base GLM-5.
2026-03-01
Researched 69d ago
200k
200,000 tokens
GPT Image 1 Mini is OpenAI's cost-efficient image generation model, released at OpenAI DevDay 2025 on October 6, 2025. Approximately 80% cheaper than GPT Image 1 with per-image pricing at $0.005/$0.011/$0.036 (low/medium/high, 1024×1024). Targets high-volume, cost-sensitive API workflows. Supports the same quality tiers as the main GPT Image line.
2025-10-06
Researched 48d ago
—
No window data