RWKV-7 Goose 0.1B
RWKV-7 Goose 0.1B (approximately 190M parameters) is the smallest model in the RWKV-7 Goose series. Suitable for ultra-low-resource deployment with constant-memory inference. Uses RWKV-7 architecture with the Generalized Delta Rule. Trained on the World v2.8 corpus. Apache 2.0 licensed.
2025-03-18
Researched 41d ago
Infinite
No fixed token cap
Infinite context
No tracked provider route
RWKV-6 Finch 1.6B
RWKV-6 Finch 1.6B is the smallest model in the RWKV-6 Finch series, ideal for lightweight deployments requiring constant-memory inference. Apache 2.0 licensed.
2024-04-09
Researched 41d ago
Infinite
No fixed token cap
Infinite context
No tracked provider route
RWKV-7 Goose 0.4B
RWKV-7 Goose 0.4B (approximately 450M parameters) is a lightweight model from the RWKV-7 Goose series. Designed for edge deployment and resource-constrained environments where constant-memory O(1) inference is critical. Uses RWKV-7 architecture with the Generalized Delta Rule. Trained on the World v2.9 corpus. Apache 2.0 licensed.
2025-03-18
Researched 41d ago
Infinite
No fixed token cap
Infinite context
No tracked provider route
RWKV-6 Finch 3B
RWKV-6 Finch 3B is a mid-range model in the RWKV-6 Finch series, offering a balance between capability and deployment efficiency. Apache 2.0 licensed. Constant-memory inference with no KV cache.
2024-04-09
Researched 41d ago
Infinite
No fixed token cap
Infinite context
No tracked provider route
RWKV-7 Goose 1.5B
RWKV-7 Goose 1.5B is a mid-range model from the RWKV-7 Goose World3 series. Uses the seventh-generation RWKV architecture with Generalized Delta Rule for expressive dynamic state evolution. Trained on 3.1 trillion tokens from the multilingual World v3 corpus. Constant-memory inference with no KV cache. Apache 2.0 licensed.
2025-03-18
Researched 41d ago
Infinite
No fixed token cap
Infinite context
No tracked provider route
RWKV-6 Finch 7B
RWKV-6 Finch 7B is a flagship mid-size model from the RWKV-6 architecture series. Introduced alongside the Eagle and Finch paper (arXiv 2404.05892, April 2024). The Finch 14B model was subsequently derived by stacking two Finch 7B weights. Uses multi-headed matrix-valued states for improved language comprehension. Constant-memory inference. Apache 2.0 licensed.
2024-04-09
Researched 41d ago
Infinite
No fixed token cap
Infinite context
No tracked provider route
RWKV-7 Goose 2.9B
RWKV-7 Goose 2.9B is the largest released model in the RWKV-7 Goose World3 series. Built on the seventh-generation RWKV architecture with the Generalized Delta Rule and dynamic state evolution, it achieves competitive benchmark performance against transformer models of equivalent scale. Trained on 3.1 trillion tokens from the World v3 multilingual corpus (100+ languages, BF16). As a pure recurrent architecture, it requires constant O(1) memory during inference (no KV cache) and processes sequences in linear O(n) time. Licensed Apache 2.0.
2025-03-18
Researched 41d ago
Infinite
No fixed token cap
Infinite context
No tracked provider route
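The constant-memory claim in the entry above can be illustrated with back-of-the-envelope arithmetic. The sketch below compares how a transformer KV cache grows with sequence length against a fixed-size recurrent state. The layer count, head count, and head dimension are hypothetical placeholders, not published RWKV-7 or transformer configurations, and the state-size formula is a simplifying assumption (one head_dim x head_dim matrix per head per layer).

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV-cache size: keys and values for every token, layer, and KV head."""
    return 2 * n_tokens * n_layers * n_kv_heads * head_dim * bytes_per_elem


def recurrent_state_bytes(n_layers, n_heads, head_dim, bytes_per_elem=2):
    """Approximate recurrent state size: one head_dim x head_dim matrix per head per layer.

    Simplifying assumption: ignores small per-layer extras such as token-shift vectors.
    """
    return n_layers * n_heads * head_dim * head_dim * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    layers, heads, dim = 32, 40, 64
    state = recurrent_state_bytes(layers, heads, dim)
    for n in (1_000, 100_000, 1_000_000):
        cache = kv_cache_bytes(n, layers, heads, dim)
        print(f"{n:>9} tokens: KV cache {cache / 2**20:12.1f} MiB, "
              f"recurrent state {state / 2**20:6.1f} MiB")
```

The KV-cache figure scales linearly with the token count, while the recurrent state stays the same size at any sequence length, which is what the "no KV cache" wording refers to.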
RWKV-6 Finch 14B
RWKV-6 Finch 14B is the largest model in the RWKV-6 Finch series, created by stacking two 7B Finch models. Released September 3, 2024. Achieves strong performance on MMLU (56.05%), ARC, HellaSwag (57.69%), and Winogrande (74.43%). The RWKV-6 architecture uses matrix-valued states and dynamic data-driven recurrence, improving comprehension and in-context reasoning compared to RWKV-5 (Eagle). Constant-memory O(1) inference with no KV cache. Apache 2.0 licensed.
2024-09-03
Researched 41d ago
Infinite
No fixed token cap
Infinite context
No tracked provider route
LTM-2-mini
LTM-2-mini is Magic's research prototype supporting a 100 million token context window, announced August 29, 2024. Uses a novel sequence-dimension algorithm approximately 1,000× more memory-efficient than transformer attention at this scale, requiring only a fraction of a single H100's HBM versus 638 H100s for Llama 3.1 405B at the same context length. Not publicly released for API access or self-hosting; Magic stated they were separately training a full LTM-2 model. Specialization: coding/software development. Source: https://magic.dev/blog/100m-token-context-windows
2024-08-29
Researched 47d ago
100m context
No tracked provider route
Llama 4 Scout 17B-16E Instruct
Meta's Llama 4 Scout is a mixture-of-experts model with 17 billion active parameters and 16 experts. Optimized for efficient inference on edge and cloud environments with strong multi-turn conversation capabilities. Available on Cloudflare Workers AI.
2025-04-05
Researched 28d ago
10m context, Vision, Multimodal, JSON
LTM-1
LTM-1 (Long-Term Memory 1) is Magic's first model with a 5 million token context window, announced June 6, 2023. Designed to process entire codebases in context for AI-assisted software development. Architecture and parameter count not publicly disclosed. Not available as a public API; Magic used it in an early-access coding product. Source: https://magic.dev/blog/ltm-1
2023-06-06
Researched 47d ago
5m context
No tracked provider route
MiniMax-01
MiniMax-01 combines MiniMax-Text-01 and MiniMax-VL-01, pairing a 456B-total-parameter MoE language model with multimodal understanding for long-context text generation and vision-language tasks.
2025-01-14
Researched 13d ago
4m context, Vision, Multimodal
Gemini 1.5 Pro
Gemini 1.5 Pro, created by Google DeepMind, is a state-of-the-art multimodal large language model that significantly advances over its predecessors in processing and analyzing large datasets across various formats like text, images, audio, and video. It features a highly extended context window of up to 2 million tokens, allowing it to maintain coherence over lengthy interactions. With over 200 billion parameters, the model excels in tasks requiring nuanced language processing, coding assistance, and advanced reasoning. Integrated into Google's platforms such as Vertex AI, Gemini 1.5 Pro also emphasizes ethical considerations, ensuring safety and appropriateness in AI deployment.
2024-02-15
Researched 77d ago
2m context, JSON
Gemini 1.5 Pro 002
Stable Gemini 1.5 Pro release (September 2024 variant) optimized for complex reasoning and high-quality multimodal analysis. Supports 2M context for extended document and video processing.
2024-09-24
Researched 47d ago
2m context
No tracked provider route
2024-08-27
Researched 47d ago
2m context
No tracked provider route
2024-08-01
Researched 47d ago
2m context
No tracked provider route
Grok 4.20 Multi-Agent
Grok 4.20 Multi-Agent is the extended-context xAI API variant launched around March 10, 2026 as grok-4.20-multi-agent-0309. Its reasoning.effort parameter controls how many collaborating agents are used, and the variant carries a 2M token context window.
2026-03-10
Researched 46d ago
2m context, Reasoning, Vision, Multimodal, Tool use, Functions
Gemini 3.5 Pro
Google's most capable Gemini 3.5 model, announced at Google I/O on May 19, 2026. Features a 2M-token context window, Deep Think reasoning mode, and frontier multimodal capabilities. In limited Vertex AI enterprise preview as of June 2026; general availability expected by end of June 2026 per Google CEO Sundar Pichai. Pricing and full API documentation pending GA release.
2026-05-19
Researched 9d ago
2m context, Reasoning, Vision, Multimodal
No tracked provider route
GPT-5.4 Pro
Premium extended-reasoning GPT-5.4 variant producing smarter and more precise responses. Replacement for o3-deep-research and o4-mini-deep-research. No prompt caching discount.
2026-03-01
Researched 56d ago
1.05m context, Reasoning, Vision, Multimodal, Tool use, Functions
GPT-5.5 Pro
GPT-5.5 Pro is OpenAI's premium extra-compute deployment of GPT-5.5, released April 23, 2026. It uses the same underlying weights as GPT-5.5 standard with additional parallel test-time compute for harder tasks. Supports text and image inputs, reasoning effort control, tool use, structured outputs, code execution, a 1,050,000-token context window, and 128K max output. Key datapack rows: Terminal-Bench 2.1 78.2%, SWE-bench Pro 58.6%, GPQA Diamond 93.6%, ARC-AGI-2 high effort 83.3%, BrowseComp Pro compute 90.1%, and FrontierMath Tier 4 39.6%. Official pricing is $30/M input, $180/M output, $10/M batch input, and $45/M batch output; native cached input discount is not listed.
2026-04-23
Researched 9d ago
1.05m context, Reasoning, Vision, Multimodal, Tool use, Functions
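The per-million-token prices in the entry above translate into a per-request estimate as follows. This is a minimal sketch: the rates are copied from the listing, while the `PRICES` table and `estimate_cost` helper are hypothetical names introduced here, and the calculation assumes no caching discount or other surcharges apply.

```python
# USD per 1M tokens, copied from the listing above (assumed current at research time).
PRICES = {
    "standard": {"input": 30.0, "output": 180.0},
    "batch": {"input": 10.0, "output": 45.0},
}


def estimate_cost(input_tokens, output_tokens, tier="standard"):
    """Return the estimated USD cost of one request at the listed rates."""
    rates = PRICES[tier]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200K input tokens and 20K output tokens.
    for tier in PRICES:
        print(f"{tier}: ${estimate_cost(200_000, 20_000, tier):.2f}")
```

At these rates output tokens dominate quickly: 20K output tokens cost more than half as much as 200K input tokens on the standard tier.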
GPT-5.5
GPT-5.5 is OpenAI's fully retrained agentic model, released April 23, 2026. Optimised for agentic coding, computer use, knowledge work, and early scientific research. Achieves 82.7% on Terminal-Bench 2.0 (Codex CLI scaffold), 84.9% on GDPval, 58.6% on SWE-Bench Pro, 93.6% on GPQA Diamond, and 82.6% on SWE-Bench Verified (Vals.ai independent harness). Knowledge cutoff December 2025. Supports reasoning effort levels (none/low/medium/high/xhigh). Context window 1,050,000 tokens with a long-context surcharge above 272K tokens. Model ID: gpt-5.5.
2026-04-23
Researched 22d ago
1.05m context, Reasoning, Vision, Multimodal, Tool use, Functions
GPT-5.4
GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.
2026-03-05
Researched 22d ago
1.05m context, Reasoning, Vision, Multimodal, Tool use, Functions
Xiaomi MiMo-V2.5
Xiaomi MiMo-V2.5 is the lower-cost native omnimodal sibling in the MiMo-V2.5 series. OpenRouter describes it as supporting text, image, audio, and video inputs with text output, Pro-level agentic performance at roughly half the inference cost, and improved multimodal perception over MiMo-V2-Omni. Xiaomi's official April 22 release page highlights MiMo-V2.5 alongside MiMo-V2.5-Pro in benchmark data and says the V2.5 series will be open-sourced soon; no public weights/license were verified at research time.
2026-04-22
Researched 40d ago
1.05m context, Reasoning, Vision, Multimodal, Tool use, Functions
Xiaomi MiMo-V2.5-Pro
Xiaomi's April 22, 2026 public-beta flagship in the MiMo-V2.5 series. The official Xiaomi MiMo page describes MiMo-V2.5-Pro as its most capable model to date, focused on general agentic capability, complex software engineering, long-horizon tasks, and ultra-long-context instruction following. OpenRouter lists it as text-to-text with 1,048,576 token context, 131,072 max completion tokens, reasoning controls, tool use, and response_format support. Xiaomi says the V2.5 series will be open-sourced soon, but no public weights/license were verified at research time.
2026-04-22
Researched 40d ago
1.05m context, Tool use, Functions, JSON, Prompt cache
Nemotron 3 Super-120B-A12B
NVIDIA Nemotron 3 Super-120B-A12B is a 120B total / 12B active hybrid Latent MoE model with interleaved Mamba-2 and MoE layers for agentic, reasoning, and conversational tasks. Fireworks lists the NVFP4 variant for on-demand deployment with 262k context.
2026-03-11
Researched 34d ago
1.05m context, JSON, Prompt cache
2026-03-19
Researched 52d ago
1.05m context
No tracked provider route
Gemini 3.5 Flash
Gemini 3.5 Flash is Google DeepMind's generally available Flash model for sustained frontier-level performance on agentic and coding tasks. It supports multimodal inputs, native thinking, tool and function calling, structured outputs, code execution, search grounding, batch processing, and long contexts up to 1M tokens.
2026-05-19
Researched 23d ago
1.05m context, Reasoning, Vision, Multimodal, Audio, Tool use
Gemini 3.1 Flash-Lite
Gemini 3.1 Flash-Lite is Google's generally available low-latency Gemini 3.1 model, launched May 7, 2026. It is optimized for high-volume, cost-sensitive workloads with text, image, and video inputs, a 1M token context window, and a 66K token maximum output. The GA model uses the stable API ID gemini-3.1-flash-lite and replaces gemini-3.1-flash-lite-preview, which is scheduled to shut down on May 25, 2026. Pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens.
2026-05-07
Researched 16d ago
1.05m context, Vision, Multimodal, Tool use, Functions, JSON
Antigravity Agent
Antigravity Agent is Google DeepMind's preview managed agent for autonomous coding and browsing workflows. Powered by Gemini 3.5 Flash, it plans, reasons, runs code, manages files, and browses the web inside a secure Google-hosted Linux sandbox through the Interactions API. It accepts text and image input, has a 1,048,576-token input context window that compacts at about 135K tokens, and supports a 65,536-token output limit. Environment compute is not billed during preview; Google describes pricing as pay-as-you-go based on underlying Gemini model tokens and tool use.
2026-05-19
Researched 41d ago
1.05m context, Reasoning, Vision, Multimodal, Tool use, Code exec
MiMo-V2-Pro
Xiaomi MiMo-V2-Pro language model. The larger, higher-capability model in the MiMo V2 series with an extended 1M token context window.
2026-03-18
Researched 62d ago
1.05m context, Prompt cache
2025-10-01
Researched 68d ago
1.05m context, Vision, Multimodal, Tool use, Functions, JSON
Llama 3 70B Gradient 1048K
Llama 3 70B Gradient 1048K is a Llama 3 70B model from Gradient. It offers a 1048K-token context window.
2024-04-18
Researched 47d ago
1.05m context
No tracked provider route
Llama 3 8B Gradient 1048K
Llama 3 8B Gradient 1048K is a Llama 3 8B model from Gradient. It offers a 1048K-token context window.
2024-04-18
Researched 47d ago
1.05m context
No tracked provider route
2024-04-18
Researched 47d ago
1.05m context
No tracked provider route
GPT-4.1
OpenAI's GPT-4.1 model released April 2025, excelling at coding tasks, precise instruction following, and web development. Outperforms GPT-4o in these areas with a 1 million token context window. Available via API and in ChatGPT for Plus, Pro, Team, Enterprise, and Edu users.
2025-04-01
Researched 56d ago
1.05m context, Vision, Multimodal, Tool use, Functions, JSON
GPT-4.1 Mini
Fast and efficient small model from OpenAI replacing GPT-4o mini. Released April 2025 alongside GPT-4.1. Shows improvements in instruction-following, coding, and intelligence with a 1 million token context window. Available in ChatGPT for paid users.
2025-04-01
Researched 56d ago
1.05m context, Vision, Multimodal, Tool use, Functions, JSON
GLM-5.2
GLM-5.2 is Z.ai's coding-first successor to GLM-5.1 in the GLM-5 family, released June 13, 2026. 753B parameters (40B active) in IndexShare MoE architecture; the IndexShare innovation reuses the same attention indexer across every four sparse layers, cutting per-token FLOPs by 2.9x at 1M context length. Trained on 28.5T tokens. Supports a 1M-token context window via the glm-5.2[1m] model ID, with 131,072-token maximum output and High/Max thinking-effort levels designed for extended agentic coding sessions. MIT license; open weights available on Hugging Face (zai-org/GLM-5.2 and zai-org/GLM-5.2-FP8). Self-reported HF card benchmarks: SWE-bench Pro 62.1, Terminal-Bench 2.1 82.7, MCP-Atlas 76.8, Tool-Decathlon 48.2, GPQA Diamond 91.2, AIME 2026 99.2, HLE 40.5. Available to GLM Coding Plan subscribers (Lite/Pro/Max/Team) directly, and via OpenRouter token API ($1.40/$4.40 per 1M tokens).
2026-06-13
Researched 11d ago
1m context, Reasoning, Tool use, Functions, JSON, Code exec
Amazon Nova Premier
Amazon Nova Premier is Amazon's most capable standard Bedrock Nova understanding model for complex reasoning, agentic workflows, and model distillation. It supports a 1M-token context window, text/image/video inputs, text output, reasoning, tool calling, and prompt caching; use it as the standard Bedrock Nova frontier pick instead of Nova 2 Omni early-access Forge checkpoints.
2025-03-17
Researched 11d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
DeepSeek V4 Pro
DeepSeek V4 Pro is DeepSeek's flagship open-weights model, released April 24, 2026 under the MIT license. Architecture: 1.6T total / 49B active parameters, MoE with a Compressed Sparse Attention (CSA) + Heavily Compressed Attention (HCA) hybrid (requiring only 27% of inference FLOPs vs standard 1M-context transformers), plus Manifold-Constrained Hyper-Connections (mHC) and Muon Optimizer. Context window: 1,000,000 tokens; max output: 384,000 tokens (Think Max mode requires >=384K context). Text-only (no vision/image input). Supports three reasoning modes: Non-Think, Think High, Think Max. Function calling, tool use, and structured outputs supported. Key benchmarks: SWE-bench Verified 80.6%, SWE-bench Pro 55.4%, LiveCodeBench 93.5%, GPQA Diamond 90.1%, MMLU-Pro 87.5%, Terminal-Bench 2.0 59.1% on BenchLM's independent June 2026 harness, and Chatbot Arena 1456 (2026-06-16). Current API pricing: $0.435/$0.87 per 1M input/output tokens; DeepSeek made the former 75% promotional rate permanent in May 2026.
2026-04-24
Researched 11d ago
1m context, Reasoning, Tool use, Functions, JSON, Prompt cache
Qwen3.7-Max
Alibaba's closed-weight flagship language model, announced at the 2026 Alibaba Cloud Summit (May 20). Scored 56.6 on Artificial Analysis Intelligence Index at launch, the highest-ranked Chinese model. 1M-token context with prompt caching (up to 90% discount). Pricing: $2.50/$7.50 per 1M tokens in/out.
2026-05-19
Researched 8d ago
1m context, Reasoning, Tool use, Functions, JSON, Code exec
Qwen3.6-Plus
Qwen3.6-Plus is Alibaba Cloud's GA Qwen3.6 flagship for long-context reasoning, coding, tool use, and multimodal workflows. DashScope lists it with a 1M-token context window, structured output support, and standard public token pricing.
2026-04-01
Researched 46d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
Gemini 1.5 Flash
Gemini 1.5 Flash is a large language model by Google, crafted for speed and efficiency in high-volume scenarios. As a lightweight model, it's optimized for fast processing and cost-effectiveness, making it ideal for real-time applications and high-frequency tasks. With its multimodal capabilities, Gemini 1.5 Flash processes and reasons across multiple data types, including text, images, audio, video, and PDFs. Despite its smaller size compared to Gemini 1.5 Pro, it excels in tasks like summarization, chat applications, and data extraction from lengthy documents, employing "knowledge distillation" to transfer essential knowledge from larger models. Additionally, it features an extensive context window of up to 1 million tokens, allowing it to manage large information volumes effectively.
2024-05-14
Researched 77d ago
1m context, JSON
Nemotron 3 Ultra
NVIDIA's open frontier-reasoning model (550B total / 55B active MoE, hybrid Transformer-Mamba). Highest Artificial Analysis Intelligence Index for any US open model (score: 48). 300+ tokens/second. 1M-token context. Announced at Computex 2026. Pricing: ~$0.60/$2.60 per 1M tokens (provider median); free tier on some providers.
2026-06-04
Researched 24d ago
1m context, Reasoning
Gemini 1.5 Flash 8B
Lightweight 8B variant of Gemini 1.5 Flash optimized for speed and cost-efficiency. Supports 1M token context with fast inference for real-time applications.
2024-10-03
Researched 47d ago
1m context
Gemini 1.5 Flash on Google Vertex AI
Gemini 1.5 Flash on Google Vertex AI is Google DeepMind's Gemini 1.5 model with multimodal text and image input. It offers a 1M-token context window.
2024-02-15
Researched 47d ago
1m context, Vision, Multimodal, JSON
2024-02-15
Researched 47d ago
1m context, Vision, Multimodal, JSON
Gemini 1.5 Pro on Google Vertex AI
Gemini 1.5 Pro on Google Vertex AI is Google DeepMind's Gemini 1.5 model with multimodal text and image input. It offers a 1M-token context window.
2024-02-15
Researched 47d ago
1m context, Vision, Multimodal, JSON
2024-02-15
Researched 47d ago
1m context, Vision, Multimodal, JSON
Gemini 1.0 Ultra
Google's Gemini 1.0 Ultra is a leading large language model designed for tackling highly complex tasks with advanced analytical capabilities. As the largest model in the Gemini 1.0 family, it excels in coding, mathematical reasoning, and multimodal reasoning. Its strength lies in its ability to seamlessly understand and process diverse data types, including text, code, audio, images, and video. Gemini Ultra surpasses human experts on the MMLU benchmark with a 90% score, although it has limitations in image generation and some multimodal tasks. The model features a 32,000-token context window, less than some competitors, and access is primarily through a paid subscription or via Google Cloud for developers.
2023-12-13
Researched 185d ago
1m context
2025-02-05
Researched 31d ago
1m context, Vision, Multimodal
No tracked provider route
2025-02-05
Researched 31d ago
1m context, Vision, Multimodal, JSON
No tracked provider route
2024-12-11
Researched 185d ago
1m context
No tracked provider route
2024-11-19
Researched 47d ago
1m context
No tracked provider route
Gemini 1.5 Flash 002
Stable Gemini 1.5 Flash release (September 2024 variant) optimized for high-speed processing and cost efficiency. Supports 1M context with fast token generation for real-time use.
2024-09-24
Researched 47d ago
1m context
No tracked provider route
2024-09-24
Researched 47d ago
1m context
No tracked provider route
2024-08-27
Researched 47d ago
1m context
No tracked provider route
2024-08-27
Researched 47d ago
1m context
No tracked provider route
Gemini 3 Flash
Gemini 3 Flash is Google's speed-optimized Gemini 3 model, available in public preview via the Gemini API and Vertex AI. It supports text, image, audio, and video inputs with a 1M token context window and is priced at $0.50 per 1M input tokens and $3.00 per 1M output tokens.
2025-12-17
Researched 49d ago
1m context, Vision, Multimodal, Audio, Tool use, Functions
Gemini 3 Pro
Google DeepMind's most advanced reasoning Gemini model. Part of the Gemini 3 series with frontier-class intelligence, multimodal understanding, and 1M token context window.
2025-12-11
Researched 185d ago
1m context, Vision, Multimodal, Tool use, Functions, Code exec
Gemini 3 Flash Preview
Frontier-class performance rivaling larger models at a fraction of the cost. Most intelligent Gemini model built for speed, combining frontier intelligence with superior search and grounding. $0.50 input / $3.00 output per 1M tokens.
2025-12-17
Researched 77d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
Amazon Nova 2 Pro
Amazon Nova 2 Pro is a preview text-generation model in the Amazon Nova 2 family, announced for Amazon Bedrock and made available to Amazon Nova Forge customers. It targets higher-capability general reasoning and generation workloads than Nova 2 Lite, supports a 1M-token context window, and currently has restricted preview access rather than standard public token pricing.
2025-12-02
Researched 11d ago
1m context
Amazon Nova 2 Lite
Amazon Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that processes text, images, and videos at 1M token context with improved reasoning over Nova Lite v1.
2025-12-02
Researched 11d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Llama 4 Maverick 17B Instruct FP8
Meta's Llama 4 Maverick 17B with 128 experts, FP8-optimized for cost-efficient inference. Supports native Model Router integration on Microsoft Foundry.
2025-04-05
Researched 28d ago
1m context, Vision, Multimodal, JSON, Batch
Qwen3.7-Plus
Alibaba's multimodal agentic model with text, image, and video input. Combines vision-language understanding with full agentic capabilities: deep reasoning, self-programming, tool invocation, and autonomous iteration. GUI grounding: 79.0 on ScreenSpot Pro. Max output 66K tokens. Pricing: $0.40/$1.60 per 1M tokens in/out.
2026-06-03
Researched 26d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Claude Mythos Preview
Anthropic's cybersecurity-focused frontier model, offered as an invitation-only research preview under Project Glasswing. Succeeded by Claude Mythos 5 (API ID: claude-mythos-5) as of June 9, 2026. Anthropic has indicated that Claude Mythos Preview will be retired after Claude Mythos 5 becomes available; no formal retirement date was published as of 2026-06-09. For current access and the migration path, see the Anthropic migration guide.
2026-05-01
Researched 26d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Palmyra X5
Palmyra X5 is Writer's most advanced model, purpose-built for enterprise AI agents. It delivers high capability at 1M token context for large-scale document processing and complex multi-step agent workflows.
2026-02-01
Researched 69d ago
1m context, Tool use, Functions, JSON
Claude Sonnet 4.6
Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens.
2026-02-17
Researched 23d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Claude Opus 4.7
Claude Opus 4.7 is Anthropic's generally available flagship model with 1M context, 128K max output, adaptive thinking, and a new tokenizer with roughly 555K words per 1M tokens.
2026-04-16
Researched 8d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Claude Opus 4.6
Claude Opus 4.6 is Anthropic's Claude 4.6 model with multimodal text and image input and an optional reasoning mode. It offers a 1M-token context window and scores 80.8 on SWE-bench Verified.
2026-02-05
Researched 47d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Claude Opus 4.8
Claude Opus 4.8 is Anthropic's flagship Claude 4.8 model, released May 28, 2026 for agentic coding, long-horizon reasoning, computer use, and professional knowledge work. It supports text and image inputs, adaptive reasoning, tool use, structured outputs, computer-use tools, prompt caching, Batch API, Dynamic Workflows parallel subagents, a 1M-token context window on Anthropic API/Bedrock/Vertex, and 128K max output. Key datapack rows: SWE-bench Pro 69.2%, SWE-bench Verified 88.6%, Terminal-Bench 2.1 74.6%, HLE with tools 57.9%, OSWorld-Verified 83.4%, GDPval-AA 1890 Elo, and MCP-Atlas 82.2%. Standard Anthropic API pricing is $5/M input and $25/M output.
2026-05-28
Researched 11d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Claude Sonnet 5
Claude Sonnet 5 is Anthropic's next-generation Sonnet model for agentic coding, tool use, computer use, and professional work. It is a proprietary decoder-only model with a 1M-token context window, 128K max output, multimodal vision, adaptive thinking, function calling, structured outputs, prompt caching, and Batch API support. It is available through the Claude API, AWS Bedrock, Google Cloud Vertex AI, Microsoft Foundry preview, and OpenRouter. Anthropic lists durable standard pricing at $3/1M input and $15/1M output tokens, with introductory $2/$10 pricing through 2026-08-31.
2026-06-30
Researched 5d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
DeepSeek V4 Flash
DeepSeek V4 Flash is a 284B parameter (13B activated) Mixture-of-Experts language model with 1M-token context. Features a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for efficient long-context inference. Supports thinking and non-thinking modes. Legacy API aliases deepseek-chat and deepseek-reasoner map to this model's non-thinking and thinking modes respectively. Pricing: $0.14/1M input, $0.28/1M output (cache hit: $0.0028/1M input). MIT licensed.
2026-04-24
Researched 2d ago
1m context, Reasoning, Tool use, Functions, JSON, Prompt cache
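The gap between the cache-hit and regular input prices in the entry above means the effective input cost depends heavily on how much of a prompt is served from cache. The sketch below blends the two listed rates by a cache-hit ratio. The `blended_input_cost` helper and the ratios used are hypothetical, and treating the $0.14 figure as the cache-miss rate is an assumption.

```python
# USD per 1M input tokens, copied from the listing above.
CACHE_MISS_RATE = 0.14    # assumed to be the regular (cache-miss) input price
CACHE_HIT_RATE = 0.0028   # listed cache-hit input price


def blended_input_cost(input_tokens, cache_hit_ratio):
    """Estimate input cost in USD when a fraction of the tokens are cache hits."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    return (hit_tokens * CACHE_HIT_RATE + miss_tokens * CACHE_MISS_RATE) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens at several cache-hit ratios.
    for ratio in (0.0, 0.5, 0.9):
        print(f"hit ratio {ratio:.0%}: ${blended_input_cost(1_000_000, ratio):.4f}")
```

Output tokens are billed separately at the listed output rate and are not included in this estimate.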
Gemini 2.5 Flash
Google's Gemini 2.5 Flash, available via OpenRouter. Pricing: $0.30/1M input, $2.50/1M output.
2025-06-17
Researched 77d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
Gemini 3.1 Pro Preview
Google's Gemini 3.1 Pro Preview, available via OpenRouter. Pricing: $2/1M input, $12/1M output.
2026-02-19
Researched 16d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
Gemini 2.5 Flash Lite
Google's Gemini 2.5 Flash Lite, available via OpenRouter. Pricing: $0.10/1M input, $0.40/1M output.
2025-07-22
Researched 77d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
Gemini 2.5 Pro
Google DeepMind's most capable Gemini 2.5 model with native thinking/reasoning support. Features a 1M-token context window, multimodal inputs (text, image, audio, video), function calling, and strong performance across coding, mathematics, and scientific reasoning tasks.
2025-06-17
Researched 30d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
2025-09-01
Researched 77d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
2026-01-01
Researched 31d ago
1m context, Vision, Multimodal, JSON
2025-05-06
Researched 31d ago
1m context, Vision, Multimodal, JSON
Gemini 3.1 Flash-Lite
GA release of Google's most cost-efficient Gemini 3.1 model, optimized for speed, scale, and cost efficiency. Supersedes gemini-3.1-flash-lite-preview. API model ID: gemini-3.1-flash-lite. Pricing: $0.25/$1.50 per 1M tokens in/out.
2026-05-07
Researched 26d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
No tracked provider route