GPT-4 Vision Preview
GPT-4 Vision Preview is OpenAI's GPT-4 model with multimodal text and image input. It is deprecated (originally released 2023-11-06); use it only for reproducing earlier results or evaluating drift over time.
2023-11-06
Researched 41d ago
128k context, Vision, Multimodal, Code exec
No tracked provider route
GLM-5.2
GLM-5.2 is Z.ai's coding-first successor to GLM-5.1 in the GLM-5 family, released June 13, 2026. 753B parameters (40B active) in an IndexShare MoE architecture; the IndexShare innovation reuses the same attention indexer across every four sparse layers, cutting per-token FLOPs by 2.9x at 1M context length. Trained on 28.5T tokens. Supports a 1M-token context window via the glm-5.2[1m] model ID, with 131,072-token maximum output and High/Max thinking-effort levels designed for extended agentic coding sessions. MIT license; open weights available on Hugging Face (zai-org/GLM-5.2 and zai-org/GLM-5.2-FP8). Self-reported HF card benchmarks: SWE-bench Pro 62.1, Terminal-Bench 2.1 82.7, MCP-Atlas 76.8, Tool-Decathlon 48.2, GPQA Diamond 91.2, AIME 2026 99.2, HLE 40.5. Available to GLM Coding Plan subscribers (Lite/Pro/Max/Team) directly, and via OpenRouter token API ($1.40/$4.40 per 1M tokens).
2026-06-13
Researched 5d ago
1m context, Reasoning, Tool use, Functions, JSON, Code exec
GLM-5.1
Post-training variant of GLM-5 from Z.ai (Zhipu AI) with enhanced agentic coding capabilities. Released April 7, 2026. 754B parameters (40B active) in a Mixture-of-Experts architecture, 200K-token context, 128K max output. Supports autonomous plan–execute–test–fix–optimize loops for up to 8 hours without human intervention. Trained entirely on Huawei Ascend hardware (no Nvidia). Key benchmarks: SWE-bench Pro 58.4 (world #1 at release, surpassing GPT-5.4 at 57.7 and Claude Opus 4.6 at 57.3), GPQA Diamond 86.2, AIME 2026 95.3, Terminal-Bench 2.0 63.5, MCP-Atlas 71.8, Chatbot Arena Elo 1475 (June 16, 2026, arena.ai). Available via Z.ai API ($1.40/$4.40 per 1M input/output tokens) and as open weights on Hugging Face under the MIT license.
2026-04-07
Researched 5d ago
200k context, Reasoning, Tool use, Functions, JSON, Code exec
Claude 3 Sonnet
Claude 3 Sonnet by Anthropic is a versatile large language model that balances intelligence and speed for diverse enterprise use cases. It is part of the Claude 3 family, positioned between the more powerful Opus and the faster Haiku models. Sonnet excels in nuanced content creation, accurate summarization, and complex scientific query handling, and is proficient in non-English languages and coding tasks. It also offers strong visual reasoning, such as interpreting charts and graphs and transcribing text from imperfect images, which benefits industries like retail, logistics, and finance. Operating at twice the speed of Claude 3 Opus, Sonnet is efficient in context-sensitive customer support and multi-step workflows. It is rated AI Safety Level 2 (ASL-2) and is accessible through multiple platforms, including Claude.ai, the Claude iOS app, the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.
2024-03-04
Researched 71d ago
200k context, Reasoning, Vision, Multimodal, JSON, Code exec
DeepSeek R1
DeepSeek R1: Reasoning-optimized model with extended thinking capabilities. 128K context.
2025-01-20
Researched 71d ago
128k context, Reasoning, JSON, Code exec
Qwen2.5-Coder-32B-Instruct
Instruction-optimized 32B code flagship for production systems requiring top-tier code reasoning, generation, and multi-file analysis.
2024-11-12
Researched 41d ago
128k context, JSON, Code exec
Claude 3.7 Sonnet
Claude 3.7 Sonnet is Anthropic's advanced model with extended thinking capabilities, offering state-of-the-art reasoning for complex tasks.
2025-02-24
Researched 71d ago
200k context, Reasoning, Vision, Multimodal, Tool use, Functions
Qwen3.7-Max
Alibaba's closed-weight flagship language model, announced at the 2026 Alibaba Cloud Summit (May 20). Scored 56.6 on the Artificial Analysis Intelligence Index at launch, making it the highest-ranked Chinese model. 1M-token context with prompt caching (up to 90% discount). Pricing: $2.50/$7.50 per 1M tokens in/out.
2026-05-19
Researched 2d ago
1m context, Reasoning, Tool use, Functions, JSON, Code exec
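The Qwen3.7-Max pricing above can be turned into a rough per-request estimate. The sketch below assumes the caching discount applies only to cached input tokens and defaults to the quoted maximum of 90%; the actual discount and billing rules may differ, and the function name is hypothetical.

```python
# Prices quoted in the Qwen3.7-Max entry (USD per 1M tokens).
INPUT_USD_PER_M = 2.50
OUTPUT_USD_PER_M = 7.50
# "Up to 90%" is an upper bound; the real discount may be lower.
MAX_CACHE_DISCOUNT = 0.90


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    cache_discount: float = MAX_CACHE_DISCOUNT,
) -> float:
    """Estimate request cost, discounting only the cached input tokens (assumption)."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    if not 0 <= cache_discount <= MAX_CACHE_DISCOUNT:
        raise ValueError("cache_discount must be between 0 and 0.90")
    uncached = input_tokens - cached_input_tokens
    billable_input = uncached + cached_input_tokens * (1 - cache_discount)
    input_cost = billable_input * INPUT_USD_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_M / 1_000_000
    return input_cost + output_cost
```

With no cache hits, 1M input tokens plus 1M output tokens comes to the two list prices added together.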
o3
OpenAI o3 reasoning model with advanced multi-step problem-solving capabilities.
2025-04-16
Researched 21d ago
200k context, Reasoning, Vision, Multimodal, Tool use, Functions
Qwen2.5-Coder-32B
32B flagship code specialist matching GPT-4o performance with SOTA multi-language repair (75.2% on MdEval) and 3.7% improvement on repo-wide context benchmarks.
2024-11-12
Researched 41d ago
128k context, JSON, Code exec
o1-mini (09-12)
o1-mini (09-12) is OpenAI's o1 model with an optional reasoning mode. It offers a 128K-token context window.
2024-09-12
Researched 41d ago
128k context, Reasoning, Code exec
2024-12-17
Researched 41d ago
128k context, Vision, Audio, Code exec
No tracked provider route
GPT-4o (11-20)
GPT-4o (11-20) is OpenAI's GPT-4o model. It offers a 128K-token context window.
2024-11-20
Researched 41d ago
128k context, Vision, Code exec
No tracked provider route
2024-10-01
Researched 179d ago
128k context, Vision, Audio, Code exec
No tracked provider route
o1-preview (09-12)
o1-preview (09-12) is OpenAI's o1 model with an optional reasoning mode. It offers a 128K-token context window and scores 73.3 on GPQA.
2024-09-12
Researched 41d ago
128k context, Reasoning, Code exec
No tracked provider route
ChatGPT-4o
The chatgpt-4o-latest model version continuously points to the version of GPT-4o used in ChatGPT and is updated whenever there are significant changes.
2024-05-13
Researched 179d ago
128k context, Vision, Code exec
No tracked provider route
Cerebras GPT 590M
The Cerebras GPT 590M is a robust language model featuring 590 million parameters and a transformer architecture akin to GPT-3. It is optimized for natural language processing tasks such as text generation, completion, and summarization. Trained using the Chinchilla scaling laws and Cerebras' weight streaming technology, this model achieves high efficiency, offering faster training times and reduced costs. The Andromeda AI supercomputer facilitated its training on the extensive Pile dataset. Open-sourced under the Apache 2.0 license, it primarily supports English and requires additional tuning for other languages and conversational applications due to its lack of reinforcement learning from human feedback.
2023-03-13
Researched 38d ago
Reasoning, Code exec
No tracked provider route
Megatron GPT 5B
The NeMo Megatron-GPT 5B is a transformer-based language model with 5 billion trainable parameters, inspired by models like GPT-2 and GPT-3. Its architecture is a decoder-only transformer, designed to sequentially process input for text generation and language understanding tasks. Trained on the Pile dataset from EleutherAI, it produces coherent and natural-sounding text and can also answer questions and complete sentences. Despite its strengths, the model can reflect biases and toxic language from its dataset, sometimes yielding inappropriate outputs. Evaluations on benchmarks like the LM Evaluation Test Suite show varying performance, with scores of 0.5566 on ARC-Easy and 0.6133 on Winogrande, indicating both strengths and limitations across different tasks.
2019-08-28
Researched 179d ago
Reasoning, Code exec
No tracked provider route
GPT-5
OpenAI's previous intelligent reasoning model with configurable reasoning effort. Released August 2025. Supports minimal, low, medium, and high reasoning levels. Succeeded by GPT-5.1 and later models.
2025-08-07
Researched 50d ago
400k context, Reasoning, Vision, Multimodal, Tool use, Functions
GPT-5 Mini
Near-frontier intelligence for cost-sensitive, low-latency, high-volume workloads. Released August 2025. Replaces o4-mini (shutting down Oct 2026).
2025-08-07
Researched 50d ago
400k context, Reasoning, Vision, Multimodal, Tool use, Functions
GPT-5 Pro
GPT-5 Pro is OpenAI's most advanced GPT-5 tier, offering major improvements in reasoning, code quality, and user experience for enterprise and power-user applications at 400K context.
2025-10-01
Researched 63d ago
400k context, Vision, Multimodal, Tool use, Functions, JSON
Gemini 3 Flash
Gemini 3 Flash is Google's speed-optimized Gemini 3 model, available in public preview via the Gemini API and Vertex AI. It supports text, image, audio, and video inputs with a 1M token context window and is priced at $0.50 per 1M input tokens and $3.00 per 1M output tokens.
2025-12-17
Researched 43d ago
1m context, Vision, Multimodal, Audio, Tool use, Functions
GPT-5 Nano
Fastest, cheapest GPT-5 variant for summarization and classification tasks. Also available via Realtime API.
2025-08-07
Researched 50d ago
400k context, Reasoning, Vision, Multimodal, Tool use, Functions
GPT-5.4 Pro
Premium extended-reasoning GPT-5.4 variant producing smarter and more precise responses. Replacement for o3-deep-research and o4-mini-deep-research. No prompt caching discount.
2026-03-01
Researched 50d ago
1.05m context, Reasoning, Vision, Multimodal, Tool use, Functions
Gemini 3 Pro
Google DeepMind's most advanced reasoning Gemini model. Part of the Gemini 3 series with frontier-class intelligence, multimodal understanding, and 1M token context window.
2025-12-11
Researched 179d ago
1m context, Vision, Multimodal, Tool use, Functions, Code exec
GPT-5.1 Codex
GPT-5.1-Codex is a coding-specialized version of GPT-5.1, optimized for software engineering and agentic coding workflows at 400K context.
2025-12-01
Researched 63d ago
400k context, Vision, Multimodal, Tool use, Functions, JSON
GPT-5 Codex
GPT-5 Codex is OpenAI's coding-specialized variant of GPT-5, optimized for software engineering workflows, code generation, and agentic coding tasks at 400K context.
2025-10-01
Researched 63d ago
400k context, Vision, Multimodal, Tool use, Functions, JSON
Gemini 3 Flash Preview
Frontier-class performance rivaling larger models at a fraction of the cost. Most intelligent Gemini model built for speed, combining frontier intelligence with superior search and grounding. $0.50 input / $3.00 output per 1M tokens.
2025-12-17
Researched 71d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
o3-pro
Advanced o3 reasoning model for complex math, science, and coding problems. Supports tools, vision, and extended thinking. Available to Pro users. Released June 10, 2025.
2025-06-10
Researched 41d ago
200k context, Reasoning, Vision, Multimodal, Tool use, Functions
GPT-4.1
OpenAI's GPT-4.1 model released April 2025, excelling at coding tasks, precise instruction following, and web development. Outperforms GPT-4o in these areas with a 1 million token context window. Available via API and in ChatGPT for Plus, Pro, Team, Enterprise, and Edu users.
2025-04-01
Researched 50d ago
1.05m context, Vision, Multimodal, Tool use, Functions, JSON
GPT-4.1 Mini
Fast and efficient small model from OpenAI replacing GPT-4o mini. Released April 2025 alongside GPT-4.1. Shows improvements in instruction-following, coding, and intelligence with a 1 million token context window. Available in ChatGPT for paid users.
2025-04-01
Researched 50d ago
1.05m context, Vision, Multimodal, Tool use, Functions, JSON
KAT Coder Pro V2
KAT-Coder-Pro V2 is Kwaipilot's flagship agentic coding model, achieving 79.6% on SWE-Bench Verified (March 2026). Designed for complex enterprise software engineering tasks, multi-system coordination, and SaaS integration. Uses a 'Specialize-then-Unify' training paradigm with five specialized expert domains. Context: 256K tokens. Max output: 256K tokens (on Streamlake endpoint). Available via Vercel AI Gateway and OpenRouter.
2026-03-27
Researched 38d ago
256k context, Tool use, Functions, JSON, Code exec, Prompt cache
Claude 3.5 Haiku
Claude 3.5 Haiku is an Anthropic model known for its speed and efficiency while maintaining high intelligence. It is optimized for applications needing rapid response, like interactive chatbots and real-time content moderation. It launched as text-only, with image input planned for later. It excels in delivering fast, accurate code suggestions, processing and categorizing information swiftly, and handling large volumes of user interactions. Priced accessibly, it offers advanced coding, tool use, and reasoning abilities. It surpassed Claude 3 Haiku on benchmarks at launch, and its pricing reflects that enhanced performance.
2024-10-22
Researched 28d ago
200k context, Reasoning, Vision, JSON, Code exec, Batch
Morph V3 Fast
Morph V3 Fast is Morph's fastest code apply model at ~10,500 tokens/sec with 96% accuracy, optimized for rapid code transformations in AI coding workflows.
2026-03-01
Researched 41d ago
Code exec
Relace Apply 3
Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits directly into source files at 256K context, designed for precise apply operations in AI coding agents.
2026-01-01
Researched 41d ago
256k context, Code exec
No tracked provider route
DeepSeek V3.1
Enhanced reasoning and grounded retrieval model from DeepSeek with multimodal text and image understanding.
2025-08-21
Researched 38d ago
Vision, Multimodal, JSON, Code exec, Prompt cache
Claude Opus 4.5
Claude Opus 4.5 is Anthropic's Claude 4.5 model with multimodal text and image input and an optional reasoning mode. It offers a 200K-token context window and scores 80.7 on MMMU.
2025-11-01
Researched 41d ago
200k context, Reasoning, Vision, Multimodal, Tool use, Functions
Claude 3.5 Sonnet
Claude 3.5 Sonnet, part of Anthropic's line of large language models, merges state-of-the-art reasoning, coding, and natural language understanding capabilities with advanced multi-modal processing. First released in June 2024 and upgraded in October 2024, it excels in benchmarks against previous models and competitors, thanks to its scalable attention mechanisms and massive neural network architecture. Its dynamic routing enables specialization in various tasks, supporting applications from software development and data analysis to customer support and content creation. Users benefit from its "Artifacts" feature for real-time collaborative workflows and can access the model through platforms like Claude.ai and APIs at competitive pricing rates.
2024-06-20
Researched 71d ago
200k context, Reasoning, Vision, Multimodal, Functions, JSON
GPT-4o
OpenAI GPT-4o: Flagship multimodal model with vision, function calling, and broad capability. $2.50/M input, $10/M output.
2024-05-13
Researched 50d ago
128k context, Vision, Multimodal, Tool use, Functions, JSON
Qwen3.7-Plus
Alibaba's multimodal agentic model with text, image, and video input. Combines vision-language understanding with full agentic capabilities: deep reasoning, self-programming, tool invocation, and autonomous iteration. GUI grounding: 79.0 on ScreenSpot Pro. Max output 66K tokens. Pricing: $0.40/$1.60 per 1M tokens in/out.
2026-06-03
Researched 20d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Claude Mythos Preview
Anthropic's cybersecurity-focused frontier model, offered as an invitation-only research preview under Project Glasswing. Succeeded by Claude Mythos 5 (API ID: claude-mythos-5) as of June 9, 2026. Anthropic has indicated that Claude Mythos Preview will be retired after Claude Mythos 5 becomes available; no formal retirement date was published as of 2026-06-09. For current access and the migration path, see the Anthropic migration guide.
2026-05-01
Researched 20d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Morph V3 Large
Morph V3 Large is Morph's high-accuracy code apply model, achieving ~98% accuracy for precise code transformations at ~4,500 tokens/sec and 256K context.
2026-03-01
Researched 41d ago
256k context, Code exec
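The throughput figures quoted for the two Morph V3 models (~10,500 and ~4,500 tokens/sec) imply a simple latency trade-off. The sketch below is a back-of-the-envelope estimate only: it assumes the quoted rate is sustained for the whole output and ignores network and queueing time. The dictionary keys are illustrative labels, not confirmed API model IDs.

```python
# Approximate throughput quoted in the Morph V3 Fast and Morph V3 Large entries.
# Keys are hypothetical labels chosen for this example.
THROUGHPUT_TOKENS_PER_SEC = {
    "morph-v3-fast": 10_500,
    "morph-v3-large": 4_500,
}


def apply_seconds(model: str, output_tokens: int) -> float:
    """Estimate generation time for an apply operation at the quoted rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return output_tokens / THROUGHPUT_TOKENS_PER_SEC[model]
```

At these rates the Fast model finishes the same output in a bit under half the time of the Large model, in exchange for the lower quoted accuracy (96% versus ~98%).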
Relace Search
Relace Search uses parallel file view and grep tools to explore a codebase and return relevant file sections with 256K context, specialized for AI coding agent pipelines.
2026-01-01
Researched 41d ago
256k context, Tool use, Code exec
No tracked provider route
Arcee Coder Large
Coder Large is Arcee AI's 32B code-focused model, trained on permissively-licensed GitHub repositories and fine-tuned from Qwen 2.5-Instruct for software engineering tasks.
2025-12-01
Researched 63d ago
Tool use, Functions, JSON, Code exec
No tracked provider route
Cogito v2.1 671B
Cogito v2.1 671B MoE is Deep Cogito's strongest open model, matching performance of frontier closed models. It features deep thinking capabilities and strong results on coding, reasoning, and math benchmarks.
2025-11-19
Researched 53d ago
128k context, Reasoning, Tool use, Functions, JSON, Code exec
No tracked provider route
Mistral Medium 3
Mistral Medium 3 is Mistral AI's enterprise-grade model delivering frontier-level capabilities including vision, function calling, and code generation at competitive cost for business applications.
2025-05-01
Researched 63d ago
128k context, Vision, Multimodal, Tool use, Functions, JSON
No tracked provider route
Claude Sonnet 4.6
Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens.
2026-02-17
Researched 17d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Claude Opus 4.7
Claude Opus 4.7 is Anthropic's generally available flagship model with 1M context, 128K max output, adaptive thinking, and a new tokenizer with roughly 555K words per 1M tokens.
2026-04-16
Researched 2d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
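The "roughly 555K words per 1M tokens" figure in the Claude Opus 4.7 entry gives a quick way to size a document against the context window. This is a minimal sketch that treats the ratio as a flat average; real token counts depend on language and content, and the function name is hypothetical.

```python
import math

# Average ratio quoted in the Claude Opus 4.7 entry; an approximation only.
WORDS_PER_MILLION_TOKENS = 555_000


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count for a given word count, rounding up."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    return math.ceil(word_count * 1_000_000 / WORDS_PER_MILLION_TOKENS)
```

For exact counts, use the provider's own token-counting tooling rather than a word-based estimate.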
Claude Opus 4.6
Claude Opus 4.6 is Anthropic's Claude 4.6 model with multimodal text and image input and an optional reasoning mode. It offers a 1M-token context window and scores 80.8 on SWE-bench Verified.
2026-02-05
Researched 41d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Claude Opus 4.8
Claude Opus 4.8 is Anthropic's flagship Claude 4.8 model, released May 28, 2026 for agentic coding, long-horizon reasoning, computer use, and professional knowledge work. It supports text and image inputs, adaptive reasoning, tool use, structured outputs, computer-use tools, prompt caching, Batch API, Dynamic Workflows parallel subagents, a 1M-token context window on Anthropic API/Bedrock/Vertex, and 128K max output. Key datapack rows: SWE-bench Pro 69.2%, SWE-bench Verified 88.6%, Terminal-Bench 2.1 74.6%, HLE with tools 57.9%, OSWorld-Verified 83.4%, GDPval-AA 1890 Elo, and MCP-Atlas 82.2%. Standard Anthropic API pricing is $5/M input and $25/M output.
2026-05-28
Researched 5d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
GLM 4.7
GLM-4.7 is Z.ai's flagship text model featuring enhanced programming capabilities and deeper reasoning at 200K context, succeeding GLM-4.6.
2026-03-01
Researched 38d ago
200k context, Tool use, Functions, JSON, Code exec, Prompt cache
Mistral Small 3.2 24B
Mistral Small 3.2 24B is an updated instruction-tuned model from Mistral optimized for function calling, structured outputs, and vision tasks at 128K context with open weights.
2025-06-01
Researched 63d ago
128k context, Vision, Multimodal, Tool use, Functions, JSON
DeepSeek V3.2
DeepSeek V3.2 is a model in DeepSeek's V3 series. It offers a 160K-token context window with weights openly available for self-hosting and scores 70 on SWE-bench Verified.
2025-12-01
Researched 40d ago
160k context, JSON, Code exec, Prompt cache
DeepSeek R1 0528
DeepSeek R1 0528 is a model in DeepSeek's R1 series with an optional reasoning mode. It offers a 130K-token context window with weights openly available for self-hosting and scores 81 on GPQA.
2025-05-28
Researched 39d ago
130k context, Reasoning, JSON, Code exec, Prompt cache
Qwen3-Coder-480B-A35B-Instruct
Qwen3-Coder-480B-A35B-Instruct is Alibaba's flagship open-source code generation and agentic model, released July 22, 2025 under the Apache 2.0 license. The model has 480 billion total parameters with 35 billion active parameters per token, organized across 62 transformer layers with 160 specialized expert networks and 8 experts activated per token. It uses Grouped Query Attention (GQA) with 96 query heads and 8 key-value heads and supports a native context window of 262,144 tokens, extendable to 1 million tokens via YaRN position scaling. The model is purpose-built for software engineering tasks and agentic workflows: code generation, code review, test writing, multi-step debugging, and browser-based agentic task execution. On release, it achieved state-of-the-art results among open models on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use benchmarks, with performance comparable to Claude Sonnet 4 on these tasks. Available via Fireworks AI, Google Vertex AI, NVIDIA NIM, AWS Bedrock, Novita AI, and the Vercel AI Gateway.
2025-07-22
Researched 10d ago
262k context, Tool use, Functions, JSON, Code exec, Prompt cache
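The architecture figures in the Qwen3-Coder-480B-A35B-Instruct entry can be reduced to a few ratios that are easier to compare across models. The sketch below only does arithmetic on the quoted numbers. The context ratio is the plain quotient of the two window sizes and is not necessarily the scaling factor configured for YaRN in practice.

```python
# Figures quoted in the Qwen3-Coder-480B-A35B-Instruct entry.
TOTAL_PARAMS_B = 480
ACTIVE_PARAMS_B = 35
NUM_EXPERTS = 160
EXPERTS_PER_TOKEN = 8
QUERY_HEADS = 96
KV_HEADS = 8
NATIVE_CONTEXT = 262_144
EXTENDED_CONTEXT = 1_000_000


def derived_ratios() -> dict[str, float]:
    """Return simple ratios derived from the published architecture numbers."""
    return {
        # Share of parameters used for each token in the MoE forward pass.
        "active_param_fraction": ACTIVE_PARAMS_B / TOTAL_PARAMS_B,
        # Share of experts routed to for each token.
        "expert_fraction": EXPERTS_PER_TOKEN / NUM_EXPERTS,
        # Query heads sharing each key-value head under GQA.
        "queries_per_kv_head": QUERY_HEADS / KV_HEADS,
        # How far the extended window stretches the native one.
        "context_extension_ratio": EXTENDED_CONTEXT / NATIVE_CONTEXT,
    }
```

Each key-value head is shared by 12 query heads, and 5% of the experts are active per token.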
Gemini 3.1 Pro Preview
Google: Gemini 3.1 Pro Preview available via OpenRouter. Pricing: $2/1M input, $12/1M output.
2026-02-19
Researched 10d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
Gemini 2.5 Flash
Google: Gemini 2.5 Flash available via OpenRouter. Pricing: $0.3/1M input, $2.5/1M output.
2025-06-17
Researched 71d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
Gemini 3.5 Flash
Gemini 3.5 Flash is Google DeepMind's generally available Flash model for sustained frontier-level performance on agentic and coding tasks. It supports multimodal inputs, native thinking, tool and function calling, structured outputs, code execution, search grounding, batch processing, and long contexts up to 1M tokens.
2026-05-19
Researched 17d ago
1.05m context, Reasoning, Vision, Multimodal, Audio, Tool use
Gemini 2.5 Flash Lite
Google: Gemini 2.5 Flash Lite available via OpenRouter. Pricing: $0.1/1M input, $0.4/1M output.
2025-07-22
Researched 71d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
Gemini 2.5 Pro
Google DeepMind's most capable Gemini 2.5 model with native thinking/reasoning support. Features a 1M-token context window, multimodal inputs (text, image, audio, video), function calling, and strong performance across coding, mathematics, and scientific reasoning tasks.
2025-06-17
Researched 24d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Gemini 3.1 Flash-Lite
Gemini 3.1 Flash-Lite is Google's generally available low-latency Gemini 3.1 model, launched May 7, 2026. It is optimized for high-volume, cost-sensitive workloads with text, image, and video inputs, a 1M token context window, and a 66K token maximum output. The GA model uses the stable API ID gemini-3.1-flash-lite and replaces gemini-3.1-flash-lite-preview, which is scheduled to shut down on May 25, 2026. Pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens.
2026-05-07
Researched 10d ago
1.05m context, Vision, Multimodal, Tool use, Functions, JSON
2025-09-01
Researched 71d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
DeepSeek V3.2 Exp
DeepSeek: DeepSeek V3.2 Exp available via OpenRouter. Pricing: $0.27/1M input, $0.41/1M output.
2025-04-10
Researched 38d ago
164k context, JSON, Code exec
Antigravity Agent
Antigravity Agent is Google DeepMind's preview managed agent for autonomous coding and browsing workflows. Powered by Gemini 3.5 Flash, it plans, reasons, runs code, manages files, and browses the web inside a secure Google-hosted Linux sandbox through the Interactions API. It accepts text and image input, has a 1,048,576-token input context window that compacts at about 135K tokens, and supports a 65,536-token output limit. Environment compute is not billed during preview; Google describes pricing as pay-as-you-go based on underlying Gemini model tokens and tool use.
2026-05-19
Researched 35d ago
1.05m context, Reasoning, Vision, Multimodal, Tool use, Code exec
Gemini 3.1 Flash-Lite
GA release of Google's most cost-efficient Gemini 3.1 model, optimized for speed, scale, and cost efficiency. Supersedes gemini-3.1-flash-lite-preview. API model ID: gemini-3.1-flash-lite. Pricing: $0.25/$1.50 per 1M tokens in/out.
2026-05-07
Researched 20d ago
1m context, Vision, Multimodal, Tool use, Functions, JSON
No tracked provider route
GPT-5.4 Nano
GPT-5.4 Nano is the smallest and fastest variant in the GPT-5.4 family, optimized for edge deployment and low-latency tasks. Model ID: gpt-5.4-nano.
2026-03-05
Researched 25d ago
400k context, Vision, Multimodal, Tool use, Functions, JSON
GPT-5.5 Pro
GPT-5.5 Pro is OpenAI's premium extra-compute deployment of GPT-5.5, released April 23, 2026. It uses the same underlying weights as GPT-5.5 standard with additional parallel test-time compute for harder tasks. Supports text and image inputs, reasoning effort control, tool use, structured outputs, code execution, a 1,050,000-token context window, and 128K max output. Key datapack rows: Terminal-Bench 2.1 78.2%, SWE-bench Pro 58.6%, GPQA Diamond 93.6%, ARC-AGI-2 high effort 83.3%, BrowseComp Pro compute 90.1%, and FrontierMath Tier 4 39.6%. Official pricing is $30/M input, $180/M output, $10/M batch input, and $45/M batch output; native cached input discount is not listed.
2026-04-23
Researched 3d ago
1.05m context, Reasoning, Vision, Multimodal, Tool use, Functions
GPT-5.4 Mini
GPT-5.4 Mini is a smaller, cost-efficient variant of GPT-5.4 with a 400K token context window. Designed for tasks requiring long-context processing at lower cost. Model ID: gpt-5.4-mini.
2026-03-05
Researched 16d ago
400k context, Reasoning, Vision, Multimodal, Tool use, Functions
GPT-5.5
GPT-5.5 is OpenAI's fully retrained agentic model, released April 23, 2026. Optimised for agentic coding, computer use, knowledge work, and early scientific research. Achieves 82.7% on Terminal-Bench 2.0 (Codex CLI scaffold), 84.9% on GDPval, 58.6% on SWE-Bench Pro, 93.6% on GPQA Diamond, and 82.6% on SWE-Bench Verified (Vals.ai independent harness). Knowledge cutoff December 2025. Supports reasoning effort levels (none/low/medium/high/xhigh). Context window 1,050,000 tokens with a long-context surcharge above 272K tokens. Model ID: gpt-5.5.
2026-04-23
Researched 16d ago
1.05m context, Reasoning, Vision, Multimodal, Tool use, Functions
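The GPT-5.5 entry names two limits that matter when budgeting a request: the 1,050,000-token context window and the long-context surcharge above 272K tokens. A pre-flight check could look like the sketch below. It assumes "272K" means 272,000 tokens and that the threshold is measured on prompt tokens. Neither detail is stated in the entry, and the surcharge rate itself is not given, so none is modeled.

```python
# Limits quoted in the GPT-5.5 entry.
CONTEXT_WINDOW = 1_050_000
# Assumption: "272K" is read as 272,000 tokens, counted on the prompt.
SURCHARGE_THRESHOLD = 272_000


def classify_prompt(prompt_tokens: int) -> str:
    """Label a prompt by which pricing or limit band it falls into."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens > CONTEXT_WINDOW:
        return "exceeds_context"
    if prompt_tokens > SURCHARGE_THRESHOLD:
        return "long_context_surcharge"
    return "standard"
```

A caller might use the label to decide whether to trim retrieved context before sending a request.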
GPT-5.4
GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.
2026-03-05
Researched 16d ago
1.05m context, Reasoning, Vision, Multimodal, Tool use, Functions
MiniMax M3
MiniMax M3 is MiniMax's current API flagship, released June 1, 2026 with MiniMax Sparse Attention (MSA) architecture for economical 1M-token context. It accepts text, image, and video input, supports reasoning, tool use, function calling, native prompt caching, and up to 131,072 output tokens in the tracked API configuration. MiniMax lists the standard <=512K tier at a permanent $0.30/M input and $1.20/M output; >512K long-context service remains limited availability at higher rates. Open-weight model weights are available on Hugging Face as MiniMaxAI/MiniMax-M3 under the MiniMax Community License.
2026-06-01
Researched 4d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
GPT-5.6 Sol
OpenAI's flagship GPT-5.6 model and highest-capability tier in the Sol, Terra, and Luna naming system. GPT-5.6 Sol is built for demanding reasoning, long-horizon coding, agentic workflows, and cybersecurity tasks, introducing max reasoning effort and ultra multi-agent mode. Announced June 26, 2026; available only to select trusted partners in limited preview, with broad availability pending.
2026-06-26
Researched 3d ago
Reasoning, Vision, Multimodal, Tool use, Functions, Code exec
GPT-5.3-Codex
Most capable agentic coding model from OpenAI. Optimized for long-horizon, agentic coding tasks in the Codex CLI and API. Note: GPT-5.3-Codex-Spark is a distinct ChatGPT Pro research preview (not API-accessible).
2026-02-05
Researched 19d ago
400k context, Reasoning, Vision, Tool use, Functions, JSON
Claude Mythos 5
Anthropic's access-gated frontier model for approved Project Glasswing cybersecurity defenders and biomedical research organizations. Shares the same underlying architecture as Claude Fable 5 but operates with safety classifiers lifted in specific domains: cybersecurity safeguards are removed for all Glasswing participants, while biology safeguards are additionally removed for approved biology-track participants. Succeeds Claude Mythos Preview with significantly reduced pricing ($10/$50 per MTok input/output vs. $25/$125 for Mythos Preview), a 1M-token context window, 128k max output tokens, adaptive thinking always on (raw chain of thought never returned), vision, tool use, structured outputs, and the effort parameter for controlling thinking depth. Extended thinking with manual budget_tokens is not supported. Anthropic disabled Mythos 5 access for all customers on June 12, 2026 after a US export control directive. On June 27, 2026, the US Commerce Department partially lifted the restriction, permitting Mythos 5 deployment to more than 100 US organizations listed in government Annex A that operate and defend critical infrastructure. The model remains inaccessible to general commercial API customers as of June 28, 2026.
2026-06-09
Researched 1d ago
1m context, Reasoning, Vision, Multimodal, Tool use, Functions
Claude Haiku 4.5
Claude Haiku 4.5 is Anthropic's Claude 4.5 model with multimodal text and image input. It offers a 200K-token context window and scores 73.3 on SWE-bench Verified.
2025-10-01
Researched 35d ago
200k context, Vision, Multimodal, Tool use, Functions, JSON
Qwen3-Coder-Next
Qwen3-Coder-Next is an ultra-sparse Mixture-of-Experts coding agent model from Alibaba's Qwen team, released February 3, 2026 under Apache 2.0. It has 80B total parameters with 3B active at inference, delivering substantially higher throughput than comparable dense models. It supports a native 256K context window, function calling, and structured outputs, and works with scaffold templates for Claude Code, Qwen Code, Cline, Kilo, and others. Benchmarks reported in the DAT-3724 datapack include SWE-Bench Pro 44.3%, SWE-Bench Resolved 70.6%, and TerminalBench 2 36.2%.
2026-02-03
Researched 10d ago
256k context, Reasoning, Tool use, Functions, JSON, Code exec
Qwen3-Coder-30B-A3B-Instruct
Qwen3-Coder-30B-A3B-Instruct is Alibaba's efficient open-source code generation model in the Qwen3-Coder family, released December 3, 2025 under the Apache 2.0 license. The model has 30.5 billion total parameters with 3.3 billion active per forward pass, organized across 48 transformer layers with 128 experts and 8 activated per token. It uses Grouped Query Attention (GQA) with 32 query heads and 4 key-value heads. Native context window is 262,144 tokens, extendable to 1 million tokens via YaRN. The model supports multi-turn tool calling, function calling, repository-level code understanding, and structured outputs. It is compatible with vLLM, SGLang, Ollama, LM Studio, llama.cpp, and HuggingFace Transformers. Available via AWS Bedrock, Novita AI, and Vercel AI Gateway.
2025-12-03
Researched 10d ago
262k context, Tool use, Functions, JSON, Code exec
Devstral Small 2
Devstral Small 2 is Mistral AI's 24B open-weights coding agent model, released December 9, 2025 under Apache 2.0. It scores 68.0% on SWE-bench Verified and supports agentic software engineering tasks, multi-step reasoning, and tool use. Runs on a single RTX 4090 GPU or a Mac with 32GB RAM. Multimodal, with support for image inputs.
2025-12-09
Researched 58d ago
256k context, Vision, Multimodal, Tool use, Functions, JSON
Mistral Devstral 2 123B
Mistral Devstral 2 123B is Mistral AI's Devstral model focused on code generation and software engineering. It was released 2025-12-01.
2025-12-01
Researched 38d ago
JSON, Code exec
Composer 2.5
Cursor's agentic coding model released May 18, 2026. Built on Moonshot AI's Kimi K2.5 open-source checkpoint with targeted RL using textual feedback and 25× more synthetic training tasks than Composer 2. Designed for long-horizon software engineering tasks: multi-file edits, terminal command execution, codebase-wide semantic search, and autonomous task planning. Uses Cursor's compaction-in-the-loop context management for long coding sessions. Available on all Cursor plans; accessed through the Cursor IDE (not a standalone API). Standard pricing: $0.50/M input, $2.50/M output; Fast (default): $3.00/M input, $15.00/M output.
2026-05-18
Researched 39d ago
1m context, Tool use, Functions, Code exec