Capability filter · capability · intermediate

Reasoning

Also known as: reasoning model, deliberative reasoning

deliberate problem solving

127 matching active models

27 tracked providers

87 models with routes

model.reasoning

Definition

Reasoning capability describes models marketed or tracked as stronger at multi-step problem solving, planning, math, coding, and answer checking. It is not a guarantee for every workload; use it with benchmark and provider-route evidence when choosing a production model.
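
The advice above, to pair the capability flag with provider-route evidence, can be sketched as a simple filter. This is a minimal illustration only: the `ModelEntry` record and the `shortlist` helper are hypothetical names, not the catalog's actual schema or API, and the three sample rows reuse route counts shown in the listing on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelEntry:
    # Hypothetical record shape; field names are illustrative only.
    name: str
    reasoning: bool                 # mirrors the model.reasoning flag
    route_count: int                # number of tracked provider routes
    context_tokens: Optional[int] = None


def shortlist(models: List[ModelEntry], min_routes: int = 1) -> List[ModelEntry]:
    """Keep reasoning-flagged models with at least `min_routes` tracked routes."""
    return [m for m in models if m.reasoning and m.route_count >= min_routes]


catalog = [
    ModelEntry("DeepSeek R1", True, 13, 128_000),
    ModelEntry("Claude 3.7 Sonnet", True, 6, 200_000),
    ModelEntry("Grok-1", True, 0, None),  # flagged, but no tracked provider route
]

names = [m.name for m in shortlist(catalog, min_routes=2)]
print(names)
```

A flag-only filter would keep all three rows; requiring route evidence drops entries that cannot currently be called through a tracked provider. Benchmark evidence would be a further check on top of this.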

Models With Reasoning

Showing the first 80 decision-sorted matches, with model flags and provider-route evidence from seed data.

127 matches

Model · Release · Context · Capabilities · Provider route

Phi-4 Mini Flash Reasoning

Lightweight reasoning variant of Microsoft Phi-4 Mini optimized for fast inference.

2025-12-01

Researched 134d ago

128K

128,000 tokens

128K context · Reasoning

Pricing not tracked

1 route

Magistral Small 2506

Mistral's Magistral Small reasoning model, released June 2025.

2025-06-10

Researched 1d ago

128K

128,000 tokens

128K context · Reasoning

Pricing not tracked

1 route

Tencent Hy3 Preview

Tencent HunYuan's Hy3 Preview is a high-efficiency Mixture-of-Experts language model for agentic and production workflows. OpenRouter lists tencent/hy3-preview as released Apr 22, 2026 with 262,144 context, free preview pricing, reasoning controls, and tool-use support. Hugging Face Transformers documents Hy3-preview as a Tencent HunYuan MoE model with a dense-MoE hybrid architecture, 192 routed experts, and one always-active shared expert per MoE layer. SCMP's Apr 23 coverage reports Tencent described HY3-Preview as a new flagship model developed by the HunYuan and Yuanbao teams with 295B parameters. Treat release metadata as high confidence for existence/context and medium confidence for exact parameter count until Tencent publishes a primary technical card.

2026-04-22

Researched 22d ago

262K

262,144 tokens

262K context · Reasoning · Tool use · Functions

Free in / Free out / 1M tokens

1 route

Xiaomi MiMo-V2.5

Xiaomi MiMo-V2.5 is the lower-cost native omnimodal sibling in the MiMo-V2.5 series. OpenRouter describes it as supporting text, image, audio, and video inputs with text output, Pro-level agentic performance at roughly half the inference cost, and improved multimodal perception over MiMo-V2-Omni. Xiaomi's official April 22 release page highlights MiMo-V2.5 alongside MiMo-V2.5-Pro in benchmark data and says the V2.5 series will be open-sourced soon; no public weights/license were verified at research time.

2026-04-22

Researched 22d ago

1M

1,048,576 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions

$0.400 in / $2.00 out / 1M tokens

1 route
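
The per-1M-token prices shown in each row can be turned into a rough per-request estimate. The sketch below assumes a hypothetical `request_cost` helper and an illustrative request of 10,000 input and 2,000 output tokens at the $0.400 in / $2.00 out rate listed for this row. It ignores caching, batch discounts, and any provider-specific fees.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def request_cost(input_tokens, output_tokens, price_in, price_out):
    """Hypothetical helper: USD cost of one request at per-1M-token prices."""
    cost_in = Decimal(input_tokens) * price_in / TOKENS_PER_UNIT
    cost_out = Decimal(output_tokens) * price_out / TOKENS_PER_UNIT
    return cost_in + cost_out


# Illustrative request size (an assumption, not catalog data).
cost = request_cost(10_000, 2_000, Decimal("0.400"), Decimal("2.00"))
print(f"estimated cost: ${cost}")
```

`Decimal` is used instead of floats so that small per-token amounts do not pick up binary rounding noise when summed over many requests.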

Kimi K2.6 is Moonshot AI's latest agentic reasoning model, launched April 13, 2026 as a code preview for Kimi Code subscribers. Built on a 1-trillion-parameter MoE architecture (32B active, 384 experts), it inherits K2.5's 256K context window and adds enhanced reliability for long-horizon agentic workflows, supporting 200–300 sequential tool calls without drift. It is optimized for coding, multi-step agent planning, and vision-assisted tasks such as processing screenshots, PDFs, and spreadsheets.

2026-04-20

Researched 9d ago

262K

262,144 tokens

262K context · Reasoning · Vision · Multimodal · Tool use · Functions

$0.750 in / $3.50 out / 1M tokens

4 routes · 1 cache

Hermes 4's 405B hosted variant on Nous Portal. The portal describes it as the largest Hermes 4 model, focused on advanced reasoning and creative depth rather than inference speed or cost.

2025-09-22

Researched 22d ago

128K

128,000 tokens

128K context · Reasoning

$0.090 in / $0.370 out / 1M tokens

1 route

Nanbeige4-3B is an open-source 3B-parameter language model by Nanbeige LLM Lab (BOSS Zhipin), released December 2025. Pre-trained on 23 trillion high-quality tokens with SFT on 30M+ diverse instructions. Context extended to 64K via Adjusted Base Frequency (ABF). Sets state-of-the-art on AIME 2024 (90.4), AIME 2025 (85.6), and GPQA-Diamond (82.2) for sub-10B models, outperforming models up to 10× larger including Qwen3-32B. arxiv: 2512.06266. HuggingFace: Nanbeige/Nanbeige4-3B-Base.

2025-12-13

Researched 13d ago

64K

64,000 tokens

Reasoning

No tracked provider route

Grok-0, developed by xAI, is a large language model (LLM) with 33 billion parameters. It approaches the capabilities of Meta's 70-billion-parameter LLaMA 2 model while using only half the training resources. Grok-0 served as the prototype for this efficient design before being succeeded by Grok-1, which further enhanced reasoning and coding capabilities.

2023-08-18

Researched 134d ago

—

No window data

Reasoning

No tracked provider route

Grok-1, created by xAI, is a 314-billion-parameter Mixture-of-Experts (MoE) language model. Its architecture has 8 experts, with 2 used for each input token, across 64 layers with 48 attention heads for queries. The model was trained from scratch on a custom training stack built on JAX and Rust, and its pre-training phase finished in October 2023. It was released as a base model under the permissive Apache 2.0 license, which allows both commercial and non-commercial use, and it is not fine-tuned for specific tasks. Benchmarks highlight Grok-1's strong reasoning on various tasks, but the model can generate inaccuracies ("hallucinations"). Running it locally requires substantial hardware, including a multi-GPU system.

2023-11-03

Researched 134d ago

—

No window data

Reasoning

No tracked provider route

Grok-1.5V, created by xAI, is a multimodal large language model that combines text and image processing. It interprets diverse visual data, including documents, diagrams, charts, screenshots, and photographs. Its multimodal design supports tasks such as translating diagrams into code, generating image descriptions, and answering questions based on visual inputs, with a strong understanding of spatial information. Grok-1.5V has been competitive with top models such as GPT-4V and Gemini Pro 1.5, particularly in areas that require spatial reasoning. Access is initially limited to early testers and existing Grok users, with broader availability planned.

2024-04-12

Researched 134d ago

—

No window data

Reasoning

No tracked provider route

Grok-1.5, developed by xAI, Elon Musk's AI company, is a large language model focused on reasoning in coding and mathematics, with strong results on benchmarks such as MATH, GSM8K, and HumanEval. It handles contexts of up to 128,000 tokens, longer than its predecessor, and is built with a custom distributed training framework based on JAX, Rust, and Kubernetes. Designed for long-context understanding and logical reasoning, it is being deployed to early testers and users. A multimodal version, Grok-1.5V, adds visual processing for documents, diagrams, and photographs.

2024-03-29

Researched 134d ago

—

No window data

Reasoning

No tracked provider route

Post-training variant of GLM-5 from Zhipu AI with enhanced reasoning and coding capabilities. 754B parameters (40B active) in Mixture of Experts architecture. Optimized for complex agentic workflows and multi-step reasoning. Available via Z.AI API and open weights under the MIT license.

2026-04-07

Researched 11d ago

200K

200,000 tokens

200K context · Reasoning · Tool use · Functions · JSON · Code exec

$1.05 in / $3.50 out / 1M tokens

3 routes

Lightweight variant of Grok-2 from xAI with extended context for general reasoning tasks.

2024-08-01

Researched 134d ago

128K

128,000 tokens

128K context · Reasoning

No tracked provider route

Enhanced contextual memory with limited image input; political filter added.

2024-08-01

Researched 26d ago

128K

128,000 tokens

128K context · Reasoning · JSON

$0.500 in / $0.500 out / 1M tokens

1 route

Claude 3 Sonnet

Claude 3 Sonnet by Anthropic is a versatile large language AI model, balancing intelligence and speed for diverse enterprise use cases. It is part of the Claude 3 family, positioned between the powerful Opus and the faster Haiku models. Sonnet excels in nuanced content creation, accurate summarization, and complex scientific query handling while also showcasing proficiency in non-English languages and coding tasks. Additionally, it enhances vision capabilities with exceptional skills in visual reasoning, such as interpreting charts, graphs, and transcribing text from imperfect images, which benefits industries like retail, logistics, and finance. Operated at twice the speed of Claude 3 Opus, Sonnet is efficient in context-sensitive customer support and multi-step workflows. It has achieved AI Safety Level 2 (ASL-2) and is accessible through multiple platforms, including Claude.ai, the Claude iOS app, the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.

2024-03-04

Researched 26d ago

200K

200,000 tokens

200K context · Reasoning · Vision · Multimodal · JSON · Code exec

$3.00 in / $15.00 out / 1M tokens

2 routes · 1 cache

DeepSeek R1: Reasoning-optimized model with extended thinking capabilities. 128K context.

2025-01-20

Researched 26d ago

128K

128,000 tokens

128K context · Reasoning · JSON · Code exec

$0.100 in / $0.300 out / 1M tokens

13 routes

Claude 3.7 Sonnet

Claude 3.7 Sonnet is Anthropic's advanced model with extended thinking capabilities, offering state-of-the-art reasoning for complex tasks.

2025-02-24

Researched 26d ago

200K

200,000 tokens

200K context · Reasoning · Vision · Multimodal · Tool use · Functions

$3.00 in / $15.00 out / 1M tokens

6 routes · 1 batch

Flagship open-weight foundation model from Zhipu AI with 744B parameters (40B active per token) in Mixture of Experts architecture. Trained on 28.5T tokens using DeepSeek Sparse Attention on Huawei Ascend hardware. Achieves state-of-the-art performance on coding and agentic benchmarks (SWE-bench Verified: 77.8%). Supports autonomous planning, multi-step tool use, and self-correction.

2026-02-11

Researched 26d ago

200K

200,000 tokens

200K context · Reasoning · Tool use · Functions · JSON

$0.600 in / $2.08 out / 1M tokens

5 routes

DeepSeek R1 Distill Llama 70B

Large-scale distilled DeepSeek R1 leveraging Llama 70B for complex reasoning.

2025-01-20

Researched 26d ago

128K

128,000 tokens

128K context · Reasoning · JSON

$0.700 in / $0.800 out / 1M tokens

4 routes

DeepSeek V4 Pro

DeepSeek V4 Pro is the flagship 1.6T parameter (49B activated) Mixture-of-Experts language model with 1M-token context. Features hybrid attention (CSA+HCA) requiring only 27% of inference FLOPs vs DeepSeek-V3.2 at 1M context, Manifold-Constrained Hyper-Connections (mHC), and Muon Optimizer for training stability. Achieves 93.5% on LiveCodeBench, 89.8% on IMOAnswerBench, and 90.1% on MMLU. Supports Non-Think, Think High, and Think Max reasoning modes. Pricing: $1.74/1M input, $3.48/1M output (cache hit: $0.145/1M input). MIT licensed. Pricing note: DeepSeek API docs state that deepseek-v4-pro is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC.

2026-04-24

Researched 1d ago

1M

1,000,000 tokens

1M context · Reasoning · Tool use · Functions · JSON · Prompt cache

DeepSeek Platform

$0.435 in / $0.870 out / 1M tokens

3 routes · 1 cache
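
The row price for this entry is consistent with the list price and the 75% discount quoted in its description. A minimal arithmetic check, assuming the discount applies equally to input and output tokens (the `discounted` helper is a hypothetical name used only for this illustration):

```python
from decimal import Decimal


def discounted(list_price: Decimal, discount_pct: Decimal) -> Decimal:
    """Hypothetical helper: price after a percentage discount."""
    return list_price * (Decimal(100) - discount_pct) / Decimal(100)


# List prices per 1M tokens, taken from the description above.
input_price = discounted(Decimal("1.74"), Decimal("75"))
output_price = discounted(Decimal("3.48"), Decimal("75"))
print(input_price, output_price)
```

Both results match the $0.435 in / $0.870 out shown in the row, so budgets based on the row price should be revisited when the stated discount window ends.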

DeepSeek R1 Distill Qwen-32B

Distilled DeepSeek R1 reasoning in Qwen-32B for advanced problem-solving.

2025-01-20

Researched 26d ago

128K

128,000 tokens

128K context · Reasoning · JSON

$0.290 in / $0.290 out / 1M tokens

3 routes

Kimi K2 Instruct

Kimi K2 Instruct is an instruction-tuned language model from Moonshot AI, available via Fireworks AI.

2025-01-01

Researched 26d ago

—

No window data

Reasoning · JSON

$0.600 in / $2.50 out / 1M tokens

3 routes

GPT-5.2 is OpenAI's incremental update in the GPT-5 series, offering improvements in agentic coding and long-context performance at 400K context.

2025-12-11

Researched 1d ago

400K

400,000 tokens

400K context · Reasoning · Vision · Multimodal · Tool use · Functions

$1.75 in / $14.00 out / 1M tokens

2 routes

OpenAI o3 reasoning model with advanced multi-step problem-solving capabilities.

2025-03-31

Researched 5d ago

200K

200,000 tokens

200K context · Reasoning · JSON · Code exec · Prompt cache · Batch

$2.00 in / $8.00 out / 1M tokens

2 routes · 1 batch · 1 cache

DeepSeek R1 Distill Llama 8B

Distilled DeepSeek R1 reasoning encoded into Llama 8B architecture.

2025-01-20

Researched 134d ago

128K

128,000 tokens

128K context · Reasoning

$0.200 in / $0.200 out / 1M tokens

2 routes

DeepSeek R1 Distill Qwen-14B

Distilled DeepSeek R1 with reasoning in Qwen-14B for mid-scale inference.

2025-01-20

Researched 134d ago

128K

128,000 tokens

128K context · Reasoning

$0.200 in / $0.200 out / 1M tokens

2 routes

DeepSeek R1 Distill Qwen-7B

Distilled DeepSeek R1 reasoning capabilities in Qwen-7B form factor.

2025-01-20

Researched 134d ago

128K

128,000 tokens

128K context · Reasoning

$0.200 in / $0.200 out / 1M tokens

2 routes

Hermes 4's 70B hosted variant on Nous Portal. The portal describes it as a hybrid-mode reasoning model that balances scale and size while staying fast and cost effective for complex reasoning tasks.

2025-09-22

Researched 22d ago

128K

128,000 tokens

128K context · Reasoning

$0.050 in / $0.200 out / 1M tokens

1 route

DeepSeek R1 Distill Qwen-1.5B

Distilled DeepSeek R1 based on Qwen-1.5B for compact reasoning.

2025-01-20

Researched 134d ago

128K

128,000 tokens

128K context · Reasoning

$0.100 in / $0.100 out / 1M tokens

1 route

o1-mini (09-12)

OpenAI o1-mini model emphasizing fast reasoning for smaller tasks and problems.

2024-09-12

Researched 134d ago

128K

128,000 tokens

128K context · Reasoning · Code exec

$1.10 in / $4.40 out / 1M tokens

1 route

Nanbeige4.1-3B is an open-source 3.93B-parameter reasoning and agentic language model by Nanbeige LLM Lab (BOSS Zhipin), released February 11, 2026. Built on Nanbeige4-3B-Base with further SFT and reinforcement learning. Supports 256K token context. A unified generalist model achieving strong reasoning, preference alignment, and agentic tool-use capabilities at the ~3B scale. Competitive with much larger open models. arxiv: 2602.13367. HuggingFace: Nanbeige/Nanbeige4.1-3B.

2026-02-11

Researched 13d ago

256K

256,000 tokens

256K context · Reasoning · Tool use · Functions

No tracked provider route

MiniMax-M1 is a large-scale open-weight reasoning model from MiniMax with 456B total parameters and a 1M token context window, designed for extended reasoning and high-efficiency inference.

2025-09-01

Researched 18d ago

1M

1,000,000 tokens

1M context · Reasoning · Tool use · Functions · JSON

No tracked provider route

Tencent Hunyuan T1

Tencent Hunyuan T1 is a deep-thinking reasoning model released March 21, 2025. Built on Hunyuan TurboS base, using a Hybrid-Transformer-Mamba MoE architecture — the first ultra-large-scale Mamba-powered LLM with 16 total experts and 52B activated parameters via dynamic routing. 96.7% of compute allocated to reinforcement learning post-training. Benchmarks: MATH-500 96.2, LiveCodeBench 64.9, GPQA Diamond 69.3, MMLU-PRO 87.2 (second only to o1). Decoding speed 2× faster than comparable transformer models. 256K token context. Available on Tencent Cloud API. Source: https://tencent.github.io/llm.hunyuan.T1/README_EN.html

2025-03-21

Researched 13d ago

256K

256,000 tokens

256K context · Reasoning

No tracked provider route

DeepSeek R1 Zero

2025-01-20

Researched 134d ago

128K

128,000 tokens

128K context · Reasoning

No tracked provider route

DeepSeek R1 Lite

Lightweight DeepSeek R1 reasoning model optimized for speed.

2024-11-21

Researched 134d ago

128K

128,000 tokens

128K context · Reasoning

No tracked provider route

o1-preview (09-12)

OpenAI o1 preview model emphasizing reasoning and complex problem-solving.

2024-09-12

Researched 134d ago

128K

128,000 tokens

128K context · Reasoning · Code exec

No tracked provider route

Cerebras GPT 590M

The Cerebras GPT 590M is a robust language model featuring 590 million parameters and a transformer architecture akin to GPT-3. It is optimized for natural language processing tasks such as text generation, completion, and summarization. Trained using the Chinchilla scaling laws and Cerebras' weight streaming technology, this model achieves high efficiency, offering faster training times and reduced costs. The Andromeda AI supercomputer facilitated its training on the extensive Pile dataset. Open-sourced under the Apache 2.0 license, it primarily supports English and requires additional tuning for other languages and conversational applications due to its lack of reinforcement learning from human feedback.

2023-03-13

Researched 134d ago

—

No window data

Reasoning · Code exec

No tracked provider route

Megatron GPT 5B

The NeMo Megatron-GPT 5B is a transformer-based language model with 5 billion trainable parameters, inspired by models like GPT-2 and GPT-3. Its architecture is a decoder-only transformer designed for text generation and language understanding tasks. Trained on EleutherAI's The Pile dataset, it produces coherent, natural-sounding text and can answer questions and complete sentences. The model can reflect biases and toxic language from its training data, sometimes yielding inappropriate outputs. On the LM Evaluation Test Suite it scores 0.5566 on ARC-Easy and 0.6133 on Winogrande, indicating both strengths and limitations across tasks.

2019-08-28

Researched 134d ago

—

No window data

Reasoning · Code exec

No tracked provider route

OpenAI's previous intelligent reasoning model with configurable reasoning effort. Released August 2025. Supports minimal, low, medium, and high reasoning levels. Succeeded by GPT-5.1 and later models.

2025-08-07

Researched 5d ago

400K

400,000 tokens

400K context · Reasoning · Vision · Multimodal · Tool use · Functions

$1.25 in / $10.00 out / 1M tokens

3 routes · 1 batch · 1 cache

Near-frontier intelligence for cost-sensitive, low-latency, high-volume workloads. Released August 2025. Replaces o4-mini (shutting down Oct 2026).

2025-08-07

Researched 5d ago

400K

400,000 tokens

400K context · Reasoning · Vision · Multimodal · Tool use · Functions

$0.250 in / $2.00 out / 1M tokens

3 routes · 1 batch · 1 cache

Fastest, cheapest GPT-5 variant for summarization and classification tasks. Also available via Realtime API.

2025-08-07

Researched 5d ago

400K

400,000 tokens

400K context · Reasoning · Vision · Multimodal · Tool use · Functions

$0.050 in / $0.400 out / 1M tokens

3 routes · 1 batch · 1 cache

Premium extended-reasoning GPT-5.4 variant producing smarter and more precise responses. Replacement for o3-deep-research and o4-mini-deep-research. No prompt caching discount.

2026-03-01

Researched 5d ago

1.1M

1,050,000 tokens

1.1M context · Reasoning · Vision · Multimodal · Tool use · Functions

$30.00 in / $180.00 out / 1M tokens

2 routes · 1 batch

Advanced o3 reasoning model for complex math, science, and coding problems. Supports tools, vision, and extended thinking. Available to Pro users. Released June 10, 2025.

2025-06-10

Researched 26d ago

—

No window data

Reasoning · Vision · Multimodal · Tool use · Functions · JSON

$20.00 in / $80.00 out / 1M tokens

2 routes

Seed 1.6 Flash is ByteDance Seed's ultra-fast multimodal thinking model supporting text and visual understanding at 256K context, optimized for low-latency inference.

2026-03-01

Researched 18d ago

256K

256,000 tokens

256K context · Reasoning · Vision · Multimodal · Tool use · Functions

No tracked provider route

Amazon Nova 2 Lite

Amazon Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that processes text, images, and videos at 1M token context with improved reasoning over Nova Lite v1.

2026-03-01

Researched 18d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions

No tracked provider route

Seed 1.6 is a general-purpose multimodal model from ByteDance Seed supporting text, image, and video inputs. It incorporates multimodal capabilities and deep thinking for complex tasks at 256K context.

2026-03-01

Researched 18d ago

256K

256,000 tokens

256K context · Reasoning · Vision · Multimodal · Tool use · Functions

No tracked provider route

o3 Deep Research

o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex multi-step research tasks by synthesizing information from multiple sources at 200K context.

2025-10-10

Researched 1d ago

200K

200,000 tokens

200K context · Reasoning · Vision · Multimodal · Tool use · Functions

No tracked provider route

Open-weight dense Qwen3.6 27B model with native multimodal support across text, image, and video. Apache 2.0.

2026-04-27

Researched 1d ago

262K

262,144 tokens

262K context · Reasoning · Vision · Multimodal · Tool use · Functions

Alibaba Cloud PAI-EAS

$0.320 in / $3.20 out / 1M tokens

2 routes

Aion-1.0-Mini is a 32B parameter model distilled from DeepSeek-R1, designed for strong performance in reasoning and coding at a smaller footprint.

2026-01-01

Researched 18d ago

128K

128,000 tokens

128K context · Reasoning · Tool use · Functions · JSON

No tracked provider route

Claude 3.5 Haiku

Claude 3.5 Haiku is Anthropic's latest AI model, known for its speed and efficiency while maintaining high intelligence. It is optimized for applications that need rapid responses, such as interactive chatbots and real-time content moderation. It launched as text-only, with image input planned for later. It delivers fast, accurate code suggestions, processes and categorizes information quickly, and handles large volumes of user interactions. It offers advanced coding, tool use, and reasoning abilities. It surpasses Claude 3 Haiku in benchmarks, and its pricing reflects that enhanced performance.

2024-10-22

Researched 26d ago

200K

200,000 tokens

200K context · Reasoning · Vision · JSON · Code exec · Batch

$0.800 in / $4.00 out / 1M tokens

5 routes · 1 batch · 1 cache

OLMo 3 32B Think

OLMo 3 32B Think is Allen Institute for AI's reasoning-focused model with extended thinking chains for complex logic problems and multi-step reasoning.

2026-03-01

Researched 18d ago

64K

64,000 tokens

Reasoning

No tracked provider route

Claude Sonnet 4.5

Claude Sonnet 4.5 available on AWS Bedrock

2025-09-29

Researched 26d ago

200K

200,000 tokens

200K context · Reasoning · Vision · Multimodal · Tool use · Functions

$3.00 in / $15.00 out / 1M tokens

7 routes · 1 batch

Claude 3.5 Sonnet

Claude 3.5 Sonnet, the latest in Anthropic's line of large language models, merges state-of-the-art reasoning, coding, and natural language understanding capabilities with advanced multimodal processing. First released in June 2024 and upgraded in October 2024, it excels in benchmarks against previous models and competitors, thanks to its scalable attention mechanisms and massive neural network architecture. Its dynamic routing enables specialization in various tasks, supporting applications from software development and data analysis to customer support and content creation. Users benefit from its "Artifacts" feature for real-time collaborative workflows and can access the model through platforms like Claude.ai and APIs at competitive pricing rates.

2024-06-20

Researched 26d ago

200K

200,000 tokens

200K context · Reasoning · Vision · Multimodal · Functions · JSON

$3.00 in / $15.00 out / 1M tokens

6 routes · 1 cache

Claude Opus 4.5

Claude Opus 4.5 available on AWS Bedrock

2025-11-01

Researched 26d ago

200K

200,000 tokens

200K context · Reasoning · Vision · Multimodal · Tool use · Functions

$5.00 in / $25.00 out / 1M tokens

5 routes · 1 batch

Claude Mythos Preview

Claude Mythos Preview is Anthropic's frontier research model, positioned above the public Claude 4 family and released exclusively via invitation-only Project Glasswing to roughly 12 launch partners and over 40 organizations working on critical infrastructure. No public API or self-serve access. Specializes in defensive cybersecurity — autonomously identified zero-day vulnerabilities including a 27-year-old OpenBSD TCP SACK remote code execution bug and a 17-year-old FreeBSD NFS RCE. Codenamed Capybara internally. Scores 93.9% on SWE-bench Verified, 82.0% on Terminal-Bench 2.0, and 97.6% on USAMO 2026. Partner pricing: $25/$125 per million tokens (input/output). Max output: 128K tokens. Knowledge cutoff: December 2025.

2026-04-07

Researched 14d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions

$25.00 in / $125.00 out / 1M tokens

1 route

INTELLECT-3 is Prime Intellect's 106B-parameter MoE model with 12B active parameters, post-trained from GLM-4.5-Air-Base via SFT and reinforcement learning, matching frontier closed-model performance.

2026-04-01

Researched 18d ago

128K

128,000 tokens

128K context · Reasoning · Tool use · Functions · JSON

No tracked provider route

Aion-1.0 is a multi-model system from AionLabs designed for high performance across reasoning and coding tasks, trained with direct distillation from frontier models.

2026-01-01

Researched 18d ago

128K

128,000 tokens

128K context · Reasoning · Tool use · Functions · JSON

No tracked provider route

Arcee Maestro Reasoning

Maestro Reasoning is Arcee AI's flagship 32B analysis and reasoning model, fine-tuned with DPO from Qwen 2.5-32B for cross-domain reasoning, mathematics, and structured analysis tasks.

2025-12-01

Researched 18d ago

128K

128,000 tokens

128K context · Reasoning · Tool use · Functions · JSON

No tracked provider route

Cogito v2.1 671B

Cogito v2.1 671B MoE is Deep Cogito's strongest open model, matching performance of frontier closed models. It features deep thinking capabilities and strong results on coding, reasoning, and math benchmarks.

2025-11-19

Researched 8d ago

128K

128,000 tokens

128K context · Reasoning · Tool use · Functions · JSON · Code exec

No tracked provider route

Claude Sonnet 4.6

Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens.

2026-02-17

Researched 7d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions

$3.00 in / $15.00 out / 1M tokens

4 routes · 1 batch · 1 cache

Claude Opus 4.7

Claude Opus 4.7 is Anthropic's generally available flagship model with 1M context, 128K max output, adaptive thinking, and a new tokenizer with roughly 555K words per 1M tokens.

2026-04-16

Researched 1d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions

$5.00 in / $25.00 out / 1M tokens

5 routes · 1 batch · 1 cache

Claude Opus 4.6

Claude Opus 4.6 available on AWS Bedrock

2026-02-05

Researched 26d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions

$5.00 in / $25.00 out / 1M tokens

4 routes · 1 batch · 1 cache

Qwen3.6 Max Preview

Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse MoE architecture, available for preview as part of the Qwen3.6 series.

2026-04-20

Researched 3d ago

256K

256,000 tokens

256K context · Reasoning · Vision · Multimodal · Tool use · Functions

Alibaba Cloud PAI-EAS

$1.04 in / $6.24 out / 1M tokens

2 routes

Mistral Medium 3.5

Mistral Medium 3.5 is Mistral AI's first flagship merged model, combining instruction-following, reasoning, coding, and vision in one dense 128B model. It supports configurable reasoning effort, text and image input, native function calling, JSON output, and a 256K context window. Released as open weights under Mistral's Modified MIT license, it can be self-hosted on as few as four H100/H200 GPUs and scores 77.6% on SWE-bench Verified.

2026-04-29

Researched 1d ago

256K

256,000 tokens

256K context · Reasoning · Vision · Multimodal · Tool use · Functions

Mistral AI Studio

$1.50 in / $7.50 out / 1M tokens

1 route

DeepSeek V4 Flash

DeepSeek V4 Flash is a 284B parameter (13B activated) Mixture-of-Experts language model with 1M-token context. Features a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for efficient long-context inference. Supports thinking and non-thinking modes. Legacy API aliases deepseek-chat and deepseek-reasoner map to this model's non-thinking and thinking modes respectively. Pricing: $0.14/1M input, $0.28/1M output (cache hit: $0.0028/1M input). MIT licensed.

2026-04-24

Researched 1d ago

1M

1,000,000 tokens

1M context · Reasoning · Tool use · Functions · JSON · Prompt cache

DeepSeek Platform

$0.140 in / $0.280 out / 1M tokens

2 routes · 1 cache

DeepSeek R1 0528

2025-01-01

Researched 26d ago

160K

160,000 tokens

160K context · Reasoning · JSON · Code exec

$0.100 in / $0.300 out / 1M tokens

5 routes

Kimi K2 Thinking

Extended thinking variant of Kimi K2 with native reasoning capabilities. 256K context.

2025-01-01

Researched 26d ago

256K

256,000 tokens

256K context · Reasoning · JSON

$0.600 in / $2.50 out / 1M tokens

5 routes

MiniMax M2.7 is MiniMax's self-improving frontier model, released March 18, 2026. It introduces native multi-agent collaboration, complex skill orchestration, and early recursive self-improvement capabilities. The model uses 10B active parameters, supports a 204,800-token context window, and was released alongside MiniMax-M2.7-highspeed, a 66% faster latency-optimized variant. Public provider listings price standard M2.7 at $0.30 per 1M input tokens and $1.20 per 1M output tokens.

2026-03-18

Researched 11d ago

205K

204,800 tokens

205K context · Reasoning · Tool use · Functions · JSON

$0.300 in / $1.20 out / 1M tokens

2 routes

MiniMax M2.5 Highspeed

MiniMax M2.5 Highspeed is MiniMax's inference-optimized variant of M2.5, released simultaneously in February 2026. It delivers identical intelligence and outputs to standard M2.5 through a specialized inference engine at lower latency. The model supports a 204,800-token context window, 131,072-token max output, function calling, structured output, and reasoning. API model ID: MiniMax-M2.5-highspeed. It is designed for latency-sensitive interactive applications and automated agent pipelines.

2026-02-12

Researched 11d ago

205K

204,800 tokens

205K context · Reasoning · Tool use · Functions · JSON

$0.600 in / $2.40 out / 1M tokens

2 routes

MiniMax M2.7 Highspeed

MiniMax M2.7 Highspeed is the inference-optimized variant of MiniMax M2.7, released simultaneously on March 18, 2026. It reaches 100 tokens per second output speed, about 66% faster than standard M2.7, while preserving identical intelligence and outputs through engine optimization rather than weight changes. It supports a 204,800-token context window, 131,072-token max output, function calling, structured output, and reasoning. API model ID: MiniMax-M2.7-highspeed.

2026-03-18

Researched 11d ago

205K

204,800 tokens

205K context · Reasoning · Tool use · Functions · JSON

Pricing not tracked

1 route

Cogito v1 Preview Llama 3B

Cogito v1 Preview Llama 3B is Deep Cogito's smallest hybrid reasoning model. Fine-tuned from Llama 3.2 3B using Iterated Distillation and Amplification (IDA). Supports both direct and extended-thinking (reasoning) modes, tool calling, and 30+ languages.

2025-04-08

Researched 8d ago

128K

128,000 tokens

128K context · Reasoning · Tool use · Functions · JSON

$0.100 in / $0.100 out / 1M tokens

1 route

Cogito v1 Preview Llama 70B

Cogito v1 Preview Llama 70B is Deep Cogito's largest v1 dense model. Fine-tuned from a Llama 70B base using Iterated Distillation and Amplification (IDA). Outperforms Llama 4 109B MoE on standard benchmarks according to Deep Cogito. Supports direct and reasoning modes with tool calling.

2025-04-08

Researched 8d ago

128K

128,000 tokens

128K context · Reasoning · Tool use · Functions · JSON

$0.900 in / $0.900 out / 1M tokens

1 route

Cogito v1 Preview Llama 8B

Cogito v1 Preview Llama 8B is a hybrid reasoning model fine-tuned from Llama 3.1 8B using Iterated Distillation and Amplification (IDA). Supports direct and extended-thinking modes, tool calling, and 30+ languages.

2025-04-08

Researched 8d ago

128K

128,000 tokens

128K context · Reasoning · Tool use · Functions · JSON

$0.200 in / $0.200 out / 1M tokens

1 route

Cogito v1 Preview Qwen-14B

Cogito v1 Preview Qwen-14B is a hybrid reasoning model fine-tuned from Qwen 2.5 14B using Iterated Distillation and Amplification (IDA). Supports direct and extended-thinking modes, tool calling, and 30+ languages.

2025-04-08

Researched 8d ago

128K

128,000 tokens

128K context · Reasoning · Tool use · Functions · JSON

$0.200 in / $0.200 out / 1M tokens

1 route

Cogito v1 Preview Qwen-32B

Cogito v1 Preview Qwen-32B is a hybrid reasoning model fine-tuned from Qwen 2.5 32B using Iterated Distillation and Amplification (IDA). Supports direct and extended-thinking modes, tool calling, and 30+ languages.

2025-04-08

Researched 8d ago

128K

128,000 tokens

128K context · Reasoning · Tool use · Functions · JSON

$0.900 in / $0.900 out / 1M tokens

1 route

DeepSeek Prover V2

2025-01-01

Researched 134d ago

160K

160,000 tokens

160K context · Reasoning

$0.560 in / $1.68 out / 1M tokens

1 route

DeepSeek R1 0528 Distill Qwen3-8B

2025-01-01

Researched 134d ago

160K

160,000 tokens

160K context · Reasoning

$0.200 in / $0.200 out / 1M tokens

1 route

DeepSeek R1 0528 Qwen3-8B

2025-01-01

Researched 134d ago

160K

160,000 tokens

160K context · Reasoning

$0.200 in / $0.200 out / 1M tokens

1 route

DeepSeek R1 Basic

2025-01-01

Researched 134d ago

160K

160,000 tokens

160K context · Reasoning

$0.560 in / $1.68 out / 1M tokens

1 route

GLM Z1 Rumination 32B

2025-01-01

Researched 134d ago

128K

128,000 tokens

128K context · Reasoning

$0.900 in / $0.900 out / 1M tokens

1 route