Also known as: reasoning model, deliberative reasoning, deliberate problem solving
See matching models with benchmark scores and pricing.
178 matching active models
39 tracked providers
134 models with routes
Reasoning capability describes models marketed or tracked as stronger at multi-step problem solving, planning, math, coding, and answer checking. It is not a guarantee for every workload; use it with benchmark and provider-route evidence when choosing a production model.
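The advice above, to combine the reasoning tag with provider-route evidence, can be sketched as a simple filter. This is a minimal illustration, not the catalog's actual logic: the `ModelEntry` record, its field names, and the `shortlist` helper are hypothetical, and the sample rows copy context windows from entries in this listing, treating an entry without a "No tracked provider route" line as routed (an assumption).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelEntry:
    # Hypothetical record shape; the listing does not publish a schema.
    name: str
    context_tokens: Optional[int]  # None when the listing shows "No window data"
    has_provider_route: bool       # assumed True unless the entry says otherwise


def shortlist(entries: List[ModelEntry], min_context: int) -> List[ModelEntry]:
    """Keep entries that have a tracked route and a known, large-enough window."""
    return [
        e for e in entries
        if e.has_provider_route
        and e.context_tokens is not None
        and e.context_tokens >= min_context
    ]


# Sample rows taken from entries in this listing.
CATALOG = [
    ModelEntry("Kimi K2.6", 262_144, True),
    ModelEntry("Nanbeige4-3B", 64_000, False),
    ModelEntry("Grok-1", None, False),
    ModelEntry("DeepSeek V4 Pro", 1_000_000, True),
]

picked = shortlist(CATALOG, min_context=200_000)
print([e.name for e in picked])
```

A filter like this only narrows the list; benchmark scores still need to be compared for the specific workload.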
Showing the first 80 matches, sorted by decision relevance, with tracked capability and provider-route evidence.
Lightweight reasoning variant of Microsoft Phi-4 Mini optimized for fast inference.
2025-12-01
Researched 36d ago
128k
128,000 tokens
Magistral Small 2506 is MistralAI's Magistral model focused on step-by-step reasoning. It offers a 128K-token context window.
2025-06-10
Researched 36d ago
128k
128,000 tokens
Tencent HunYuan's Hy3 Preview is a high-efficiency Mixture-of-Experts language model for agentic and production workflows. OpenRouter lists tencent/hy3-preview as released Apr 22, 2026 with 262,144 context, preview pricing, reasoning controls, and tool-use support. Hugging Face Transformers documents Hy3-preview as a Tencent HunYuan MoE model with a dense-MoE hybrid architecture, 192 routed experts, and one always-active shared expert per MoE layer. SCMP's Apr 23 coverage reports Tencent described HY3-Preview as a new flagship model developed by the HunYuan and Yuanbao teams with 295B parameters. Treat release metadata as high confidence for existence/context and medium confidence for exact parameter count until Tencent publishes a primary technical card.
2026-04-22
Researched 20d ago
262k
262,144 tokens
Xiaomi MiMo-V2.5 is the lower-cost native omnimodal sibling in the MiMo-V2.5 series. OpenRouter describes it as supporting text, image, audio, and video inputs with text output, Pro-level agentic performance at roughly half the inference cost, and improved multimodal perception over MiMo-V2-Omni. Xiaomi's official April 22 release page highlights MiMo-V2.5 alongside MiMo-V2.5-Pro in benchmark data and says the V2.5 series will be open-sourced soon; no public weights/license were verified at research time.
2026-04-22
Researched 32d ago
1.05m
1,048,576 tokens
Kimi K2.6 is Moonshot AI's multimodal agentic coding model, released April 20 2026 under a Modified MIT license. Built on a 1-trillion-parameter MoE architecture (32B active, 384 experts with 8 selected per token plus 1 shared expert, 61 layers), it features a 262K context window and up to 65,536 output tokens. Supports native image and video inputs (screenshots, PDFs, spreadsheets). Designed for long-horizon coding with agent swarms of up to 300 sub-agents and 4,000 coordinated steps; Moonshot AI cites 200–300 sequential tool calls without task drift. Key benchmarks: SWE-bench Verified 80.2%, SWE-bench Pro 58.6%, LiveCodeBench v6 89.6%, GPQA Diamond 90.5%, Terminal-Bench 2.0 66.7%. Chatbot Arena Elo 1454 (2026-04-28 snapshot).
2026-04-20
Researched 2d ago
262k
262,144 tokens
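The expert counts in the Kimi K2.6 description (384 routed experts, 8 selected per token, plus 1 shared expert) can be illustrated with a generic top-k routing sketch. This is an assumption-laden illustration, not Moonshot AI's router: the softmax-over-selected-logits gating and the `route_token` helper are hypothetical choices, and the router scores are random stand-ins.

```python
import math
import random

NUM_ROUTED_EXPERTS = 384  # from the description above
TOP_K = 8                 # routed experts selected per token


def route_token(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, a common
    gating choice that is assumed here, not taken from the source.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]  # stand-in scores
selected = route_token(logits)

# The shared expert runs for every token in addition to the routed ones.
experts_run_per_token = len(selected) + 1

# Share of parameters active per token: 32B active out of 1T total.
active_fraction = 32e9 / 1e12
print(experts_run_per_token, f"{active_fraction:.1%}")
```

The point of the sketch is the ratio: each token touches only a small slice of the total parameters, which is why a 1-trillion-parameter model can be served at the cost of a much smaller dense model.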
Hermes 4's 405B hosted variant on Nous Portal. The portal describes it as the largest Hermes 4 model, focused on advanced reasoning and creative depth rather than inference speed or cost.
2025-09-22
Researched 65d ago
128k
128,000 tokens
Nanbeige4-3B is an open-source 3B-parameter language model by Nanbeige LLM Lab (BOSS Zhipin), released December 2025. Pre-trained on 23 trillion high-quality tokens with SFT on 30M+ diverse instructions. Context extended to 64K via Adjusted Base Frequency (ABF). Sets state-of-the-art on AIME 2024 (90.4), AIME 2025 (85.6), and GPQA-Diamond (82.2) for sub-10B models, outperforming models up to 10× larger including Qwen3-32B. arxiv: 2512.06266. HuggingFace: Nanbeige/Nanbeige4-3B-Base.
2025-12-13
Researched 39d ago
64k
64,000 tokens
No tracked provider route
Kimi K2.7-Code is Moonshot AI's coding-focused multimodal model released June 12, 2026, built on Kimi K2.6. Uses the same 1-trillion-parameter MoE architecture (32B active parameters, 384 experts with 8 selected per token, 61 layers) with a 262K context window and MoonViT vision encoder (400M parameters). Reports +21.8% on Moonshot's Kimi Code Bench v2, +11.0% on Program Bench, +31.5% on MLS Bench Lite versus K2.6, with approximately 30% fewer reasoning tokens. Forces thinking mode on by default and preserves reasoning content across multi-turn interactions for agentic use. Available via Kimi platform API and HuggingFace under Modified MIT license.
2026-06-12
Researched 2d ago
262k
262,144 tokens
GLM-5.2 is Z.ai's coding-first successor to GLM-5.1 in the GLM-5 family, released June 13 2026. 753B parameters (40B active) in IndexShare MoE architecture; the IndexShare innovation reuses the same attention indexer across every four sparse layers, cutting per-token FLOPs by 2.9x at 1M context length. Trained on 28.5T tokens. Supports a 1M-token context window via the glm-5.2[1m] model ID, with 131,072-token maximum output and High/Max thinking-effort levels designed for extended agentic coding sessions. MIT license; open weights available on Hugging Face (zai-org/GLM-5.2 and zai-org/GLM-5.2-FP8). Self-reported HF card benchmarks: SWE-bench Pro 62.1, Terminal-Bench 2.1 82.7, MCP-Atlas 76.8, Tool-Decathlon 48.2, GPQA Diamond 91.2, AIME 2026 99.2, HLE 40.5. Available to GLM Coding Plan subscribers (Lite/Pro/Max/Team) directly, and via OpenRouter token API ($1.40/$4.40 per 1M tokens).
2026-06-13
Researched 3d ago
1m
1,000,000 tokens
Grok-0, developed by xAI, is a large language model (LLM) with 33 billion parameters. It performs on par with Meta's 70-billion-parameter LLaMA 2 model while using only half the training resources. Grok-0 served as the prototype for this efficient design before being succeeded by Grok-1, which further improved reasoning and coding capabilities.
2023-08-18
Researched 36d ago
—
No window data
No tracked provider route
Grok-1, created by xAI, is a 314-billion-parameter Mixture-of-Experts (MoE) language model. Its architecture uses 8 experts, 2 of which are active for each token, across 64 layers with 48 attention heads for queries. The model was trained from scratch on a custom training stack based on JAX and Rust, and finished its pre-training phase by October 2023. It was released as a base model under the permissive Apache 2.0 license, which allows both commercial and non-commercial use, but it is not fine-tuned for specific tasks. Benchmarks highlight Grok-1's strong reasoning on various tasks, though it can still generate inaccuracies ("hallucinations"). Running it locally requires substantial hardware, including a multi-GPU system.
2023-11-03
Researched 38d ago
—
No window data
No tracked provider route
Grok-1.5V, created by xAI, is a multimodal large language model that processes both text and images. It interprets diverse visual data, including documents, diagrams, charts, screenshots, and photographs. This lets it translate diagrams into code, generate image descriptions, and answer questions based on visual inputs, with a strong understanding of spatial information. Grok-1.5V is reported to be competitive with top models such as GPT-4V and Gemini Pro 1.5, particularly in areas that require spatial reasoning. Access is initially limited to early testers and existing Grok users, with broader availability planned.
2024-04-12
Researched 38d ago
—
No window data
No tracked provider route
Grok-1.5, developed by xAI, is a large language model focused on reasoning in coding and mathematics, with strong results on benchmarks such as MATH, GSM8K, and HumanEval. It handles contexts of up to 128,000 tokens, longer than its predecessor, and is built on a custom distributed training framework based on JAX, Rust, and Kubernetes. It is being deployed to early testers and users. A multimodal version, Grok-1.5V, adds visual processing of documents, diagrams, and photographs.
2024-03-29
Researched 38d ago
—
No window data
No tracked provider route
Post-training variant of GLM-5 from Z.ai (Zhipu AI) with enhanced agentic coding capabilities. Released April 7, 2026. 754B parameters (40B active) in Mixture of Experts architecture, 200K token context, 128K max output. Supports autonomous plan–execute–test–fix–optimize loops for up to 8 hours without human intervention. Trained entirely on Huawei Ascend hardware (no Nvidia). Key benchmarks: SWE-bench Pro 58.4 (world #1 at release, surpassing GPT-5.4 57.7 and Claude Opus 4.6 57.3), GPQA Diamond 86.2, AIME 2026 95.3, Terminal-Bench 2.0 63.5, MCP-Atlas 71.8, Chatbot Arena Elo 1475 (June 16, 2026, arena.ai). Available via Z.ai API ($1.40/$4.40 per 1M input/output tokens) and open weights on Hugging Face under MIT license.
2026-04-07
Researched 3d ago
200k
200,000 tokens
Lightweight variant of Grok-2 from xAI with extended context for general reasoning tasks.
2024-08-01
Researched 38d ago
128k
128,000 tokens
No tracked provider route
Grok-2 is xAI's Grok 2 model with an optional reasoning mode. It offers a 128K-token context window.
2024-08-01
Researched 38d ago
128k
128,000 tokens
Amazon Nova Premier is Amazon's most capable standard Bedrock Nova understanding model for complex reasoning, agentic workflows, and model distillation. It supports a 1M-token context window, text/image/video inputs, text output, reasoning, tool calling, and prompt caching; use it as the standard Bedrock Nova frontier pick instead of Nova 2 Omni early-access Forge checkpoints.
2025-03-17
Researched 3d ago
1m
1,000,000 tokens
Claude 3 Sonnet by Anthropic is a versatile large language model that balances intelligence and speed for diverse enterprise use cases. It is part of the Claude 3 family, positioned between the more powerful Opus and the faster Haiku models. Sonnet handles nuanced content creation, accurate summarization, and complex scientific queries, and is proficient in non-English languages and coding tasks. Its vision capabilities cover visual reasoning such as interpreting charts and graphs and transcribing text from imperfect images, which benefits industries like retail, logistics, and finance. Running at twice the speed of Claude 3 Opus, Sonnet is efficient for context-sensitive customer support and multi-step workflows. It is rated AI Safety Level 2 (ASL-2) and is accessible through multiple platforms, including Claude.ai, the Claude iOS app, the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.
2024-03-04
Researched 69d ago
200k
200,000 tokens
DeepSeek R1: Reasoning-optimized model with extended thinking capabilities. 128K context.
2025-01-20
Researched 69d ago
128k
128,000 tokens
Flagship open-weight foundation model from Zhipu AI with 744B parameters (40B active per token) in Mixture of Experts architecture. Trained on 28.5T tokens using DeepSeek Sparse Attention on Huawei Ascend hardware. Achieves state-of-the-art performance on coding and agentic benchmarks (SWE-bench Verified: 77.8%). Supports autonomous planning, multi-step tool use, and self-correction.
2026-02-11
Researched 69d ago
200k
200,000 tokens
Claude 3.7 Sonnet is Anthropic's advanced model with extended thinking capabilities, offering state-of-the-art reasoning for complex tasks.
2025-02-24
Researched 69d ago
200k
200,000 tokens
DeepSeek V4 Pro is DeepSeek's flagship open-weights model, released April 24 2026 under the MIT license. Architecture: 1.6T total / 49B active parameters, MoE with Compressed Sparse Attention (CSA) + Heavily Compressed Attention (HCA) hybrid — requiring only 27% of inference FLOPs vs standard 1M-context transformers — plus Manifold-Constrained Hyper-Connections (mHC) and Muon Optimizer. Context window: 1,000,000 tokens; max output: 384,000 tokens (Think Max mode requires >=384K context). Text-only (no vision/image input). Supports three reasoning modes: Non-Think, Think High, Think Max. Function calling, tool use, and structured outputs supported. Key benchmarks: SWE-bench Verified 80.6%, SWE-bench Pro 55.4%, LiveCodeBench 93.5%, GPQA Diamond 90.1%, MMLU-Pro 87.5%, Terminal-Bench 2.0 59.1% on BenchLM's independent June 2026 harness, and Chatbot Arena 1456 (2026-06-16). Current API pricing: $0.435/$0.87 per 1M input/output tokens; DeepSeek made the former 75% promotional rate permanent in May 2026.
2026-04-24
Researched 3d ago
1m
1,000,000 tokens
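The DeepSeek V4 Pro rates quoted above ($0.435 per 1M input tokens, $0.87 per 1M output tokens) translate into a per-request cost with simple arithmetic. The sketch below is a rough estimate only: the `request_cost_usd` helper and the example token counts are hypothetical, and it ignores cache-hit discounts or any other billing rules the provider may apply.

```python
# Rates copied from the description above, in USD per 1M tokens.
INPUT_USD_PER_M = 0.435
OUTPUT_USD_PER_M = 0.87


def request_cost_usd(input_tokens, output_tokens,
                     input_rate=INPUT_USD_PER_M,
                     output_rate=OUTPUT_USD_PER_M):
    """Estimate the cost of one request from flat per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical request: a 200K-token prompt with an 8K-token reply.
cost = request_cost_usd(200_000, 8_000)
print(f"${cost:.4f}")
```

Passing different `input_rate` and `output_rate` values lets the same helper compare other entries in this listing that publish per-million-token pricing.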
Kimi K2 Instruct is an instruction-tuned language model from Moonshot AI, available via Fireworks AI.
2025-09-05
Researched 23d ago
131k
131,072 tokens
DeepSeek R1 Distill Llama 70B is DeepSeek's DeepSeek R1 model with an optional reasoning mode. It offers a 128K-token context window with weights openly available for self-hosting.
2025-01-20
Researched 39d ago
128k
128,000 tokens
DeepSeek R1 Distill Qwen-32B is DeepSeek's DeepSeek R1 model with an optional reasoning mode. It offers a 128K-token context window with weights openly available for self-hosting.
2025-01-20
Researched 26d ago
128k
128,000 tokens
Alibaba's closed-weight flagship language model, announced at the 2026 Alibaba Cloud Summit (May 20). Scored 56.6 on Artificial Analysis Intelligence Index at launch—highest-ranked Chinese model. 1M-token context with prompt caching (up to 90% discount). Pricing: $2.50/$7.50 per 1M tokens in/out.
2026-05-19
Researched today
1m
1,000,000 tokens
OpenAI o3 reasoning model with advanced multi-step problem-solving capabilities.
2025-04-16
Researched 19d ago
200k
200,000 tokens
DeepSeek R1 Distill Qwen-14B is DeepSeek's DeepSeek R1 model with an optional reasoning mode. It offers a 128K-token context window with weights openly available for self-hosting.
2025-01-20
Researched 39d ago
128k
128,000 tokens
Ring-2.6-1T is InclusionAI's MIT-licensed trillion-parameter MoE reasoning model for agent workflows, engineering tasks, scientific analysis, and enterprise automation. It supports high and xhigh reasoning effort modes and entered OpenRouter's Programming top 10 in the 2026-05-18 audit.
2026-05-08
Researched 39d ago
262k
262,144 tokens
Hermes 4's 70B hosted variant on Nous Portal. The portal describes it as a hybrid-mode reasoning model that balances scale and size while staying fast and cost effective for complex reasoning tasks.
2025-09-22
Researched 65d ago
128k
128,000 tokens
DeepSeek R1 Distill Llama 8B is DeepSeek's DeepSeek R1 model with an optional reasoning mode. It offers a 128K-token context window with weights openly available for self-hosting.
2025-01-20
Researched 39d ago
128k
128,000 tokens
DeepSeek R1 Distill Qwen-7B is DeepSeek's DeepSeek R1 model with an optional reasoning mode. It offers a 128K-token context window with weights openly available for self-hosting.
2025-01-20
Researched 39d ago
128k
128,000 tokens
NVIDIA's open frontier-reasoning model (550B total / 55B active MoE, hybrid Transformer-Mamba). Highest Artificial Analysis Intelligence Index for any US open model (score: 48). 300+ tokens/second. 1M-token context. Announced at Computex 2026. Pricing: ~$0.60/$2.60 per 1M tokens (provider median); free tier on some providers.
2026-06-04
Researched 16d ago
1m
1,000,000 tokens
Cosmos 3 Nano is NVIDIA's 16B-parameter omnimodel optimized for efficient inference on workstation-grade hardware (NVIDIA RTX PRO 6000). Architecture: dual-tower Mixture-of-Transformers with an 8B autoregressive Reasoner and an 8B diffusion-based Generator. The Reasoner supports up to 256K tokens of context for vision-language reasoning; the Generator produces video up to 720p at variable frame rates (default 189 frames). Natively handles text, image, video, audio (48kHz stereo), and robot action trajectories across 10+ robot embodiments including Franka Panda, UR, Google robot, and UMI. BF16 precision only. Available as open weights on Hugging Face and via the Cosmos 3 Reasoner NIM (NIM_MODEL_SIZE=nano). Intended for real-time robotics inference and edge-adjacent deployment. Robot action input/output is preserved in this description because the model schema does not have a dedicated action modality field.
2026-05-31
Researched 26d ago
256k
256,000 tokens
Cosmos 3 Super is NVIDIA's flagship 64B-parameter omnimodel for physical AI, designed for large-scale synthetic data generation and high-fidelity simulation on NVIDIA Hopper and Blackwell datacenter GPUs. Architecture: dual-tower Mixture-of-Transformers with a 32B autoregressive Reasoner and a 32B diffusion-based Generator. Supports 256K token reasoning context, 720p video generation at variable frame rates, and 10+ robot embodiment action domains. Ranked #1 among open models on Physics-IQ, PAI-Bench, R-Bench, RoboLab, RoboArena, VANTAGE-Bench, TAR, and Artificial Analysis image/video leaderboards (Computex 2026). Training data: 1.3B data points across 393 datasets (2024-2026). Inference performance (vLLM-Omni): ~55s for 50-step video on 8xH200. Available as open weights on Hugging Face and via Cosmos 3 Reasoner NIM (NIM_MODEL_SIZE=super). Robot action input/output is preserved in this description because the model schema does not have a dedicated action modality field.
2026-05-31
Researched 26d ago
256k
256,000 tokens
Perceptron Mk1 is a closed-source vision-language model for image and video understanding, OCR, object detection, captioning, video QA, and embodied reasoning. Perceptron documents Mk1 with 32K context, reasoning support, and standard pricing of $0.15 per 1M input tokens and $1.50 per 1M output tokens.
2026-05-12
Researched 39d ago
33k
32,768 tokens
CoBuddy is a Baidu Qianfan code generation model optimized for coding tasks and AI agent workflows. OpenRouter lists the free variant with a 131K context window, native tool support, reasoning support, and FP8 quantization for high-throughput inference.
2026-05-06
Researched 40d ago
131k
131,072 tokens
DeepSeek R1 Distill Qwen-1.5B is DeepSeek's DeepSeek R1 model with an optional reasoning mode. It offers a 128K-token context window with weights openly available for self-hosting.
2025-01-20
Researched 39d ago
128k
128,000 tokens
o1-mini (09-12) is OpenAI's o1 model with an optional reasoning mode. It offers a 128K-token context window.
2024-09-12
Researched 39d ago
128k
128,000 tokens
Nanbeige4.1-3B is an open-source 3.93B-parameter reasoning and agentic language model by Nanbeige LLM Lab (BOSS Zhipin), released February 11, 2026. Built on Nanbeige4-3B-Base with further SFT and reinforcement learning. Supports 256K token context. A unified generalist model achieving strong reasoning, preference alignment, and agentic tool-use capabilities at the ~3B scale. Competitive with much larger open models. arxiv: 2602.13367. HuggingFace: Nanbeige/Nanbeige4.1-3B.
2026-02-11
Researched 39d ago
256k
256,000 tokens
No tracked provider route
Tencent Hunyuan T1 is a deep-thinking reasoning model released March 21, 2025. Built on Hunyuan TurboS base, using a Hybrid-Transformer-Mamba MoE architecture — the first ultra-large-scale Mamba-powered LLM with 16 total experts and 52B activated parameters via dynamic routing. 96.7% of compute allocated to reinforcement learning post-training. Benchmarks: MATH-500 96.2, LiveCodeBench 64.9, GPQA Diamond 69.3, MMLU-PRO 87.2 (second only to o1). Decoding speed 2× faster than comparable transformer models. 256K token context. Available on Tencent Cloud API. Source: https://tencent.github.io/llm.hunyuan.T1/README_EN.html
2025-03-21
Researched 56d ago
256k
256,000 tokens
No tracked provider route
DeepSeek R1 Zero is DeepSeek's DeepSeek R1 model with an optional reasoning mode. It offers a 128K-token context window with weights openly available for self-hosting.
2025-01-20
Researched 39d ago
128k
128,000 tokens
No tracked provider route
DeepSeek R1 Lite is DeepSeek's DeepSeek R1 model with an optional reasoning mode. It offers a 128K-token context window with weights openly available for self-hosting.
2024-11-21
Researched 39d ago
128k
128,000 tokens
No tracked provider route
o1-preview (09-12) is OpenAI's o1 model with an optional reasoning mode. It offers a 128K-token context window and scores 73.3 on GPQA.
2024-09-12
Researched 39d ago
128k
128,000 tokens
No tracked provider route
The Cerebras GPT 590M is a robust language model featuring 590 million parameters and a transformer architecture akin to GPT-3. It is optimized for natural language processing tasks such as text generation, completion, and summarization. Trained using the Chinchilla scaling laws and Cerebras' weight streaming technology, this model achieves high efficiency, offering faster training times and reduced costs. The Andromeda AI supercomputer facilitated its training on the extensive Pile dataset. Open-sourced under the Apache 2.0 license, it primarily supports English and requires additional tuning for other languages and conversational applications due to its lack of reinforcement learning from human feedback.
2023-03-13
Researched 36d ago
2k
2,000 tokens
No tracked provider route
The NeMo Megatron-GPT 5B is a transformer-based language model with 5 billion trainable parameters, in the style of GPT-2 and GPT-3. It is a decoder-only transformer designed for text generation and language understanding tasks. It was trained on EleutherAI's "The Pile" dataset and can produce coherent, natural-sounding text, answer questions, and complete sentences. The model can reflect biases and toxic language from its training data and sometimes yields inappropriate outputs. On the LM Evaluation Test Suite it scores 0.5566 on ARC-Easy and 0.6133 on Winogrande, indicating both strengths and limitations across different tasks.
2019-08-28
Researched 177d ago
—
No window data
No tracked provider route
OpenAI's previous intelligent reasoning model with configurable reasoning effort. Released August 2025. Supports minimal, low, medium, and high reasoning levels. Succeeded by GPT-5.1 and later models.
2025-08-07
Researched 48d ago
400k
400,000 tokens
Near-frontier intelligence for cost-sensitive, low-latency, high-volume workloads. Released August 2025. Replaces o4-mini (shutting down Oct 2026).
2025-08-07
Researched 48d ago
400k
400,000 tokens
Fastest, cheapest GPT-5 variant for summarization and classification tasks. Also available via Realtime API.
2025-08-07
Researched 48d ago
400k
400,000 tokens
Premium extended-reasoning GPT-5.4 variant producing smarter and more precise responses. Replacement for o3-deep-research and o4-mini-deep-research. No prompt caching discount.
2026-03-01
Researched 48d ago
1.05m
1,050,000 tokens
Advanced o3 reasoning model for complex math, science, and coding problems. Supports tools, vision, and extended thinking. Available to Pro users. Released June 10, 2025.
2025-06-10
Researched 39d ago
200k
200,000 tokens
Seed 1.6 Flash is ByteDance Seed's ultra-fast multimodal thinking model supporting text and visual understanding at 256K context, optimized for low-latency inference.
2026-03-01
Researched 61d ago
256k
256,000 tokens
No tracked provider route
MAI-Thinking-1 is Microsoft AI's flagship reasoning model, built from scratch on enterprise-grade commercially licensed data without third-party distillation. The sparse mixture-of-experts model activates about 35B parameters from roughly 1T total parameters, supports a 256K-token context window, and targets frontier reasoning and software engineering work at a mid-weight price point. Microsoft reports 97% on AIME 2025, 94.5% on AIME 2026, 84.2% on GPQA Diamond, 87.7% on LiveCodeBench v6, 73.5% on SWE-bench Verified, and 52.8% on SWE-bench Pro. In a 1,276-task Surge blind side-by-side evaluation, it narrowly beat Claude Sonnet 4.6 but trailed Claude Opus 4.6. It supports function calling and developer instructions through the Chat Completions API.
2026-06-02
Researched 16d ago
256k
256,000 tokens
Seed 1.6 is a general-purpose multimodal model from ByteDance Seed supporting text, image, and video inputs. It incorporates multimodal capabilities and deep thinking for complex tasks at 256K context.
2026-03-01
Researched 61d ago
256k
256,000 tokens
Amazon Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that processes text, images, and videos at 1M token context with improved reasoning over Nova Lite v1.
2025-12-02
Researched 3d ago
1m
1,000,000 tokens
o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex multi-step research tasks by synthesizing information from multiple sources at 200K context.
2025-10-10
Researched 44d ago
200k
200,000 tokens
Open-weight dense Qwen3.6 27B model with native multimodal support across text, image, and video. Apache 2.0.
2026-04-27
Researched 38d ago
262k
262,144 tokens
Aion-1.0-Mini is a 32B parameter model distilled from DeepSeek-R1, designed for strong performance in reasoning and coding at a smaller footprint.
2026-01-01
Researched 39d ago
128k
128,000 tokens
No tracked provider route
Claude 3.5 Haiku is a fast, efficient Anthropic model that maintains high intelligence. It is optimized for applications that need rapid responses, such as interactive chatbots and real-time content moderation. It launched as text-only, with image input planned for later. It delivers fast, accurate code suggestions, processes and categorizes information quickly, and handles large volumes of user interactions, with advanced coding, tool use, and reasoning abilities. It surpasses Claude 3 Haiku in benchmarks, and its pricing reflects that improved performance.
2024-10-22
Researched 26d ago
200k
200,000 tokens
OLMo 3 32B Think is Allen Institute for AI's reasoning-focused model with extended thinking chains for complex logic problems and multi-step reasoning.
2026-03-01
Researched 61d ago
64k
64,000 tokens
No tracked provider route
Claude Sonnet 4.5 is Anthropic's Claude 4.5 model with multimodal text and image input and an optional reasoning mode. It offers a 200K-token context window and scores 86 on MMLU PRO.
2025-09-29
Researched 39d ago
200k
200,000 tokens
Claude Opus 4.5 is Anthropic's Claude 4.5 model with multimodal text and image input and an optional reasoning mode. It offers a 200K-token context window and scores 80.7 on MMMU.
2025-11-01
Researched 39d ago
200k
200,000 tokens
Claude 3.5 Sonnet is an Anthropic large language model that combines strong reasoning, coding, and natural language understanding with multimodal processing. First released in June 2024 and upgraded in October 2024, it outperforms previous models and competitors on benchmarks. It supports applications from software development and data analysis to customer support and content creation. Users can work with its "Artifacts" feature for real-time collaborative workflows and can access the model through Claude.ai and APIs at competitive pricing.
2024-06-20
Researched 69d ago
200k
200,000 tokens
Alibaba's multimodal agentic model with text, image, and video input. Combines vision-language understanding with full agentic capabilities: deep reasoning, self-programming, tool invocation, and autonomous iteration. GUI grounding: 79.0 on ScreenSpot Pro. Max output 66K tokens. Pricing: $0.40/$1.60 per 1M tokens in/out.
2026-06-03
Researched 18d ago
1m
1,000,000 tokens
Anthropic's cybersecurity-focused frontier model, offered as an invitation-only research preview under Project Glasswing. Succeeded by Claude Mythos 5 (API ID: claude-mythos-5) as of June 9, 2026. Anthropic has indicated that Claude Mythos Preview will be retired after Claude Mythos 5 becomes available; no formal retirement date was published as of 2026-06-09. For current access and the migration path, see the Anthropic migration guide.
2026-05-01
Researched 18d ago
1m
1,000,000 tokens
LFM2.5-8B-A1B is Liquid AI's latest on-device mixture-of-experts model, succeeding LFM2-8B-A1B. It has 8.3B total parameters with approximately 1.5B active per token (the A1B label uses a rounded ~1B figure). The architecture combines 18 double-gated LIV convolutional layers with 6 GQA attention layers, trained on 38 trillion tokens. The context window expands to 128K tokens (up from 32K in the predecessor). It is a reasoning model that generates explicit chain-of-thought steps before producing its final answer, making reasoning tokens cheap due to the MoE design. Strong tool-calling, function-calling, and instruction-following capabilities make it well-suited for agentic workflows on edge hardware. Weights are openly available on Hugging Face under the lfm1.0 license.
2026-05-28
Researched 30d ago
128k
128,000 tokens
No tracked provider route
INTELLECT-3 is Prime Intellect's 106B-parameter MoE model with 12B active parameters, post-trained from GLM-4.5-Air-Base via SFT and reinforcement learning, matching frontier closed-model performance.
2026-04-01
Researched 39d ago
128k
128,000 tokens
No tracked provider route
Aion-1.0 is a multi-model system from AionLabs designed for high performance across reasoning and coding tasks, trained with direct distillation from frontier models.
2026-01-01
Researched 39d ago
128k
128,000 tokens
No tracked provider route
Maestro Reasoning is Arcee AI's flagship 32B analysis and reasoning model, fine-tuned with DPO from Qwen 2.5-32B for cross-domain reasoning, mathematics, and structured analysis tasks.
2025-12-01
Researched 61d ago
128k
128,000 tokens
No tracked provider route
Cogito v2.1 671B MoE is Deep Cogito's strongest open model, matching performance of frontier closed models. It features deep thinking capabilities and strong results on coding, reasoning, and math benchmarks.
2025-11-19
Researched 51d ago
128k
128,000 tokens
No tracked provider route
Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens.
2026-02-17
Researched 15d ago
1m
1,000,000 tokens
Claude Opus 4.7 is Anthropic's generally available flagship model with 1M context, 128K max output, adaptive thinking, and a new tokenizer with roughly 555K words per 1M tokens.
2026-04-16
Researched today
1m
1,000,000 tokens
Claude Opus 4.6 is Anthropic's Claude 4.6 model with multimodal text and image input and an optional reasoning mode. It offers a 1M-token context window and scores 80.8 on SWE-bench Verified.
2026-02-05
Researched 39d ago
1m
1,000,000 tokens
Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse MoE architecture, available for preview as part of the Qwen3.6 series.
2026-04-20
Researched 46d ago
256k
256,000 tokens
Claude Opus 4.8 is Anthropic's flagship Claude 4.8 model, released May 28, 2026 for agentic coding, long-horizon reasoning, computer use, and professional knowledge work. It supports text and image inputs, adaptive reasoning, tool use, structured outputs, computer-use tools, prompt caching, Batch API, Dynamic Workflows parallel subagents, a 1M-token context window on Anthropic API/Bedrock/Vertex, and 128K max output. Key datapack rows: SWE-bench Pro 69.2%, SWE-bench Verified 88.6%, Terminal-Bench 2.1 74.6%, HLE with tools 57.9%, OSWorld-Verified 83.4%, GDPval-AA 1890 Elo, and MCP-Atlas 82.2%. Standard Anthropic API pricing is $5/M input and $25/M output.
2026-05-28
Researched 3d ago
1m
1,000,000 tokens
Mistral's 128B flagship merged model, released April 29, 2026. Open weights available on HuggingFace since ~May 22, 2026 under a Modified MIT License (free for companies with <$20M/month revenue). Dense 128B architecture with 256k context window, configurable reasoning effort, vision encoder for variable image sizes, and native function calling. Achieves 91.4% on τ³-Telecom and 77.6% on SWE-Bench Verified. Strong multilingual support across 24+ languages. Replaces Mistral Medium 3.1, Magistral, and Devstral 2 as Mistral's primary merged model. Available via Mistral API at $1.50/$7.50 per million tokens (input/output) and via NVIDIA NIM/endpoints.
2026-04-29
Researched 8d ago
262k
262,144 tokens
DeepSeek V4 Flash is a 284B parameter (13B activated) Mixture-of-Experts language model with 1M-token context. Features a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for efficient long-context inference. Supports thinking and non-thinking modes. Legacy API aliases deepseek-chat and deepseek-reasoner map to this model's non-thinking and thinking modes respectively. Pricing: $0.14/1M input, $0.28/1M output (cache hit: $0.0028/1M input). MIT licensed.
2026-04-24
Researched 14d ago
1m
1,000,000 tokens
Step 3.7 Flash is StepFun's open-weights multimodal Mixture-of-Experts model for agentic coding, tool use, long-context reasoning, image understanding, and video understanding. It combines a 196B-parameter language backbone with a 1.8B-parameter vision encoder, activates about 11B parameters per token, supports a 256K-token context window, and exposes low, medium, and high reasoning levels for speed/depth tradeoffs. StepFun reports leading open-model results on ClawEval-1.1, SimpleVQA with Search, and SWE-bench Pro at launch. Weights are available on Hugging Face under Apache 2.0.
2026-05-29
Researched 29d ago
256k
256,000 tokens
MAI-Code-1-Flash is Microsoft AI's lightweight agentic coding model built directly inside GitHub Copilot's production harness. It is designed for fast everyday developer workflows, adaptive thinking by task complexity, multi-turn instruction following, and token-efficient coding. Microsoft reports 51.2% on SWE-bench Pro versus 35.2% for Claude Haiku 4.5 in the same Copilot harness, plus stronger results on SWE-bench Verified, SWE-bench Multilingual, and Terminal-Bench 2.0 without publishing exact scores for those secondary benchmarks.
2026-06-02
Researched 16d ago
256k
256,000 tokens
Nano Banana 2 is the GA Gemini 3.1 Flash Image model for image generation and editing through the Gemini API. It accepts text, image, PDF, and video inputs, adds video-to-image generation for thumbnails, posters, and infographics, returns text and images, supports search grounding and thinking, and replaces the gemini-3.1-flash-image-preview model retiring on 2026-06-25.
2026-05-28
Researched 22d ago
131k
131,072 tokens