LLM Reference

Code execution

Also known as: sandboxed code, code interpreter, computer use

Run code as part of a workflow.

See matching models with benchmark scores and pricing.

84 matching active models · 28 tracked providers · 68 models with routes

model.code_execution

Definition

Code execution capability means a model route or surrounding product can run generated code, calculations, or sandboxed scripts as part of completing a task. For model selection, treat it as an execution-surface flag and still inspect the provider route before relying on it.
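As a rough illustration of that advice, the sketch below filters a small catalog on the capability flag and then separately checks that a tracked provider route exists. The `ModelRecord` type, its field names, and the third catalog entry are hypothetical and only for illustration; they are not this site's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    """Hypothetical record shape; field names are assumptions."""
    name: str
    code_execution: bool                              # the capability flag
    routes: list[str] = field(default_factory=list)   # tracked provider routes


def usable_for_code_execution(models):
    """Keep models that carry the flag AND have at least one tracked route."""
    return [m for m in models if m.code_execution and m.routes]


catalog = [
    ModelRecord("GLM-5.2", True, ["OpenRouter"]),
    ModelRecord("GPT-4 Vision Preview", True, []),    # flagged, but no tracked route
    ModelRecord("example-model-without-flag", False, ["ExampleProvider"]),  # made up
]

shortlist = usable_for_code_execution(catalog)
print([m.name for m in shortlist])  # ['GLM-5.2']
```

Passing this filter only narrows the list. Whether a given route actually exposes a sandbox or code tool still has to be confirmed in the provider's documentation.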

Models With Code execution

Showing the first 80 matches, sorted by decision relevance, with tracked capability and provider-route evidence.
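Entries in this listing carry a route price line of the form `$X in / $Y out / 1M tokens`. A minimal sketch of turning such a line into a per-request estimate follows. The function names are hypothetical, and the arithmetic assumes flat per-token pricing; it ignores cache or batch discounts and any long-context surcharge.

```python
import re

# Matches lines such as "$1.40 in / $4.40 out / 1M tokens".
PRICE_LINE = re.compile(r"\$(?P<inp>[\d.]+) in / \$(?P<out>[\d.]+) out / 1M tokens")


def parse_price_line(line):
    """Return (input_price, output_price) in USD per 1M tokens, or None."""
    m = PRICE_LINE.fullmatch(line.strip())
    if m is None:
        return None  # e.g. "Pricing not tracked / 1M tokens"
    return float(m["inp"]), float(m["out"])


def estimate_cost(line, input_tokens, output_tokens):
    """Estimate the USD cost of one request under flat per-token pricing."""
    prices = parse_price_line(line)
    if prices is None:
        return None
    inp, out = prices
    return (input_tokens * inp + output_tokens * out) / 1_000_000


cost = estimate_cost("$1.40 in / $4.40 out / 1M tokens", 200_000, 10_000)
print(f"{cost:.3f}")  # 0.324
```

Lines without a tracked price return `None` rather than zero, so an untracked route is not mistaken for a free one.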

84 matches
GPT-4 Vision Preview

GPT-4 Vision Preview is OpenAI's GPT-4 model with multimodal text and image input. It is deprecated (originally released 2023-11-06); use it only for reproducing earlier results or evaluating drift over time.

2023-11-06

Researched 41d ago

128k

128,000 tokens

128k context · Vision · Multimodal · Code exec

No tracked provider route

GLM-5.2

GLM-5.2 is Z.ai's coding-first successor to GLM-5.1 in the GLM-5 family, released June 13 2026. 753B parameters (40B active) in IndexShare MoE architecture; the IndexShare innovation reuses the same attention indexer across every four sparse layers, cutting per-token FLOPs by 2.9x at 1M context length. Trained on 28.5T tokens. Supports a 1M-token context window via the glm-5.2[1m] model ID, with 131,072-token maximum output and High/Max thinking-effort levels designed for extended agentic coding sessions. MIT license; open weights available on Hugging Face (zai-org/GLM-5.2 and zai-org/GLM-5.2-FP8). Self-reported HF card benchmarks: SWE-bench Pro 62.1, Terminal-Bench 2.1 82.7, MCP-Atlas 76.8, Tool-Decathlon 48.2, GPQA Diamond 91.2, AIME 2026 99.2, HLE 40.5. Available to GLM Coding Plan subscribers (Lite/Pro/Max/Team) directly, and via OpenRouter token API ($1.40/$4.40 per 1M tokens).

2026-06-13

Researched 5d ago

1m

1,000,000 tokens

1m context · Reasoning · Tool use · Functions · JSON · Code exec
OpenRouter

$1.40 in / $4.40 out / 1M tokens

1 route

Provider docs
GLM-5.1

Post-training variant of GLM-5 from Z.ai (Zhipu AI) with enhanced agentic coding capabilities. Released April 7, 2026. 754B parameters (40B active) in Mixture of Experts architecture, 200K token context, 128K max output. Supports autonomous plan–execute–test–fix–optimize loops for up to 8 hours without human intervention. Trained entirely on Huawei Ascend hardware (no Nvidia). Key benchmarks: SWE-bench Pro 58.4 (world #1 at release, surpassing GPT-5.4 57.7 and Claude Opus 4.6 57.3), GPQA Diamond 86.2, AIME 2026 95.3, Terminal-Bench 2.0 63.5, MCP-Atlas 71.8, Chatbot Arena Elo 1475 (June 16, 2026, arena.ai). Available via Z.ai API ($1.40/$4.40 per 1M input/output tokens) and open weights on Hugging Face under MIT license.

2026-04-07

Researched 5d ago

200k

200,000 tokens

200k context · Reasoning · Tool use · Functions · JSON · Code exec
OpenRouter

$1.05 in / $3.50 out / 1M tokens

5 routes · 2 cache

Provider docs
Claude 3 Sonnet

Claude 3 Sonnet is Anthropic's mid-tier Claude 3 model, balancing intelligence and speed for enterprise use cases. It sits between the more capable Opus and the faster Haiku. Sonnet handles nuanced content creation, summarization, and complex scientific queries, and is proficient in non-English languages and coding tasks. Its vision capabilities cover visual reasoning, such as interpreting charts and graphs and transcribing text from imperfect images, which is useful in retail, logistics, and finance. Described as running at twice the speed of Claude 3 Opus, Sonnet suits context-sensitive customer support and multi-step workflows. It is rated AI Safety Level 2 (ASL-2) and is available through Claude.ai, the Claude iOS app, the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.

2024-03-04

Researched 71d ago

200k

200,000 tokens

200k context · Reasoning · Vision · Multimodal · JSON · Code exec
AWS Bedrock

$3.00 in / $15.00 out / 1M tokens

2 routes · 1 cache

Provider docs
DeepSeek R1

DeepSeek R1: Reasoning-optimized model with extended thinking capabilities. 128K context.

2025-01-20

Researched 71d ago

128k

128,000 tokens

128k context · Reasoning · JSON · Code exec
Bitdeer AI

$0.100 in / $0.300 out / 1M tokens

14 routes

Provider docs
Qwen2.5-Coder-32B-Instruct

Instruction-optimized 32B code flagship for production systems requiring top-tier code reasoning, generation, and multi-file analysis.

2024-11-12

Researched 41d ago

128k

128,000 tokens

128k context · JSON · Code exec
SiliconFlow

$0.180 in / $0.180 out / 1M tokens

6 routes

Provider docs
Claude 3.7 Sonnet

Claude 3.7 Sonnet is Anthropic's advanced model with extended thinking capabilities, offering state-of-the-art reasoning for complex tasks.

2025-02-24

Researched 71d ago

200k

200,000 tokens

200k context · Reasoning · Vision · Multimodal · Tool use · Functions
AWS Bedrock

$3.00 in / $15.00 out / 1M tokens

6 routes · 1 batch

Provider docs
Qwen3.7-Max

Alibaba's closed-weight flagship language model, announced at the 2026 Alibaba Cloud Summit (May 20). Scored 56.6 on Artificial Analysis Intelligence Index at launch—highest-ranked Chinese model. 1M-token context with prompt caching (up to 90% discount). Pricing: $2.50/$7.50 per 1M tokens in/out.

2026-05-19

Researched 2d ago

1m

1,000,000 tokens

1m context · Reasoning · Tool use · Functions · JSON · Code exec
Novita AI

$1.25 in / $3.75 out / 1M tokens

4 routes · 3 cache

Provider docs
o3

OpenAI o3 reasoning model with advanced multi-step problem-solving capabilities.

2025-04-16

Researched 21d ago

200k

200,000 tokens

200k context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$2.00 in / $8.00 out / 1M tokens

3 routes · 1 batch · 2 cache

Provider docs
Qwen2.5-Coder-32B

32B flagship code specialist matching GPT-4o performance with SOTA multi-language repair (75.2% on MdEval) and 3.7% improvement on repo-wide context benchmarks.

2024-11-12

Researched 41d ago

128k

128,000 tokens

128k context · JSON · Code exec
DeepInfra

$0.200 in / $0.200 out / 1M tokens

2 routes

Provider docs
o1-mini (09-12)

o1-mini (09-12) is OpenAI's o1 model with an optional reasoning mode. It offers a 128K-token context window.

2024-09-12

Researched 41d ago

128k

128,000 tokens

128k context · Reasoning · Code exec
Replicate API

$1.10 in / $4.40 out / 1M tokens

1 route

Provider docs
GPT-4o Audio Preview (12-17)

GPT-4o Audio Preview (12-17) is OpenAI's GPT-4o Audio model. It offers a 128K-token context window.

2024-12-17

Researched 41d ago

128k

128,000 tokens

128k context · Vision · Audio · Code exec

No tracked provider route

GPT-4o (11-20)

GPT-4o (11-20) is OpenAI's GPT-4o model. It offers a 128K-token context window.

2024-11-20

Researched 41d ago

128k

128,000 tokens

128k context · Vision · Code exec

No tracked provider route

GPT-4o Audio Preview (10-01)

GPT-4o model with integrated audio I/O capabilities for multimodal interactions.

2024-10-01

Researched 179d ago

128k

128,000 tokens

128k context · Vision · Audio · Code exec

No tracked provider route

o1-preview (09-12)

o1-preview (09-12) is OpenAI's o1 model with an optional reasoning mode. It offers a 128K-token context window and scores 73.3 on GPQA.

2024-09-12

Researched 41d ago

128k

128,000 tokens

128k context · Reasoning · Code exec

No tracked provider route

ChatGPT-4o

The chatgpt-4o-latest model ID always points to the version of GPT-4o currently used in ChatGPT and is updated whenever there are significant changes.

2024-05-13

Researched 179d ago

128k

128,000 tokens

128k context · Vision · Code exec

No tracked provider route

Cerebras GPT 590M

The Cerebras GPT 590M is a robust language model featuring 590 million parameters and a transformer architecture akin to GPT-3. It is optimized for natural language processing tasks such as text generation, completion, and summarization. Trained using the Chinchilla scaling laws and Cerebras' weight streaming technology, this model achieves high efficiency, offering faster training times and reduced costs. The Andromeda AI supercomputer facilitated its training on the extensive Pile dataset. Open-sourced under the Apache 2.0 license, it primarily supports English and requires additional tuning for other languages and conversational applications due to its lack of reinforcement learning from human feedback.

2023-03-13

Researched 38d ago

2k

2,000 tokens

Reasoning · Code exec

No tracked provider route

Megatron GPT 5B

The NeMo Megatron-GPT 5B is a transformer-based language model with 5 billion trainable parameters, inspired by models like GPT-2 and GPT-3. Its architecture is a decoder-only transformer, designed to process input sequentially for text generation and language understanding tasks. Trained on EleutherAI's "The Pile" dataset, it produces coherent, natural-sounding text and can also answer questions and complete sentences. Despite its strengths, the model can reflect biases and toxic language from its dataset, sometimes yielding inappropriate outputs. Evaluations on benchmarks like the LM Evaluation Test Suite show varying performance, with scores of 0.5566 on ARC-Easy and 0.6133 on Winogrande, indicating both strengths and limitations across different tasks.

2019-08-28

Researched 179d ago

No window data

Reasoning · Code exec

No tracked provider route

GPT-5

OpenAI's previous intelligent reasoning model with configurable reasoning effort. Released August 2025. Supports minimal, low, medium, and high reasoning levels. Succeeded by GPT-5.1 and later models.

2025-08-07

Researched 50d ago

400k

400,000 tokens

400k context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$1.25 in / $10.00 out / 1M tokens

4 routes · 1 batch · 2 cache

Provider docs
GPT-5 Mini

Near-frontier intelligence for cost-sensitive, low-latency, high-volume workloads. Released August 2025. Replaces o4-mini (shutting down Oct 2026).

2025-08-07

Researched 50d ago

400k

400,000 tokens

400k context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$0.250 in / $2.00 out / 1M tokens

4 routes · 1 batch · 2 cache

Provider docs
GPT-5 Pro

GPT-5 Pro is OpenAI's most advanced GPT-5 tier, offering major improvements in reasoning, code quality, and user experience for enterprise and power-user applications at 400K context.

2025-10-01

Researched 63d ago

400k

400,000 tokens

400k context · Vision · Multimodal · Tool use · Functions · JSON
OpenRouter

$15.00 in / $120.00 out / 1M tokens

2 routes

Provider docs
Gemini 3 Flash

Gemini 3 Flash is Google's speed-optimized Gemini 3 model, available in public preview via the Gemini API and Vertex AI. It supports text, image, audio, and video inputs with a 1M token context window and is priced at $0.50 per 1M input tokens and $3.00 per 1M output tokens.

2025-12-17

Researched 43d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Audio · Tool use · Functions
GCP Vertex AI

$0.500 in / $3.00 out / 1M tokens

4 routes · 1 cache

Provider docs
GPT-5 Nano

Fastest, cheapest GPT-5 variant for summarization and classification tasks. Also available via Realtime API.

2025-08-07

Researched 50d ago

400k

400,000 tokens

400k context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$0.050 in / $0.400 out / 1M tokens

4 routes · 1 batch · 2 cache

Provider docs
GPT-5.4 Pro

Premium extended-reasoning GPT-5.4 variant producing smarter and more precise responses. Replacement for o3-deep-research and o4-mini-deep-research. No prompt caching discount.

2026-03-01

Researched 50d ago

1.05m

1,050,000 tokens

1.05m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$30.00 in / $180.00 out / 1M tokens

3 routes · 1 batch

Provider docs
Gemini 3 Pro

Google DeepMind's most advanced reasoning Gemini model. Part of the Gemini 3 series with frontier-class intelligence, multimodal understanding, and 1M token context window.

2025-12-11

Researched 179d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · Code exec
GCP Vertex AI

$1.25 in / $5.00 out / 1M tokens

2 routes

Provider docs
GPT-5.1 Codex

GPT-5.1-Codex is a coding-specialized version of GPT-5.1, optimized for software engineering and agentic coding workflows at 400K context.

2025-12-01

Researched 63d ago

400k

400,000 tokens

400k context · Vision · Multimodal · Tool use · Functions · JSON
OpenRouter

$1.25 in / $10.00 out / 1M tokens

2 routes · 2 cache

Provider docs
GPT-5 Codex

GPT-5 Codex is OpenAI's coding-specialized variant of GPT-5, optimized for software engineering workflows, code generation, and agentic coding tasks at 400K context.

2025-10-01

Researched 63d ago

400k

400,000 tokens

400k context · Vision · Multimodal · Tool use · Functions · JSON
OpenRouter

$1.25 in / $10.00 out / 1M tokens

2 routes · 2 cache

Provider docs
Gemini 3 Flash Preview

Frontier-class performance rivaling larger models at a fraction of the cost. Most intelligent Gemini model built for speed, combining frontier intelligence with superior search and grounding. $0.50 input / $3.00 output per 1M tokens.

2025-12-17

Researched 71d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$0.500 in / $3.00 out / 1M tokens

3 routes

Provider docs
o3-pro

Advanced o3 reasoning model for complex math, science, and coding problems. Supports tools, vision, and extended thinking. Available to Pro users. Released June 10, 2025.

2025-06-10

Researched 41d ago

200k

200,000 tokens

200k context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$20.00 in / $80.00 out / 1M tokens

3 routes

Provider docs
GPT-4.1

OpenAI's GPT-4.1 model released April 2025, excelling at coding tasks, precise instruction following, and web development. Outperforms GPT-4o in these areas with a 1 million token context window. Available via API and in ChatGPT for Plus, Pro, Team, Enterprise, and Edu users.

2025-04-01

Researched 50d ago

1.05m

1,047,576 tokens

1.05m context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$2.00 in / $8.00 out / 1M tokens

4 routes · 1 batch · 2 cache

Provider docs
GPT-4.1 Mini

Fast and efficient small model from OpenAI replacing GPT-4o mini. Released April 2025 alongside GPT-4.1. Shows improvements in instruction-following, coding, and intelligence with a 1 million token context window. Available in ChatGPT for paid users.

2025-04-01

Researched 50d ago

1.05m

1,047,576 tokens

1.05m context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$0.400 in / $1.60 out / 1M tokens

4 routes · 2 cache

Provider docs
KAT Coder Pro V2

KAT-Coder-Pro V2 is Kwaipilot's flagship agentic coding model, achieving 79.6% on SWE-Bench Verified (March 2026). Designed for complex enterprise software engineering tasks, multi-system coordination, and SaaS integration. Uses a 'Specialize-then-Unify' training paradigm with five specialized expert domains. Context: 256K tokens. Max output: 256K tokens (on Streamlake endpoint). Available via Vercel AI Gateway and OpenRouter.

2026-03-27

Researched 38d ago

256k

256,000 tokens

256k context · Tool use · Functions · JSON · Code exec · Prompt cache
Novita AI

$0.300 in / $1.20 out / 1M tokens

3 routes · 1 cache

Provider docs
Claude 3.5 Haiku

Claude 3.5 Haiku is Anthropic's fast, efficient model in the Claude 3.5 family. It is optimized for applications that need rapid responses, such as interactive chatbots and real-time content moderation. It launched as text-only, with image input planned for later. It delivers fast, accurate code suggestions, processes and categorizes information quickly, and handles large volumes of user interactions, with coding, tool use, and reasoning abilities. It surpasses Claude 3 Haiku on benchmarks, and its pricing reflects that enhanced performance.

2024-10-22

Researched 28d ago

200k

200,000 tokens

200k context · Reasoning · Vision · JSON · Code exec · Batch
Anthropic

$0.800 in / $4.00 out / 1M tokens

6 routes · 1 batch · 2 cache

Provider docs
Morph V3 Fast

Morph V3 Fast is Morph's fastest code apply model at ~10,500 tokens/sec with 96% accuracy, optimized for rapid code transformations in AI coding workflows.

2026-03-01

Researched 41d ago

80k

80,000 tokens

Code exec
OpenRouter

$0.800 in / $1.20 out / 1M tokens

2 routes

Provider docs
Relace Apply 3

Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits directly into source files at 256K context, designed for precise apply operations in AI coding agents.

2026-01-01

Researched 41d ago

256k

256,000 tokens

256k context · Code exec

No tracked provider route

DeepSeek V3.1

Enhanced reasoning and grounded retrieval model from DeepSeek with multimodal text and image understanding.

2025-08-21

Researched 38d ago

64k

64,000 tokens

Vision · Multimodal · JSON · Code exec · Prompt cache
Novita AI

$0.270 in / $1.00 out / 1M tokens

8 routes · 1 cache

Provider docs
Claude Opus 4.5

Claude Opus 4.5 is Anthropic's Claude 4.5 model with multimodal text and image input and an optional reasoning mode. It offers a 200K-token context window and scores 80.7 on MMMU.

2025-11-01

Researched 41d ago

200k

200,000 tokens

200k context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 2 cache

Provider docs
Claude 3.5 Sonnet

Claude 3.5 Sonnet is an Anthropic large language model that merges state-of-the-art reasoning, coding, and natural language understanding with advanced multimodal processing. First released in June 2024 and upgraded in October 2024, it performed well on benchmarks against previous models and competitors, thanks to its scalable attention mechanisms and large neural network architecture. Its dynamic routing enables specialization in various tasks, supporting applications from software development and data analysis to customer support and content creation. Users can use its "Artifacts" feature for real-time collaborative workflows and can access the model through platforms like Claude.ai and APIs at competitive pricing rates.

2024-06-20

Researched 71d ago

200k

200,000 tokens

200k context · Reasoning · Vision · Multimodal · Functions · JSON
Anthropic

$3.00 in / $15.00 out / 1M tokens

6 routes · 1 cache

Provider docs
GPT-4o

OpenAI GPT-4o: Flagship multimodal model with vision, function calling, and broad capability. $2.50/M input, $10/M output.

2024-05-13

Researched 50d ago

128k

128,000 tokens

128k context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$2.50 in / $10.00 out / 1M tokens

5 routes · 1 batch · 2 cache

Provider docs
Qwen3.7-Plus

Alibaba's multimodal agentic model with text, image, and video input. Combines vision-language understanding with full agentic capabilities: deep reasoning, self-programming, tool invocation, and autonomous iteration. GUI grounding: 79.0 on ScreenSpot Pro. Max output 66K tokens. Pricing: $0.40/$1.60 per 1M tokens in/out.

2026-06-03

Researched 20d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenRouter

$0.320 in / $1.28 out / 1M tokens

2 routes · 1 cache

Provider docs
Claude Mythos Preview

Anthropic's cybersecurity-focused frontier model, offered as an invitation-only research preview under Project Glasswing. Succeeded by Claude Mythos 5 (API ID: claude-mythos-5) as of June 9, 2026. Anthropic has indicated that Claude Mythos Preview will be retired after Claude Mythos 5 becomes available; no formal retirement date was published as of 2026-06-09. For current access and the migration path, see the Anthropic migration guide.

2026-05-01

Researched 20d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$25.00 in / $125.00 out / 1M tokens

2 routes

Provider docs
Morph V3 Large

Morph V3 Large is Morph's high-accuracy code apply model, achieving ~98% accuracy for precise code transformations at ~4,500 tokens/sec and 256K context.

2026-03-01

Researched 41d ago

256k

256,000 tokens

256k context · Code exec
OpenRouter

$0.900 in / $1.90 out / 1M tokens

2 routes

Provider docs
Relace Search

Relace Search uses parallel file view and grep tools to explore a codebase and return relevant file sections with 256K context, specialized for AI coding agent pipelines.

2026-01-01

Researched 41d ago

256k

256,000 tokens

256k context · Tool use · Code exec

No tracked provider route

Arcee Coder Large

Coder Large is Arcee AI's 32B code-focused model, trained on permissively-licensed GitHub repositories and fine-tuned from Qwen 2.5-Instruct for software engineering tasks.

2025-12-01

Researched 63d ago

32k

32,000 tokens

Tool use · Functions · JSON · Code exec

No tracked provider route

Cogito v2.1 671B

Cogito v2.1 671B MoE is Deep Cogito's strongest open model, matching performance of frontier closed models. It features deep thinking capabilities and strong results on coding, reasoning, and math benchmarks.

2025-11-19

Researched 53d ago

128k

128,000 tokens

128k context · Reasoning · Tool use · Functions · JSON · Code exec

No tracked provider route

Mistral Medium 3

Mistral Medium 3 is Mistral AI's enterprise-grade model delivering frontier-level capabilities including vision, function calling, and code generation at competitive cost for business applications.

2025-05-01

Researched 63d ago

128k

128,000 tokens

128k context · Vision · Multimodal · Tool use · Functions · JSON

No tracked provider route

Claude Sonnet 4.6

Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens.

2026-02-17

Researched 17d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$3.00 in / $15.00 out / 1M tokens

6 routes · 1 batch · 3 cache

Provider docs
Claude Opus 4.7

Claude Opus 4.7 is Anthropic's generally available flagship model with 1M context, 128K max output, adaptive thinking, and a new tokenizer with roughly 555K words per 1M tokens.

2026-04-16

Researched 2d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 3 cache

Provider docs
Claude Opus 4.6

Claude Opus 4.6 is Anthropic's Claude 4.6 model with multimodal text and image input and an optional reasoning mode. It offers a 1M-token context window and scores 80.8 on SWE-bench Verified.

2026-02-05

Researched 41d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 4 cache

Provider docs
Claude Opus 4.8

Claude Opus 4.8 is Anthropic's flagship Claude 4.8 model, released May 28, 2026 for agentic coding, long-horizon reasoning, computer use, and professional knowledge work. It supports text and image inputs, adaptive reasoning, tool use, structured outputs, computer-use tools, prompt caching, Batch API, Dynamic Workflows parallel subagents, a 1M-token context window on Anthropic API/Bedrock/Vertex, and 128K max output. Key datapack rows: SWE-bench Pro 69.2%, SWE-bench Verified 88.6%, Terminal-Bench 2.1 74.6%, HLE with tools 57.9%, OSWorld-Verified 83.4%, GDPval-AA 1890 Elo, and MCP-Atlas 82.2%. Standard Anthropic API pricing is $5/M input and $25/M output.

2026-05-28

Researched 5d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 1 cache

Provider docs
GLM 4.7

GLM-4.7 is Z.ai's flagship text model featuring enhanced programming capabilities and deeper reasoning at 200K context, succeeding GLM-4.6.

2026-03-01

Researched 38d ago

200k

200,000 tokens

200k context · Tool use · Functions · JSON · Code exec · Prompt cache
Fireworks AI

$0.600 in / $2.20 out / 1M tokens

3 routes · 1 cache

Provider docs
Mistral Small 3.2 24B

Mistral Small 3.2 24B is an updated instruction-tuned model from Mistral optimized for function calling, structured outputs, and vision tasks at 128K context with open weights.

2025-06-01

Researched 63d ago

128k

128,000 tokens

128k context · Vision · Multimodal · Tool use · Functions · JSON
Venice AI

Pricing not tracked / 1M tokens

1 route

Provider docs
DeepSeek V3.2

DeepSeek V3.2 is a model in DeepSeek's V3 family. It offers a 160K-token context window, has openly available weights for self-hosting, and scores 70 on SWE-bench Verified.

2025-12-01

Researched 40d ago

160k

160,000 tokens

160k context · JSON · Code exec · Prompt cache
OpenRouter

$0.252 in / $0.378 out / 1M tokens

7 routes · 1 cache

Provider docs
DeepSeek R1 0528

DeepSeek R1 0528 is a revision of DeepSeek's R1 model with an optional reasoning mode. It offers a 130K-token context window, has openly available weights for self-hosting, and scores 81 on GPQA.

2025-05-28

Researched 39d ago

130k

130,000 tokens

130k context · Reasoning · JSON · Code exec · Prompt cache
Fireworks AI

$0.560 in / $1.68 out / 1M tokens

7 routes · 2 cache

Provider docs
Qwen3-Coder-480B-A35B-Instruct

Qwen3-Coder-480B-A35B-Instruct is Alibaba's flagship open-source code generation and agentic model, released July 22, 2025 under the Apache 2.0 license. The model has 480 billion total parameters with 35 billion active parameters per token, organized across 62 transformer layers with 160 specialized expert networks and 8 experts activated per token. It uses Grouped Query Attention (GQA) with 96 query heads and 8 key-value heads and supports a native context window of 262,144 tokens, extendable to 1 million tokens via YaRN position scaling. The model is purpose-built for software engineering tasks and agentic workflows: code generation, code review, test writing, multi-step debugging, and browser-based agentic task execution. On release, it achieved state-of-the-art results among open models on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use benchmarks, with performance comparable to Claude Sonnet 4 on these tasks. Available via Fireworks AI, Google Vertex AI, NVIDIA NIM, AWS Bedrock, Novita AI, and the Vercel AI Gateway.

2025-07-22

Researched 10d ago

262k

262,144 tokens

262k context · Tool use · Functions · JSON · Code exec · Prompt cache
Novita AI

$0.380 in / $1.55 out / 1M tokens

6 routes · 1 cache

Provider docs
Gemini 3.1 Pro Preview

Google: Gemini 3.1 Pro Preview available via OpenRouter. Pricing: $2/1M input, $12/1M output.

2026-02-19

Researched 10d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$2.00 in / $12.00 out / 1M tokens

5 routes · 1 cache

Provider docs
Gemini 2.5 Flash

Google: Gemini 2.5 Flash available via OpenRouter. Pricing: $0.3/1M input, $2.5/1M output.

2025-06-17

Researched 71d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$0.300 in / $2.50 out / 1M tokens

5 routes · 1 cache

Provider docs
Gemini 3.5 Flash

Gemini 3.5 Flash is Google DeepMind's generally available Flash model for sustained frontier-level performance on agentic and coding tasks. It supports multimodal inputs, native thinking, tool and function calling, structured outputs, code execution, search grounding, batch processing, and long contexts up to 1M tokens.

2026-05-19

Researched 17d ago

1.05m

1,048,576 tokens

1.05m context · Reasoning · Vision · Multimodal · Audio · Tool use
GCP Vertex AI

$1.50 in / $9.00 out / 1M tokens

4 routes · 2 batch · 3 cache

Provider docs
Gemini 2.5 Flash Lite

Google: Gemini 2.5 Flash Lite available via OpenRouter. Pricing: $0.1/1M input, $0.4/1M output.

2025-07-22

Researched 71d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · JSON
GCP Vertex AI

$0.100 in / $0.400 out / 1M tokens

4 routes · 1 cache

Provider docs
Gemini 2.5 Pro

Google DeepMind's most capable Gemini 2.5 model with native thinking/reasoning support. Features a 1M-token context window, multimodal inputs (text, image, audio, video), function calling, and strong performance across coding, mathematics, and scientific reasoning tasks.

2025-06-17

Researched 24d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
GCP Vertex AI

$1.25 in / $10.00 out / 1M tokens

4 routes · 2 batch · 3 cache

Provider docs
Gemini 3.1 Flash-Lite

Gemini 3.1 Flash-Lite is Google's generally available low-latency Gemini 3.1 model, launched May 7, 2026. It is optimized for high-volume, cost-sensitive workloads with text, image, and video inputs, a 1M token context window, and a 66K token maximum output. The GA model uses the stable API ID gemini-3.1-flash-lite and replaces gemini-3.1-flash-lite-preview, which is scheduled to shut down on May 25, 2026. Pricing is $0.25 per 1M input tokens and $1.50 per 1M output tokens.

2026-05-07

Researched 10d ago

1.05m

1,048,576 tokens

1.05m context · Vision · Multimodal · Tool use · Functions · JSON
Google AI Studio

$0.250 in / $1.50 out / 1M tokens

3 routes · 1 cache

Provider docs
DeepSeek V3.2 Exp

DeepSeek: DeepSeek V3.2 Exp available via OpenRouter. Pricing: $0.27/1M input, $0.41/1M output.

2025-04-10

Researched 38d ago

164k

164,000 tokens

164k context · JSON · Code exec
Novita AI

$0.270 in / $0.410 out / 1M tokens

3 routes

Provider docs
Antigravity Agent

Antigravity Agent is Google DeepMind's preview managed agent for autonomous coding and browsing workflows. Powered by Gemini 3.5 Flash, it plans, reasons, runs code, manages files, and browses the web inside a secure Google-hosted Linux sandbox through the Interactions API. It accepts text and image input, has a 1,048,576-token input context window that compacts at about 135K tokens, and supports a 65,536-token output limit. Environment compute is not billed during preview; Google describes pricing as pay-as-you-go based on underlying Gemini model tokens and tool use.

2026-05-19

Researched 35d ago

1.05m

1,048,576 tokens

1.05m context · Reasoning · Vision · Multimodal · Tool use · Code exec
Google AI Studio

Pricing not tracked / 1M tokens

1 route

Provider docs
Gemini 3.1 Flash-Lite

GA release of Google's most cost-efficient Gemini 3.1 model, optimized for speed, scale, and cost efficiency. Supersedes gemini-3.1-flash-lite-preview. API model ID: gemini-3.1-flash-lite. Pricing: $0.25/$1.50 per 1M tokens in/out.

2026-05-07

Researched 20d ago

1m

1,000,000 tokens

1m context · Vision · Multimodal · Tool use · Functions · JSON

No tracked provider route

GPT-5.4 Nano

GPT-5.4 Nano is the smallest and fastest variant in the GPT-5.4 family, optimized for edge deployment and low-latency tasks. Model ID: gpt-5.4-nano.

2026-03-05

Researched 25d ago

400k

400,000 tokens

400k context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$0.200 in / $1.25 out / 1M tokens

3 routes · 1 batch · 3 cache

Provider docs
GPT-5.5 Pro

GPT-5.5 Pro is OpenAI's premium extra-compute deployment of GPT-5.5, released April 23, 2026. It uses the same underlying weights as GPT-5.5 standard with additional parallel test-time compute for harder tasks. Supports text and image inputs, reasoning effort control, tool use, structured outputs, code execution, a 1,050,000-token context window, and 128K max output. Key datapack rows: Terminal-Bench 2.1 78.2%, SWE-bench Pro 58.6%, GPQA Diamond 93.6%, ARC-AGI-2 high effort 83.3%, BrowseComp Pro compute 90.1%, and FrontierMath Tier 4 39.6%. Official pricing is $30/M input, $180/M output, $10/M batch input, and $45/M batch output; native cached input discount is not listed.

2026-04-23

Researched 3d ago

1.05m

1,050,000 tokens

1.05m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$30.00 in / $180.00 out / 1M tokens

3 routes · 1 batch

Provider docs
GPT-5.4 Mini

GPT-5.4 Mini is a smaller, cost-efficient variant of GPT-5.4 with a 400K token context window. Designed for tasks requiring long-context processing at lower cost. Model ID: gpt-5.4-mini.

2026-03-05

Researched 16d ago

400k

400,000 tokens

400k context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$0.750 in / $4.50 out / 1M tokens

3 routes · 1 batch · 3 cache

Provider docs
GPT-5.5

GPT-5.5 is OpenAI's fully retrained agentic model, released April 23, 2026. Optimised for agentic coding, computer use, knowledge work, and early scientific research. Achieves 82.7% on Terminal-Bench 2.0 (Codex CLI scaffold), 84.9% on GDPval, 58.6% on SWE-Bench Pro, 93.6% on GPQA Diamond, and 82.6% on SWE-Bench Verified (Vals.ai independent harness). Knowledge cutoff December 2025. Supports reasoning effort levels (none/low/medium/high/xhigh). Context window 1,050,000 tokens with a long-context surcharge above 272K tokens. Model ID: gpt-5.5.

2026-04-23

Researched 16d ago

1.05m

1,050,000 tokens

1.05m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$5.00 in / $30.00 out / 1M tokens

4 routes · 1 batch · 2 cache

Provider docs
GPT-5.4

GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.

2026-03-05

Researched 16d ago

1.05m

1,050,000 tokens

1.05m context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$2.50 in / $15.00 out / 1M tokens

4 routes · 1 batch · 3 cache

Provider docs
MiniMax M3

MiniMax M3 is MiniMax's current API flagship, released June 1, 2026 with MiniMax Sparse Attention (MSA) architecture for economical 1M-token context. It accepts text, image, and video input and supports reasoning, tool use, function calling, native prompt caching, and up to 131,072 output tokens in the tracked API configuration. MiniMax lists the standard <=512K tier at a permanent $0.30/M input and $1.20/M output; >512K long-context service remains in limited availability at higher rates. Open weights are available on Hugging Face as MiniMaxAI/MiniMax-M3 under the MiniMax Community License.

2026-06-01

Researched 4d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
MiniMax

$0.300 in / $1.20 out / 1M tokens

2 routes · 1 cache

Provider docs
GPT-5.6 Sol

OpenAI's flagship GPT-5.6 model and highest-capability tier in the Sol, Terra, and Luna naming system. GPT-5.6 Sol is built for demanding reasoning, long-horizon coding, agentic workflows, and cybersecurity tasks, introducing max reasoning effort and ultra multi-agent mode. Announced June 26, 2026; available only to select trusted partners in limited preview, with broad availability pending.

2026-06-26

Researched 3d ago

No window data

Reasoning · Vision · Multimodal · Tool use · Functions · Code exec
OpenAI API

$5.00 in / $30.00 out / 1M tokens

1 route · 1 cache

Provider docs
GPT-5.3-Codex

Most capable agentic coding model from OpenAI. Optimized for long-horizon, agentic coding tasks in the Codex CLI and API. Note: GPT-5.3-Codex-Spark is a distinct ChatGPT Pro research preview (not API-accessible).

2026-02-05

Researched 19d ago

400k

400,000 tokens

400k context · Reasoning · Vision · Tool use · Functions · JSON
OpenAI API

$1.75 in / $14.00 out / 1M tokens

3 routes · 2 cache

Provider docs
Claude Mythos 5

Anthropic's access-gated frontier model for approved Project Glasswing cybersecurity defenders and biomedical research organizations. Shares the same underlying architecture as Claude Fable 5 but operates with safety classifiers lifted in specific domains: cybersecurity safeguards are removed for all Glasswing participants, while biology safeguards are additionally removed for approved biology-track participants. Succeeds Claude Mythos Preview with significantly reduced pricing ($10/$50 per MTok input/output vs. $25/$125 for Mythos Preview), a 1M-token context window, 128k max output tokens, adaptive thinking always on (raw chain of thought never returned), vision, tool use, structured outputs, and the effort parameter for controlling thinking depth. Extended thinking with manual budget_tokens is not supported. Anthropic disabled Mythos 5 access for all customers on June 12, 2026 after a US export control directive. On June 27, 2026, the US Commerce Department partially lifted the restriction, permitting Mythos 5 deployment to more than 100 US organizations listed in government Annex A that operate and defend critical infrastructure. The model remains inaccessible to general commercial API customers as of June 28, 2026.

2026-06-09

Researched 1d ago

1m

1,000,000 tokens

1m context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$10.00 in / $50.00 out / 1M tokens

3 routes · 1 batch · 1 cache

Provider docs
Claude Haiku 4.5

Claude Haiku 4.5 is Anthropic's Claude 4.5 model with multimodal text and image input. It offers a 200K-token context window and scores 73.3 on SWE-bench Verified.

2025-10-01

Researched 35d ago

200k

200,000 tokens

200k context · Vision · Multimodal · Tool use · Functions · JSON
AWS Bedrock

$0.800 in / $4.00 out / 1M tokens

8 routes · 1 batch · 2 cache

Provider docs
Qwen3-Coder-Next

Qwen3-Coder-Next is an ultra-sparse Mixture-of-Experts coding agent model from Alibaba's Qwen team, released February 3, 2026 under Apache 2.0. It has 80B total parameters with 3B active at inference, delivering substantially higher throughput than comparable dense models. It supports a native 256K context window, function calling, and structured outputs, and works with Claude Code, Qwen Code, Cline, Kilo, and other scaffold templates. Benchmarks reported in the DAT-3724 datapack include SWE-Bench Pro 44.3%, SWE-Bench Resolved 70.6%, and TerminalBench 2 36.2%.

2026-02-03

Researched 10d ago

256k

262,144 tokens

256k context · Reasoning · Tool use · Functions · JSON · Code exec
OpenRouter

$0.120 in / $0.800 out / 1M tokens

4 routes

Provider docs
Qwen3-Coder-30B-A3B-Instruct

Qwen3-Coder-30B-A3B-Instruct is Alibaba's efficient open-source code generation model in the Qwen3-Coder family, released December 3, 2025 under the Apache 2.0 license. The model has 30.5 billion total parameters with 3.3 billion active per forward pass, organized across 48 transformer layers with 128 experts and 8 activated per token. It uses Grouped Query Attention (GQA) with 32 query heads and 4 key-value heads. Native context window is 262,144 tokens, extendable to 1 million tokens via YaRN. The model supports multi-turn tool calling, function calling, repository-level code understanding, and structured outputs. It is compatible with vLLM, SGLang, Ollama, LM Studio, llama.cpp, and HuggingFace Transformers. Available via AWS Bedrock, Novita AI, and Vercel AI Gateway.

2025-12-03

Researched 10d ago

262k

262,144 tokens

262k context · Tool use · Functions · JSON · Code exec
Novita AI

$0.070 in / $0.270 out / 1M tokens

3 routes

Provider docs
Devstral Small 2

Devstral Small 2 is Mistral AI's 24B open-weights coding agent model, released December 9, 2025 under Apache 2.0. It scores 68.0% on SWE-bench Verified and supports agentic software engineering tasks, multi-step reasoning, and tool use. Runs on a single RTX 4090 GPU or a Mac with 32GB RAM. Multimodal, with support for image inputs.

2025-12-09

Researched 58d ago

256k

256,000 tokens

256k context · Vision · Multimodal · Tool use · Functions · JSON
Mistral AI Studio

$0.100 in / $0.300 out / 1M tokens

2 routes

Provider docs
Mistral Devstral 2 123B

Mistral Devstral 2 123B is Mistral AI's Devstral model focused on code generation and software engineering. It was released on 2025-12-01.

2025-12-01

Researched 38d ago

No window data

JSON · Code exec
AWS Bedrock

$0.400 in / $2.00 out / 1M tokens

2 routes

Provider docs
Composer 2.5

Cursor's agentic coding model released May 18, 2026. Built on Moonshot AI's Kimi K2.5 open-source checkpoint with targeted RL using textual feedback and 25× more synthetic training tasks than Composer 2. Designed for long-horizon software engineering tasks: multi-file edits, terminal command execution, codebase-wide semantic search, and autonomous task planning. Uses Cursor's compaction-in-the-loop context management for long coding sessions. Available on all Cursor plans; accessed through the Cursor IDE (not a standalone API). Standard pricing: $0.50/M input, $2.50/M output; Fast (default): $3.00/M input, $15.00/M output.

2026-05-18

Researched 39d ago

1m

1,000,000 tokens

1m context · Tool use · Functions · Code exec
Cursor

$0.500 in / $2.50 out / 1M tokens

1 route

Provider docs