Claude Sonnet 4.6
- τ-bench: 87.5%
- Output (from): $15.00 / 1M
Last refreshed 2026-05-18. Refresh cadence: weekly.
Top AI models for agentic workflows, tool use, and API integration. Compare models with native function calling support and structured outputs.
Opinionated short stack for this category — scroll for the full leaderboard, pricing, and compare links.
Tool-use leaders rank on BFCL first; when Berkeley coverage lags new SKUs, we fall back to τ-bench so fresh agentic models still surface.
| # | Model | Tags | Signal used | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Preview, Reasoning, Vision, Tools | τ-bench 89.2% | — | — |
| 2 | Claude Sonnet 4.6 | Reasoning, Vision, Tools | τ-bench 87.5% | $3.00 | $15.00 |
| 3 | GLM-5 | Reasoning, Tools | τ-bench 82.1% | $0.60 | $2.08 |
| 4 | GPT-5.4 | Reasoning, Tools | τ-bench 78.3% | $2.50 | $15.00 |
| 5 | GPT-5.3-Codex | Reasoning, Vision, Tools | τ-bench 77.8% | $1.75 | $14.00 |
| 6 | Claude Opus 4.5 | Reasoning, Vision, Tools | BFCL 77.47% | $5.00 | $25.00 |
| 7 | Qwen3-Max | Vision, Tools | τ-bench 76.8% | $0.78 | $3.90 |
| 8 | Gemini 3.1 Pro Preview | Preview, Vision, Tools | τ-bench 76.5% | $2.00 | $12.00 |
| 9 | GPT-5.2 | Reasoning, Vision, Tools | τ-bench 75.1% | $1.75 | $14.00 |
| 10 | Claude Sonnet 4.5 | Reasoning, Vision, Tools | BFCL 73.24% | $3.00 | $15.00 |
| 11 | Qwen3.5-397B-A17B | Reasoning, Tools | BFCL 72.9% | $0.39 | $2.34 |
| 12 | Gemini 3 Pro | Vision, Tools | BFCL 72.51% | $1.25 | $5.00 |
| 13 | Gemini 3 Flash | Preview, Vision, Tools | τ-bench 71.5% | $0.50 | $3.00 |
| 14 | Claude Haiku 4.5 | Vision, Tools | BFCL 68.7% | $0.80 | $4.00 |
| 15 | Kimi K2.5 | Tools | BFCL 68.3% | $0.44 | $2.00 |
| 16 | Gemini 2.5 Flash | Vision, Tools | BFCL 56.24% | $0.30 | $2.50 |
| 17 | GPT-5 Mini | Reasoning, Vision, Tools | BFCL 55.46% | $0.25 | $2.00 |
| 18 | GPT-4.1 | Vision, Tools | BFCL 53.96% | $2.00 | $8.00 |
| 19 | GPT-4.1 Mini | Vision, Tools | BFCL 50.45% | $0.40 | $1.60 |
| 20 | Mistral Large 2 | Vision, Tools | BFCL 38.37% | $0.48 | $1.50 |
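The BFCL-first, τ-bench-fallback rule behind the "Signal used" values can be sketched roughly as follows. This is an illustration only, not the site's actual ranking code: the `Model` class and the `pick_signal` and `rank` functions are hypothetical names, and sorting by the raw percentage of whichever signal was picked is an assumption inferred from the order of the table rather than something the page states. The sample scores are copied from rows 2, 6, and 7.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Model:
    """Hypothetical record; field names are illustrative, not a real schema."""
    name: str
    bfcl: Optional[float] = None       # BFCL score in percent, if available
    tau_bench: Optional[float] = None  # τ-bench score in percent, if available


def pick_signal(m: Model) -> Optional[Tuple[str, float]]:
    """Prefer BFCL; fall back to τ-bench when no BFCL score exists."""
    if m.bfcl is not None:
        return ("BFCL", m.bfcl)
    if m.tau_bench is not None:
        return ("τ-bench", m.tau_bench)
    return None  # no usable signal, so the model is left out of the ranking


def rank(models: List[Model]) -> List[Tuple[int, str, str, float]]:
    """Return (position, name, signal, score) rows, highest score first."""
    scored = []
    for m in models:
        signal = pick_signal(m)
        if signal is not None:
            scored.append((m.name, signal[0], signal[1]))
    # Assumption: scores from both benchmarks are compared as raw percentages.
    scored.sort(key=lambda row: row[2], reverse=True)
    return [(i, name, sig, score) for i, (name, sig, score) in enumerate(scored, start=1)]


# Sample rows taken from the table above.
models = [
    Model("Qwen3-Max", tau_bench=76.8),
    Model("Claude Opus 4.5", bfcl=77.47),
    Model("Claude Sonnet 4.6", tau_bench=87.5),
]
ranked = rank(models)
for position, name, signal, score in ranked:
    print(f"{position}. {name} ({signal} {score}%)")
```

Comparing BFCL and τ-bench percentages directly keeps the sketch simple, but the two benchmarks measure different things, so a production ranking might normalize or separate them.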
Next seats in this ranking. The descriptions below come from each model's stored entry in LLMReference seed data; spot-check the model page before relying on a capability claim.
- GPT-5.3-Codex (τ-bench 77.8%): Most capable agentic coding model from OpenAI. Optimized for long-horizon, agentic coding tasks in the Codex CLI and API. Note: GPT-5.3-Codex-Spark is a distinct ChatGPT Pro research preview (not API-accessible).
- Claude Opus 4.5 (BFCL 77.47%): Available on AWS Bedrock.
- Qwen3-Max (τ-bench 76.8%): Alibaba's flagship model, with improved multilingual and reasoning capabilities.