LLM Reference

Best LLMs for Function Calling (2026)

Last refreshed 2026-05-18; this page is refreshed weekly.

Top AI models for agentic workflows, tool use, and API integration. Compare models with native function-calling support and structured outputs.

Top three picks

An opinionated short list for this category. The full leaderboard, with pricing, follows below.

How we rank

Tool-use leaders are ranked on BFCL (the Berkeley Function Calling Leaderboard) first. When BFCL coverage lags behind newly released SKUs, we fall back to τ-bench so that fresh agentic models still surface.
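The BFCL-first, τ-bench-fallback rule can be sketched in Python as follows. This is a minimal illustration, not LLMReference's actual code. The `Model` record and its `bfcl`, `tau_bench`, and `release` fields are hypothetical. Using release date as the last ordering key is our reading of the release-date rule in the ranking steps.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Model:
    # Hypothetical record; field names are illustrative, not the real seed schema.
    name: str
    release: date
    bfcl: Optional[float] = None       # BFCL score (%), if Berkeley has scored this SKU
    tau_bench: Optional[float] = None  # τ-bench score (%), used as the fallback


def ranking_signal(m: Model) -> Optional[tuple[str, float]]:
    """Prefer BFCL; fall back to τ-bench; return None if neither is available."""
    if m.bfcl is not None:
        return ("BFCL", m.bfcl)
    if m.tau_bench is not None:
        return ("τ-bench", m.tau_bench)
    return None


def rank(models: list[Model]) -> list[Model]:
    """Scored models first, highest signal first, then newer release first."""

    def key(m: Model) -> tuple[bool, float, date]:
        signal = ranking_signal(m)
        score = signal[1] if signal is not None else float("-inf")
        # reverse=True below turns every component into "largest first".
        return (signal is not None, score, m.release)

    return sorted(models, key=key, reverse=True)
```

The sort places whichever signal each model has on a single percentage scale. As a result, a τ-bench 77.8% lands above a BFCL 77.47%, which matches the order of rows 5 and 6 in the leaderboard.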

  1. Eligibility: models with `function_calling` or `tool_use` enabled in seed metadata.
  2. Primary ranking: BFCL score if present, otherwise τ-bench score, then release date (newer first).
  3. Variant collapse: we keep one row per model family (`familySlug` + parameter tier). When headline scores tie within ±0.5 pt (±10 Elo on Chatbot Arena), we pick the canonical SKU by lowest tracked input price, then GA over preview or limited access, then newest `release`. A folded sibling within the benchmark noise band may show a "Tied within margin" chip on its score cell.
  4. Pricing: lowest tracked input and output prices across API routes, in US dollars per 1M tokens.
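Steps 1 and 3 can be sketched the same way. This is also a hedged illustration. `Sku`, `eligible`, and `collapse` are hypothetical names, and the ±10 Elo Arena band is not modelled. We also assume that "tie within ±0.5 pt" means within 0.5 pt of the family's best score; a pairwise reading would be equally defensible.

```python
from dataclasses import dataclass
from datetime import date

# ±0.5 pt noise band for percentage benchmarks such as BFCL and τ-bench.
NOISE_BAND_PT = 0.5


@dataclass
class Sku:
    # Hypothetical record; field names are illustrative, not the real seed schema.
    name: str
    family_slug: str
    param_tier: str
    score: float         # headline score in points
    input_price: float   # lowest tracked input price, $ per 1M tokens
    is_ga: bool          # False for preview or limited-access SKUs
    release: date
    function_calling: bool = False
    tool_use: bool = False


def eligible(skus: list[Sku]) -> list[Sku]:
    """Step 1: keep SKUs flagged with function calling or tool use."""
    return [s for s in skus if s.function_calling or s.tool_use]


def collapse(skus: list[Sku]) -> list[tuple[Sku, list[Sku]]]:
    """Step 3: one row per (family_slug, param_tier).

    Returns (canonical SKU, folded siblings inside the noise band) per family.
    """
    families: dict[tuple[str, str], list[Sku]] = {}
    for s in skus:
        families.setdefault((s.family_slug, s.param_tier), []).append(s)

    rows = []
    for members in families.values():
        best = max(m.score for m in members)
        # Members whose score ties the family's best within the noise band.
        tied = [m for m in members if best - m.score <= NOISE_BAND_PT]
        # Tie-breaks: lowest input price, then GA before preview, then newest release.
        canonical = min(
            tied,
            key=lambda m: (m.input_price, not m.is_ga, -m.release.toordinal()),
        )
        # Siblings that would carry a "Tied within margin" chip.
        folded = [m for m in tied if m is not canonical]
        rows.append((canonical, folded))
    return rows
```

In this sketch, family members that fall outside the noise band are dropped entirely, because only one row per family survives.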
| # | Model | Tags | Signal used | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Preview, Reasoning, Vision, Tools | τ-bench 89.2% | | |
| 2 | Claude Sonnet 4.6 | Reasoning, Vision, Tools | τ-bench 87.5% | $3.00 | $15.00 |
| 3 | GLM-5 | Reasoning, Tools | τ-bench 82.1% | $0.60 | $2.08 |
| 4 | GPT-5.4 | Reasoning, Tools | τ-bench 78.3% | $2.50 | $15.00 |
| 5 | GPT-5.3-Codex | Reasoning, Vision, Tools | τ-bench 77.8% | $1.75 | $14.00 |
| 6 | Claude Opus 4.5 | Reasoning, Vision, Tools | BFCL 77.47% | $5.00 | $25.00 |
| 7 | Qwen3-Max | Vision, Tools | τ-bench 76.8% | $0.78 | $3.90 |
| 8 | Gemini 3.1 Pro Preview | Preview, Vision, Tools | τ-bench 76.5% | $2.00 | $12.00 |
| 9 | GPT-5.2 | Reasoning, Vision, Tools | τ-bench 75.1% | $1.75 | $14.00 |
| 10 | Claude Sonnet 4.5 | Reasoning, Vision, Tools | BFCL 73.24% | $3.00 | $15.00 |
| 11 | Qwen3.5-397B-A17B | Reasoning, Tools | BFCL 72.9% | $0.39 | $2.34 |
| 12 | Gemini 3 Pro | Vision, Tools | BFCL 72.51% | $1.25 | $5.00 |
| 13 | Gemini 3 Flash | Preview, Vision, Tools | τ-bench 71.5% | $0.50 | $3.00 |
| 14 | Claude Haiku 4.5 | Vision, Tools | BFCL 68.7% | $0.80 | $4.00 |
| 15 | Kimi K2.5 | Tools | BFCL 68.3% | $0.44 | $2.00 |
| 16 | Gemini 2.5 Flash | Vision, Tools | BFCL 56.24% | $0.30 | $2.50 |
| 17 | GPT-5 Mini | Reasoning, Vision, Tools | BFCL 55.46% | $0.25 | $2.00 |
| 18 | GPT-4.1 | Vision, Tools | BFCL 53.96% | $2.00 | $8.00 |
| 19 | GPT-4.1 Mini | Vision, Tools | BFCL 50.45% | $0.40 | $1.60 |
| 20 | Mistral Large 2 | Vision, Tools | BFCL 38.37% | $0.48 | $1.50 |
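Because input and output tokens are priced separately per 1M tokens, a workload's cost is each token count divided by 1,000,000 and multiplied by its own rate. The minimal Python sketch below uses the prices from two rows of the table above. The workload of 2M input and 0.5M output tokens is hypothetical, and `workload_cost` is an illustrative helper rather than part of any provider SDK.

```python
# Prices in US dollars per 1M tokens (input, output), from two rows of the table.
PRICES = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GLM-5": (0.60, 2.08),
}


def workload_cost(input_tokens: int, output_tokens: int,
                  input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of a workload priced per 1M input and per 1M output tokens."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens, 0.5M output tokens.
    for name, (inp, out) in PRICES.items():
        cost = workload_cost(2_000_000, 500_000, inp, out)
        print(f"{name}: ${cost:.2f}")
    # prints:
    # Claude Sonnet 4.6: $13.50
    # GLM-5: $2.24
```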

Honorable mentions

The next entries in this ranking. Each line below is the model's stored description from LLMReference seed data; spot-check the model page before relying on a capability claim.

  • GPT-5.3-Codex (τ-bench 77.8%): Most capable agentic coding model from OpenAI. Optimized for long-horizon, agentic coding tasks in the Codex CLI and API. Note: GPT-5.3-Codex-Spark is a distinct ChatGPT Pro research preview (not API-accessible).

  • Claude Opus 4.5 (BFCL 77.47%): Claude Opus 4.5, available on AWS Bedrock.

  • Qwen3-Max (τ-bench 76.8%): Alibaba's flagship Qwen3-Max model, with improved multilingual and reasoning capabilities.