Eligibility — Models with `function_calling` or `tool_use` enabled in seed metadata.
Primary ranking — BFCL score if present, otherwise τ-bench, then newer release.
Variant collapse — We keep one row per model family (`familySlug` + parameter tier). When headline scores tie within ±0.5 pt (±10 Elo on Chatbot Arena), we pick the canonical SKU by lowest tracked input price, then GA over preview or limited access, then newest `release`. A folded sibling within the benchmark noise band can show a "Tied within margin" chip on that score cell.
Pricing — Lowest tracked input and output price per 1M tokens across API routes.
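The variant-collapse rule above can be sketched as a sort key. This is a minimal illustration, not LLMReference's actual implementation: the `Variant` class, its field names, and the sample family are hypothetical. It also assumes that "tie" is measured against the best-scoring sibling and that a sibling with no tracked price sorts last.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

TIE_MARGIN_PT = 0.5  # noise band from the methodology, in percentage points


@dataclass
class Variant:
    name: str
    score: float                  # headline benchmark score, in percent
    input_price: Optional[float]  # USD per 1M input tokens; None if untracked
    is_ga: bool                   # True for GA, False for preview/limited access
    release: date


def pick_canonical(variants: List[Variant]) -> Variant:
    """Pick one row for a model family (assumed reading of the rule)."""
    best = max(v.score for v in variants)
    # Siblings whose score is within the noise band of the best sibling.
    tied = [v for v in variants if best - v.score <= TIE_MARGIN_PT]
    return min(
        tied,
        key=lambda v: (
            v.input_price if v.input_price is not None else float("inf"),
            not v.is_ga,              # GA (False) sorts before preview (True)
            -v.release.toordinal(),   # newest release first
        ),
    )


# Hypothetical family with two siblings inside the band and one far below it.
siblings = [
    Variant("example-70b", 80.0, 3.00, True, date(2026, 1, 10)),
    Variant("example-70b-preview", 79.8, 2.00, False, date(2026, 2, 1)),
    Variant("example-70b-lite", 70.0, 0.50, True, date(2026, 3, 1)),
]
canonical = pick_canonical(siblings)
print(canonical.name)
```

In this sample the preview sibling is chosen because price is checked before GA status; the lite sibling is cheaper but falls outside the 0.5 pt band, so it never enters the tie-break.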

New models awaiting benchmark coverage

These source-backed models qualify for this page, but they are not ranked on the leaderboard until benchmark data for this category exists.

Model	Tags	Why it is listed	Status	Tracked price
Claude Sonnet 5	Tools, Code execution	Reports tool-use capability; kept separate from the scored tool-use ranking until benchmarks land.	Benchmark pending: no tracked BFCL or τ-bench score yet.	In $2.00 / Out $10.00
LongCat-2.0	Tools	Reports tool-use capability; kept separate from the scored tool-use ranking until benchmarks land.	Benchmark pending: no tracked BFCL or τ-bench score yet.	In $0.30 / Out $1.20
Fugu Ultra	Tools	Reports tool-use capability; kept separate from the scored tool-use ranking until benchmarks land.	Benchmark pending: no tracked BFCL or τ-bench score yet.	In $5.00 / Out $30.00
Kimi K2.7-Code HighSpeed	Tools	Reports tool-use capability; kept separate from the scored tool-use ranking until benchmarks land.	Benchmark pending: no tracked BFCL or τ-bench score yet.	In $1.90 / Out $8.00

#	Model	Tags	Signal used	Context	Input $/1M	Output $/1M
1	Ring-2.6-1T	Reasoning, Tools	τ-bench 95.32%	262k	$0.07	$0.63
2	Mistral Medium 3.5	Reasoning, Vision, Tools	τ-bench 91.4%	262k	$1.50	$7.50
3	ByteDance Doubao Seed 2.0 Pro	Vision, Tools	τ-bench 90.4%	256k	$0.47	$2.37
4	Claude Mythos Preview	Invite-only, Reasoning, Vision, Tools	τ-bench 89.2%	1M	—	—
5	LFM2.5 8B A1B	Reasoning, Tools	τ-bench 88.07%	128k	—	—
6	Claude Sonnet 4.6	Reasoning, Vision, Tools	τ-bench 87.5%	1M	$3.00	$15.00
7	Command A+	Reasoning, Vision, Tools	τ-bench 85%	128k	—	—
8	Claude Opus 4.6	Reasoning, Vision, Tools	τ-bench 84.8%	1M	$5.00	$25.00
9	GLM-5	Reasoning, Tools	τ-bench 82.1%	200k	$0.60	$2.08
10	Qwen3.5-35B-A3B	Reasoning, Tools	τ-bench 81.2%	262k	$0.14	$1.00
11	Qwen3.5-122B-A10B	Reasoning, Vision, Tools	τ-bench 79.5%	262k	$0.26	$2.08
12	Qwen3.5-9B	Vision, Tools	τ-bench 79.1%	262k	$0.10	$0.15
13	Qwen3.5-27B	Reasoning, Vision, Tools	τ-bench 79%	262k	$0.20	$1.56
14	Grok 4.20	Reasoning, Vision, Tools	τ-bench 78.9%	1M	$1.25	$2.50
15	GPT-5.4	Reasoning, Vision, Tools	τ-bench 78.3%	1.05M	$2.50	$15.00
16	GPT-5.3-Codex	Reasoning, Vision, Tools	τ-bench 77.8%	400k	$1.75	$14.00
17	Claude Opus 4.5	Reasoning, Vision, Tools	BFCL 77.47%	200k	$5.00	$25.00
18	Qwen3.6-Plus	Vision, Tools	τ-bench 76.8%	1M	$0.33	$1.95
19	Qwen3-Max	Vision, Tools	τ-bench 76.8%	262k	$0.78	$3.90
20	Gemini 3.1 Pro Preview	Preview, Vision, Tools	τ-bench 76.5%	1M	$2.00	$12.00
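
The per-1M-token prices in the table can be converted into a rough per-request cost. The sketch below uses the tracked prices for Ring-2.6-1T and Claude Sonnet 4.6 from the table; the function name and the 2,000-input / 500-output token workload are assumptions chosen only for illustration.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_1m: float, output_per_1m: float) -> float:
    """Cost of one request in USD, given prices quoted per 1M tokens."""
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000


# Hypothetical tool-calling request: 2,000 input tokens, 500 output tokens.
IN_TOKENS, OUT_TOKENS = 2_000, 500

ring_cost = request_cost_usd(IN_TOKENS, OUT_TOKENS, 0.07, 0.63)      # Ring-2.6-1T
sonnet_cost = request_cost_usd(IN_TOKENS, OUT_TOKENS, 3.00, 15.00)   # Claude Sonnet 4.6

print(f"Ring-2.6-1T:       ${ring_cost:.6f} per request")
print(f"Claude Sonnet 4.6: ${sonnet_cost:.6f} per request")
```

Tracked prices are not live API quotes, so treat the result as an estimate and confirm current rates with the provider.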

Honorable mentions

Next seats in this ranking. Lines below are from each model's stored description in LLMReference seed data—spot-check the model page before relying on a capability claim.

#4 Command A+ (τ-bench: 85%)
Command A+ is Cohere's open-weight sparse mixture-of-experts model for enterprise agentic, multimodal, multilingual, RAG, and reasoning-heavy workloads. It combines text and image inputs, tool use, structured outputs, 48-language support, and hardware-efficient deployment that can run on one B200 or two H100 GPUs.

#5 Claude Opus 4.6 (τ-bench: 84.8%)
Claude Opus 4.6 is Anthropic's Claude 4.6 model with multimodal text and image input and an optional reasoning mode. It offers a 1M-token context window and scores 80.8 on SWE-bench Verified.

#6 GLM-5 (τ-bench: 82.1%)
Flagship open-weight foundation model from Zhipu AI with 744B parameters (40B active per token) in Mixture of Experts architecture. Trained on 28.5T tokens using DeepSeek Sparse Attention on Huawei Ascend hardware. Achieves state-of-the-art performance on coding and agentic benchmarks (SWE-bench Verified: 77.8%). Supports autonomous planning, multi-step tool use, and self-correction.

Compare Top Picks

Side-by-side comparison of the top picks by price, benchmark, and API access.

Ring-2.6-1T vs Mistral Medium 3.5
Ring-2.6-1T vs ByteDance Doubao Seed 2.0 Pro
Ring-2.6-1T vs Claude Mythos Preview
Ring-2.6-1T vs LFM2.5 8B A1B
Mistral Medium 3.5 vs ByteDance Doubao Seed 2.0 Pro
Mistral Medium 3.5 vs Claude Mythos Preview

Frequently asked questions

Which LLM is best for function calling and tool use?

Ring-2.6-1T is the current LLMReference top pick for function calling and tool use, based on its stored τ-bench score of 95.32%. Its tracked output price is $0.63 per 1M tokens. Review the linked model and provider pages before production use, because availability and pricing can change.

How does Ring-2.6-1T compare to Mistral Medium 3.5 for function calling and tool use?

Ring-2.6-1T leads Mistral Medium 3.5 on τ-bench in the visible shortlist: 95.32% versus 91.4%. On tracked output pricing, Ring-2.6-1T is $0.63 per 1M tokens and Mistral Medium 3.5 is $7.50 per 1M tokens.

How does LLMReference rank LLMs for function calling and tool use?

LLMReference ranks LLMs for function calling and tool use from stored model, benchmark, freshness, and pricing data. The current methodology summary is: Tool-use leaders rank on BFCL first; when Berkeley coverage lags new SKUs, we fall back to τ-bench so fresh agentic models still surface.
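
The fallback described in that summary can be sketched as follows. This is an illustrative reading rather than the site's code: the `Model` class, its field names, and the sample entries are hypothetical, and it assumes BFCL and tau-bench percentages are compared directly on the same scale, as the ranked table appears to do.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Model:
    name: str
    release: date
    bfcl: Optional[float] = None       # BFCL score in percent, if tracked
    tau_bench: Optional[float] = None  # tau-bench score in percent, if tracked


def signal(m: Model) -> Optional[Tuple[str, float]]:
    """Return (benchmark, score): BFCL if present, otherwise tau-bench."""
    if m.bfcl is not None:
        return ("BFCL", m.bfcl)
    if m.tau_bench is not None:
        return ("tau-bench", m.tau_bench)
    return None  # benchmark pending: not ranked


def rank(models: List[Model]) -> List[Model]:
    """Sort scored models by signal (descending), then by newer release."""
    scored = [m for m in models if signal(m) is not None]
    return sorted(scored, key=lambda m: (-signal(m)[1], -m.release.toordinal()))


# Hypothetical entries; names, scores, and dates are placeholders.
models = [
    Model("model-a", date(2025, 11, 1), bfcl=77.5, tau_bench=90.0),
    Model("model-b", date(2026, 1, 15), tau_bench=76.8),
    Model("model-c", date(2026, 3, 1), tau_bench=76.8),
    Model("model-d", date(2026, 4, 1)),  # no benchmark yet
]
ranking = [(m.name, signal(m)[0]) for m in rank(models)]
print(ranking)
```

Here `model-a` ranks on its BFCL score even though its tau-bench score is higher, the two tied tau-bench models are ordered by release date, and the unscored model is left out of the ranking.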

How often is this list updated?

The LLM rankings on this page are updated daily as new benchmark scores, provider availability, and pricing data are tracked. The "as of" date at the top of the page shows the most recent refresh.

How do you decide which models appear in the top 3?

The podium picks are driven by the primary benchmark signal for this category (shown in the Methodology section), filtered to non-deprecated models with confirmed API availability. In ties, we prefer the more recently released model.

Are preview or beta models included?

Preview models appear in the "Watch list" section but are not in the main ranked podium unless the category explicitly allows it (e.g., /best/coding and /best/agents, where preview models often lead benchmarks).

Can I compare two specific models head-to-head?

Yes — use the Compare tool at llmreference.com/compare for a side-by-side breakdown of context window, pricing, benchmarks, and provider availability.

Is the pricing data real-time?

Pricing is tracked from provider documentation and updated regularly. It reflects the best available public data, not live API quotes — always verify before billing.