LLM Reference

Best LLMs for Customer Support (2026)

Last refreshed 2026-06-01. Next refresh: weekly.

Function-calling models for support bots, ranked by τ-bench service-task performance, with BFCL as a fallback and a cost gate of $25 per 1k conversations.

The list excludes models above $25.00 per 1k support conversations, using the cheapest public provider route and a 4k-input / 1k-output average turn over five turns.
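The cost gate can be sketched as simple arithmetic. The following is a minimal illustration, assuming the workload means a flat 4k input and 1k output tokens on each of the five turns (no context accumulation) and that the listed per-1M prices match the cheapest public provider route. The function and constant names are hypothetical and are not taken from the site's code.

```python
from decimal import Decimal

# Workload stated on this page: 4k input + 1k output tokens per turn, five turns.
INPUT_TOKENS_PER_TURN = 4_000
OUTPUT_TOKENS_PER_TURN = 1_000
TURNS_PER_CONVERSATION = 5
CONVERSATIONS = 1_000
COST_GATE = Decimal("25.00")  # USD per 1k conversations


def cost_per_1k_conversations(input_price_per_m: str, output_price_per_m: str) -> Decimal:
    """Cost in USD of 1k conversations, given USD prices per 1M tokens.

    Assumption: every turn is billed as a flat 4k-in / 1k-out request.
    """
    input_tokens = INPUT_TOKENS_PER_TURN * TURNS_PER_CONVERSATION * CONVERSATIONS
    output_tokens = OUTPUT_TOKENS_PER_TURN * TURNS_PER_CONVERSATION * CONVERSATIONS
    total = Decimal(input_price_per_m) * input_tokens + Decimal(output_price_per_m) * output_tokens
    return total / Decimal(1_000_000)


def passes_cost_gate(cost: Decimal) -> bool:
    # Rows *above* the gate are excluded, so a cost equal to the gate passes.
    return cost <= COST_GATE


if __name__ == "__main__":
    # Prices are the listed input/output $/1M values for two rows on this page.
    for name, price_in, price_out in [("GLM-5", "0.60", "2.08"), ("Gemini 3 Flash", "0.50", "3.00")]:
        cost = cost_per_1k_conversations(price_in, price_out)
        print(f"{name}: ${cost:.2f} per 1k conversations, passes gate: {passes_cost_gate(cost)}")
```

Under these assumptions the workload is 20M input and 5M output tokens per 1k conversations, so GLM-5 works out to $22.40 and Gemini 3 Flash to exactly $25.00, which still passes because only rows above the gate are excluded.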

Verdict

Use GLM-5 for support automation today.

Kimi K2.5 is the runner-up, 7.9 points back on τ-bench (74.2% vs. 82.1%).

Researched 46 days ago.

How we rank

For support bots, we rank by τ-bench multi-turn service scores and fall back to BFCL only when τ-bench is unavailable. A cost gate first removes options that are too expensive for the support workload.

  1. Eligibility: Models with `function_calling` enabled, public list token pricing, and a cheapest public provider route under the support workload cost gate.
  2. Primary ranking: τ-bench is the primary score; when retail and airline splits are both present, we average them. If τ-bench is unavailable, the row falls back to BFCL.
  3. Cost gate: Rows above $25.00 per 1k conversations are excluded using the cheapest public provider route. The category-local workload is 4k input + 1k output tokens per turn across five turns; it does not change the global cost calculator.
  4. Tie-breaks: When primary scores match, lower blended support cost wins, then confirmed `function_calling = 1`, then newer `release`.
  5. Variant collapse: We keep one row per model family (`familySlug` + parameter tier). When headline scores tie within ±0.5 pt (±10 Elo on Chatbot Arena), we pick the canonical SKU by lowest tracked input price, then GA over preview or limited access, then newest `release`. A folded sibling within the benchmark noise band can show a "Tied within margin" chip on that score cell. This page also requires a public token-priced route that can be evaluated by the support cost gate.
  6. Pricing: Support workloads are throughput-sensitive, so compare the batch/cache columns on provider pages.
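The primary-ranking and tie-break rules above can be sketched as a sort key. This is an illustrative sketch only: the `Row` fields, the handling of a single available τ-bench split, and the placement of rows with no benchmark score after scored rows are assumptions, not the site's actual implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Row:
    # Hypothetical record shape; field names are assumptions for illustration.
    name: str
    release: date
    support_cost: float              # blended cost per 1k conversations, USD
    function_calling: int = 1        # 1 = confirmed
    tau_retail: Optional[float] = None
    tau_airline: Optional[float] = None
    bfcl: Optional[float] = None


def primary_score(row: Row) -> Optional[float]:
    """Average the available tau-bench splits; otherwise fall back to BFCL."""
    splits = [s for s in (row.tau_retail, row.tau_airline) if s is not None]
    if splits:
        # Assumption: if only one split exists, it is used on its own.
        return sum(splits) / len(splits)
    return row.bfcl


def sort_key(row: Row):
    score = primary_score(row)
    return (
        score is None,                              # scored rows first (assumption)
        -(score if score is not None else 0.0),     # higher score wins
        row.support_cost,                           # then lower support cost
        row.function_calling != 1,                  # then confirmed function calling
        -row.release.toordinal(),                   # then newer release
    )


def rank(rows: List[Row]) -> List[Row]:
    return sorted(rows, key=sort_key)


if __name__ == "__main__":
    # Made-up rows, not data from this page.
    rows = [
        Row("model-a", date(2026, 1, 1), 20.0, tau_retail=80.0, tau_airline=84.0),
        Row("model-b", date(2025, 1, 1), 10.0, bfcl=82.0),
        Row("model-c", date(2026, 2, 1), 1.0),
    ]
    for position, row in enumerate(rank(rows), start=1):
        print(position, row.name, primary_score(row))
```

In this example `model-a` and `model-b` tie at 82.0, so the cheaper `model-b` ranks first, and the unscored `model-c` lands last regardless of its low cost.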
| # | Model | Tags | Signal used | Input $/1M | Output $/1M |
|---|-------|------|-------------|------------|-------------|
| 1 | GLM-5 | Reasoning, Tools | τ-bench 82.1% | $0.60 | $2.08 |
| 2 | Kimi K2.5 | Tools | τ-bench 74.2% | $0.44 | $2.00 |
| 3 | Qwen3.5-397B-A17B | Reasoning, Tools | BFCL 72.9% | $0.39 | $2.34 |
| 4 | Gemini 3 Flash | Preview, Vision, Tools | τ-bench 71.5% | $0.50 | $3.00 |
| 5 | Gemini 2.5 Flash | Vision, Tools | BFCL 56.24% | $0.30 | $2.50 |
| 6 | GPT-5 Mini | Reasoning, Vision, Tools | BFCL 55.46% | $0.25 | $2.00 |
| 7 | GPT-4.1 Mini | Vision, Tools | BFCL 50.45% | $0.40 | $1.60 |
| 8 | Mistral Large 2 | Vision, Tools | BFCL 38.37% | $0.48 | $1.50 |
| 9 | CoBuddy | Reasoning, Tools | Release 2026-05-06 | Free | Free |
| 10 | Gemma 4 E2B | Tools | Release 2026-03-31 | Free | Free |
| 11 | Gemma 4 E4B | Tools | Release 2026-03-31 | Free | Free |
| 12 | ShieldGemma 2 | Vision, Tools | Release 2024-09-01 | Free | Free |
| 13 | MedSigLIP | Vision, Tools | Release 2024-07-01 | Free | Free |
| 14 | TxGemma | Tools | Release 2024-06-01 | Free | Free |
| 15 | PaliGemma | Vision, Tools | Release 2024-03-01 | Free | Free |
| 16 | LFM2-24B-A2B | Tools | Release 2025-11-01 | $0.03 | $0.12 |
| 17 | gpt-oss-20b | Tools | Release 2025-08-05 | $0.03 | $0.14 |
| 18 | AutoGLM Phone 9B Multilingual | Vision, Tools | Release 2025-12-08 | $0.04 | $0.14 |
| 19 | Trinity Mini | Tools | Release 2025-12-01 | $0.04 | $0.15 |
| 20 | gpt-oss-120b | Tools | Release 2025-08-05 | $0.04 | $0.18 |

Honorable mentions

The next seats in this ranking. The lines below come from each model's stored description in LLMReference seed data; spot-check the model page before relying on a capability claim.

  • #4 Gemini 3 Flash

    Gemini 3 Flash is Google's speed-optimized Gemini 3 model, available in public preview via the Gemini API and Vertex AI. It supports text, image, audio, and video inputs with a 1M token context window and is priced at $0.50 per 1M input tokens and $3.00 per 1M output tokens.

    τ-bench: 71.5%

  • #5 Gemini 2.5 Flash

    Google: Gemini 2.5 Flash available via OpenRouter. Pricing: $0.30/1M input, $2.50/1M output.

    BFCL: 56.24%

  • #6 GPT-5 Mini

    Near-frontier intelligence for cost-sensitive, low-latency, high-volume workloads. Released August 2025. Replaces o4-mini (shutting down Oct 2026).

    BFCL: 55.46%