EDITOR'S CHOICE
Researched 1d ago
Claude Sonnet 4.6
Anthropic · 1M context
Excellent
The most reliable tool-use loop in production: it recovers from errors on its own.
Best τ-bench score among generally available models (87.5); stays on task across long tool loops and self-corrects without prompting.
The numbers
Output: $15.00 per 1M tokens
Input: $3.00 per 1M tokens
Context: 1M tokens (max window)
Pros
- Top GA τ-bench score
- Reliable multi-step recovery
- 1M context
Cons
- $15.00 per 1M output tokens
- Not the cheapest per step
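
To put the per-step cost in concrete terms, here is a rough sketch of the arithmetic using the two prices listed above. The token counts are hypothetical and chosen only for illustration, and the function name `step_cost` is made up for this example. It assumes flat per-token pricing with no discounts or surcharges, which the card does not specify.

```python
# Prices from the card above, in USD per 1M tokens.
INPUT_USD_PER_M = 3.00
OUTPUT_USD_PER_M = 15.00


def step_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one model call at the listed prices.

    Assumes flat per-token pricing (an assumption, not stated on the card).
    """
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical tool-loop step: 20,000 input tokens, 1,000 output tokens.
    cost = step_cost(20_000, 1_000)
    print(f"${cost:.4f} per step")  # prints "$0.0750 per step"
```

In this made-up case the input side accounts for $0.06 of the $0.075, so in long tool loops the accumulated context can matter more than the headline output price.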