Question 1

What does the ClawEval-1.1 benchmark measure?

Accepted Answer

ClawEval-1.1 is an agentic workflow benchmark covering tool-use integrity, task completion, and adversarial resistance. StepFun reported Step 3.7 Flash at 67.1 on the 1.1 release table. This page currently ranks 1 tracked model on the benchmark, and higher scores are better.

Question 2

Is a higher ClawEval-1.1 score always better?

Accepted Answer

A higher score means better performance on this benchmark, but not necessarily a better choice. Use the score to shortlist, then confirm pricing, context window, and provider availability on each model page before committing. The top scorer is not always the right pick for your workload or budget.
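As a rough illustration of that shortlisting step, the sketch below filters candidates on price, context window, and availability first, and only then sorts by score. Everything in it is an assumption made for illustration: the `Candidate` fields, the `shortlist` helper, and the model names and numbers are placeholders, not data from this page or from any provider.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical record; every field value used below is a placeholder."""
    name: str
    score: float           # benchmark score, higher is better
    price_per_mtok: float  # assumed unit: USD per million tokens
    context_window: int    # assumed unit: tokens
    available: bool        # whether your provider offers the model


def shortlist(candidates: List[Candidate],
              max_price: float,
              min_context: int) -> List[Candidate]:
    """Drop candidates that fail hard constraints, then rank by score."""
    eligible = [
        c for c in candidates
        if c.available
        and c.price_per_mtok <= max_price
        and c.context_window >= min_context
    ]
    # Highest score first among the models that actually fit the workload.
    return sorted(eligible, key=lambda c: c.score, reverse=True)


# Placeholder entries, not real leaderboard rows.
candidates = [
    Candidate("model-a", 70.0, 5.00, 32_000, True),
    Candidate("model-b", 65.0, 1.00, 128_000, True),
    Candidate("model-c", 60.0, 0.50, 128_000, False),
]

picks = shortlist(candidates, max_price=2.00, min_context=100_000)
for c in picks:
    print(c.name, c.score)
```

With these placeholder numbers the top scorer, model-a, is excluded for price and context window, which is the situation the answer above warns about.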

Question 3

How current is this ClawEval-1.1 data?

Accepted Answer

This benchmark was last reviewed on May 29, 2026. Re-check the linked model pages for the freshest provider and pricing detail.
