LLM Reference
ARC-AGI-2 High · active · Reasoning · Vision

ARC-AGI-2 High: ARC-AGI-2 — High Effort

Metric: Accuracy (higher is better) · Introduced: 2026

High-effort ARC-AGI-2 variant for abstract visual pattern reasoning. Kept separate from general ARC-AGI-2 rows that may use different effort settings. A high benchmark score alone doesn't make a model the right pick; weigh it against pricing, API availability, and release date.

Models ranked: 2 tracked on this benchmark

Score band: 83.3 – 72.1 (best → lowest tracked)

Snapshot trend: -11.20 (Apr 24 → May 28 · 1 model)

Leaderboard

Tracked models ranked by Accuracy (higher is better).

# | Model | Score

How to read this benchmark

On this benchmark, a higher score is better. Scores are useful for directional filtering and shortlisting, not for universal quality ranking. Prefer benchmarks closest to your workload, then check the linked model pages for pricing, context window, and provider availability.

Trust this score when

  • There is a fresh timestamped snapshot (or multiple snapshots) for this benchmark.
  • The model list covers the same version family you can actually deploy today.
  • Top candidates meet your routing and feature requirements.

Be cautious when

  • There is only one benchmark snapshot or the dataset appears stale.
  • The benchmark metric direction is the opposite of your decision objective.
  • The score difference between options is narrow and likely within implementation variance.
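
The caution checklist above can be sketched as a small helper. This is a minimal illustration, not something the page provides: the `BenchmarkView` type, the `caution_flags` function, and the thresholds (`max_age_days=90`, `min_gap=1.0`) are hypothetical choices, and the snapshot year is assumed to be 2026.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BenchmarkView:
    """Hypothetical summary of what a reader sees on a benchmark page."""
    snapshot_dates: list[date]   # dates of the available snapshots
    higher_is_better: bool       # metric direction stated by the benchmark
    want_higher: bool            # direction that matches your decision objective
    top_score: float
    runner_up_score: float


def caution_flags(view: BenchmarkView, today: date,
                  max_age_days: int = 90, min_gap: float = 1.0) -> list[str]:
    """Return the caution conditions that apply (thresholds are assumptions)."""
    flags = []
    if len(view.snapshot_dates) < 2:
        flags.append("fewer than two snapshots")
    if not view.snapshot_dates or (today - max(view.snapshot_dates)).days > max_age_days:
        flags.append("stale data")
    if view.higher_is_better != view.want_higher:
        flags.append("metric direction mismatch")
    if abs(view.top_score - view.runner_up_score) < min_gap:
        flags.append("narrow score gap")
    return flags


# Values taken from this page; the year 2026 is assumed for the snapshot dates.
view = BenchmarkView(
    snapshot_dates=[date(2026, 4, 24), date(2026, 5, 28)],
    higher_is_better=True,
    want_higher=True,
    top_score=83.3,
    runner_up_score=72.1,
)
print(caution_flags(view, today=date(2026, 6, 17)))
```

An empty list means none of the caution conditions fired under these assumed thresholds; it does not by itself establish that the score is trustworthy for a given workload.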

FAQ

What does the ARC-AGI-2 High benchmark measure?

High-effort ARC-AGI-2 variant for abstract visual pattern reasoning. Kept separate from general ARC-AGI-2 rows that may use different effort settings. This page ranks 2 tracked models; higher is better.

Is a higher ARC-AGI-2 — High Effort score always better?

For this benchmark, higher is better. A high score helps you shortlist, but confirm pricing, context window, and provider availability on each model page before committing — the top scorer is not always the right pick for your workload or budget.

How current is this ARC-AGI-2 — High Effort data?

This benchmark was last reviewed on Jun 17, 2026. The tracked average score fell 11.20 points across the last 2 snapshots.
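
As a rough sketch of how a snapshot trend like this could be computed: the page does not say which values produced the -11.20 figure, so the inputs below are an assumption (the two ends of the score band, 83.3 and 72.1, which happen to differ by 11.2). The function name `snapshot_delta` is hypothetical.

```python
def snapshot_delta(earlier_avg: float, later_avg: float) -> float:
    """Change in the tracked average score between two snapshots, rounded to 2 decimals."""
    return round(later_avg - earlier_avg, 2)


# Assumed inputs for illustration only; not confirmed by the page.
delta = snapshot_delta(earlier_avg=83.3, later_avg=72.1)
print(f"{delta:+.2f}")  # prints -11.20
```

A negative delta on a higher-is-better metric means the tracked average dropped between the two snapshots.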

Last reviewed: Jun 17, 2026