RULER | active | Long context
RULER
Metric: RULER Score (higher is better)
Introduced: 2024
About
Configurable synthetic benchmark that tests how effectively a model uses its context window. It comprises 13 long-context tasks spanning retrieval, multi-hop tracing, aggregation, and question answering, each of which can be generated at a chosen context length.
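
To make "retrieval at configurable lengths" concrete, here is a minimal Python sketch of a needle-in-a-haystack style example generator. It is an illustration only, not RULER's actual implementation: the names `make_niah_example` and `score`, the filler text, and the needle template are all hypothetical, and length is counted in whitespace-separated words as a rough stand-in for tokens.

```python
import random

# Hypothetical filler text; any distractor text would serve the same purpose.
FILLER = "The grass is green. The sky is blue. The sun is yellow."


def make_niah_example(target_words: int, seed: int = 0) -> dict:
    """Build one synthetic retrieval example of at least `target_words` words.

    Assumption: words approximate tokens. A real harness would measure
    length with the evaluated model's tokenizer instead.
    """
    rng = random.Random(seed)
    key = f"key-{rng.randint(1000, 9999)}"
    value = str(rng.randint(100000, 999999))
    needle = f"The special magic number for {key} is {value}."

    # Repeat the filler until the haystack reaches the requested length.
    sentences = []
    n_words = 0
    while n_words < target_words:
        sentences.append(FILLER)
        n_words += len(FILLER.split())

    # Hide the needle at a random depth in the haystack.
    sentences.insert(rng.randint(0, len(sentences)), needle)

    return {
        "context": " ".join(sentences),
        "question": f"What is the special magic number for {key}?",
        "answer": value,
    }


def score(prediction: str, answer: str) -> float:
    """Return 1.0 if the expected value appears in the model output, else 0.0."""
    return 1.0 if answer in prediction else 0.0
```

Calling `make_niah_example` with increasing `target_words` values yields the same task at several context lengths, so accuracy can be compared across lengths. The substring match in `score` is one simple scoring choice for this sketch, not necessarily how the RULER Score is computed.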