LLM Reference
RULER | active | Long context

RULER

Metric: RULER Score (higher is better)
Introduced: 2024

About

RULER is a flexible benchmark that tests how effectively a model uses its context window. It covers 13 diverse long-context tasks, including retrieval, multi-hop reasoning, and aggregation, each of which can be run at configurable context lengths.
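To make "retrieval at configurable lengths" concrete, here is a minimal Python sketch of a needle-in-a-haystack style task. It is an illustration only, not RULER's actual generator or scoring code: the names `make_retrieval_example` and `score`, the filler text, the prompt wording, and the use of word counts in place of token counts are all assumptions made for this example.

```python
import random

# Hypothetical filler text; any distractor text would work.
FILLER = "The grass is green. The sky is blue. The sun is yellow."


def make_retrieval_example(num_words: int, seed: int = 0) -> dict:
    """Build one synthetic retrieval prompt with `num_words` filler words.

    A single key/value "needle" sentence is hidden at a random position
    in repeated filler text; the task is to return the value for the key.
    Length is counted in words here (an assumption), not in tokens.
    """
    rng = random.Random(seed)
    key = f"key-{rng.randint(1000, 9999)}"
    value = str(rng.randint(100000, 999999))
    needle = f"The special number for {key} is {value}."

    filler_words = FILLER.split()
    # Cycle through the filler until the haystack has the requested length.
    haystack = [filler_words[i % len(filler_words)] for i in range(num_words)]
    # Insert the needle sentence at a random word boundary.
    haystack.insert(rng.randint(0, len(haystack)), needle)

    prompt = " ".join(haystack) + f"\n\nWhat is the special number for {key}?"
    return {"prompt": prompt, "answer": value}


def score(predictions: list[str], answers: list[str]) -> float:
    """Percentage of predictions that contain the expected answer string."""
    hits = sum(ans in pred for pred, ans in zip(predictions, answers))
    return 100.0 * hits / len(answers)
```

Calling `make_retrieval_example` with increasing `num_words` values (for example 1000, 2000, 4000) yields the same kind of task at different context lengths, so a model's score can be compared across lengths to see where its accuracy starts to degrade.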

Resources

Website