LLM Reference

Best LLMs for Retrieval-Augmented Generation (2026)

Last refreshed 2026-06-27. Next refresh: weekly.

Compare models for RAG, document QA, retrieval-heavy assistants, and long-context grounding by context window, document benchmarks, tool support, and pricing.

Verdict

Use Nemotron 3 Super-120B-A12B for RAG today.

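Llama 4 Scout 17B-16E Instruct is the runner-up, but it ranks on a different signal: its 10m context window rather than a RULER score. Weigh Nemotron's RULER 96.33% against Scout's context length for your workload.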

Researched 26d ago.

How we rank

RAG picks are ranked first by the strongest sourced long-document benchmark score among the tracked needle-in-a-haystack, QA, and retrieval suites, then by context window, then by recency.

  1. Eligibility: Models tagged for the RAG decision task (collections/specialization fit or scores on RULER, ZeroSCROLLS, InfiniteBench, multi-needle, MS MARCO, SQuAD, NaturalQuestions, TriviaQA, etc.).
  2. Primary ranking: Best score across the RAG benchmark bundle, then larger declared context window, then newer release.
  3. Variant collapse: We keep one row per model family (`familySlug` + parameter tier). When headline scores tie within ±0.5 pt (±10 Elo on Chatbot Arena), we pick the canonical SKU by lowest tracked input price, then GA over preview or limited access, then newest `release`. A folded sibling within the benchmark noise band can show a "Tied within margin" chip on that score cell.
  4. Pricing: Lowest tracked provider input/output price where present.
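
The ranking and variant-collapse steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the site's actual implementation: the `Model` dataclass, every field name other than the `familySlug` and `release` concepts named in the text, and the sample rows are hypothetical. It also assumes that "tied" means "within 0.5 pt of the best sibling in the family" and that the surviving row keeps its own score.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

TIE_MARGIN_PTS = 0.5  # the ±0.5 pt noise band from the methodology


@dataclass(frozen=True)
class Model:
    name: str
    family_slug: str               # stands in for `familySlug`
    param_tier: str                # hypothetical parameter-tier label
    best_score: Optional[float]    # best score across the RAG bundle, None if unscored
    context_tokens: int            # declared context window
    release: date
    input_price: Optional[float]   # lowest tracked input $/1M, None if untracked
    is_ga: bool = True             # False for preview or limited access


def _score(m: Model) -> float:
    # Unscored models sort below every scored model.
    return m.best_score if m.best_score is not None else float("-inf")


def _canonical_key(m: Model):
    # Lowest input price, then GA over preview, then newest release.
    price = m.input_price if m.input_price is not None else float("inf")
    return (price, not m.is_ga, -m.release.toordinal())


def collapse_variants(models: List[Model]) -> List[Model]:
    """Keep one row per (family, parameter tier)."""
    groups = {}
    for m in models:
        groups.setdefault((m.family_slug, m.param_tier), []).append(m)
    kept = []
    for siblings in groups.values():
        top = max(_score(s) for s in siblings)
        # Siblings inside the noise band of the best score count as tied.
        tied = [s for s in siblings if _score(s) >= top - TIE_MARGIN_PTS]
        kept.append(min(tied, key=_canonical_key))
    return kept


def rank(models: List[Model]) -> List[Model]:
    """Best score, then larger context window, then newer release."""
    return sorted(
        collapse_variants(models),
        key=lambda m: (-_score(m), -m.context_tokens, -m.release.toordinal()),
    )


# Hypothetical sample rows, not real catalog data.
SAMPLE = [
    Model("alpha-base", "alpha", "70b", 91.0, 128_000, date(2025, 1, 1), 0.50),
    Model("alpha-turbo", "alpha", "70b", 90.8, 128_000, date(2025, 6, 1), 0.20),
    Model("beta", "beta", "8b", None, 1_000_000, date(2025, 3, 1), 0.10),
    Model("gamma", "gamma", "30b", None, 2_000_000, date(2024, 1, 1), 1.00),
]

for position, model in enumerate(rank(SAMPLE), start=1):
    print(position, model.name)
```

In the sample, the two `alpha` siblings sit within the 0.5 pt band, so the cheaper one is kept. The two unscored models then fall back to context window, which mirrors how most rows in this ranking use "Context" as their signal.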

| # | Model | Tags | Signal used | Input $/1M | Output $/1M |
|---|-------|------|-------------|------------|-------------|
| 1 | Nemotron 3 Super-120B-A12B | | RULER 96.33% | $0.09 | $0.45 |
| 2 | Llama 4 Scout 17B-16E Instruct | Vision | Context 10m | $0.08 | $0.22 |
| 3 | Grok 4.20 Multi-Agent | Reasoning, Vision, Tools | Context 2m | $1.25 | $2.50 |
| 4 | Gemini 1.5 Pro | | Context 2m | $1.25 | $5.00 |
| 5 | GPT-5.5 | Reasoning, Vision, Tools | Context 1.05m | $5.00 | $30.00 |
| 6 | GPT-5.5 Pro | Reasoning, Vision, Tools | Context 1.05m | $30.00 | $180.00 |
| 7 | GPT-5.4 | Reasoning, Vision, Tools | Context 1.05m | $2.50 | $15.00 |
| 8 | GPT-5.4 Pro | Reasoning, Vision, Tools | Context 1.05m | $30.00 | $180.00 |
| 9 | Gemini 3.5 Flash | Reasoning, Vision, Tools | Context 1.05m | $1.50 | $9.00 |
| 10 | Antigravity Agent | Preview, Reasoning, Vision | Context 1.05m | not listed | not listed |
| 11 | Gemini 3.1 Flash-Lite | Vision, Tools | Context 1.05m | $0.25 | $1.50 |
| 12 | Xiaomi MiMo-V2.5-Pro | Tools | Context 1.05m | $0.43 | $0.87 |
| 13 | Xiaomi MiMo-V2.5 | Reasoning, Vision, Tools | Context 1.05m | $0.14 | $0.28 |
| 14 | Gemini 2.5 Pro Computer Use Preview | Preview, Vision, Tools | Context 1.05m | $1.25 | $10.00 |
| 15 | GPT-4.1 | Vision, Tools | Context 1.05m | $2.00 | $8.00 |
| 16 | GPT-4.1 Mini | Vision, Tools | Context 1.05m | $0.40 | $1.60 |
| 17 | Fugu | Reasoning, Vision, Tools | Context 1m | not listed | not listed |
| 18 | Fugu Ultra | Reasoning, Vision, Tools | Context 1m | $5.00 | $30.00 |
| 19 | GLM-5.2 | Reasoning, Tools | Context 1m | $1.40 | $4.40 |
| 20 | Claude Fable 5 | Reasoning, Vision, Tools | Context 1m | $10.00 | $50.00 |
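
To make the price columns concrete, a per-request cost can be estimated from the input and output rates. The sketch below is illustrative only: the function name and the workload (20,000 prompt tokens of retrieved context, 500 answer tokens) are assumptions, the rates are copied from the listing above, and it ignores long-context surcharges, caching discounts, and provider-specific fees.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float, output_per_m: float) -> float:
    """Cost of one request given $/1M-token input and output rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# (input $/1M, output $/1M) as shown in the ranking listing.
PRICES = {
    "Nemotron 3 Super-120B-A12B": (0.09, 0.45),
    "Llama 4 Scout 17B-16E Instruct": (0.08, 0.22),
    "GPT-5.5": (5.00, 30.00),
}

# Hypothetical RAG workload: a large retrieved context and a short answer.
INPUT_TOKENS = 20_000
OUTPUT_TOKENS = 500

for name, (input_rate, output_rate) in PRICES.items():
    cost = request_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, input_rate, output_rate)
    print(f"{name}: ${cost:.6f} per request")
```

For this workload the estimate comes to roughly $0.0020 for Nemotron, $0.0017 for Llama 4 Scout, and $0.115 for GPT-5.5. Because RAG prompts are dominated by retrieved context, the input rate usually drives the total.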

Honorable mentions

Next seats in this ranking. Lines below are from each model's stored description in LLMReference seed data—spot-check the model page before relying on a capability claim.

  • #4 Gemini 1.5 Pro

    Gemini 1.5 Pro, created by Google DeepMind, is a state-of-the-art multimodal large language model that significantly advances over its predecessors in processing and analyzing large datasets across various formats like text, images, audio, and video. It features a highly extended context window of up to 2 million tokens, allowing it to maintain coherence over lengthy interactions. With over 200 billion parameters, the model excels in tasks requiring nuanced language processing, coding assistance, and advanced reasoning. Integrated into Google's platforms such as Vertex AI, Gemini 1.5 Pro also emphasizes ethical considerations, ensuring safety and appropriateness in AI deployment.

    Context: 2m

  • GPT-5.5 is OpenAI's fully retrained agentic model, released April 23, 2026. Optimised for agentic coding, computer use, knowledge work, and early scientific research. Achieves 82.7% on Terminal-Bench 2.0 (Codex CLI scaffold), 84.9% on GDPval, 58.6% on SWE-Bench Pro, 93.6% on GPQA Diamond, and 82.6% on SWE-Bench Verified (Vals.ai independent harness). Knowledge cutoff December 2025. Supports reasoning effort levels (none/low/medium/high/xhigh). Context window 1,050,000 tokens with a long-context surcharge above 272K tokens. Model ID: gpt-5.5.

    Context: 1.05m

  • GPT-5.5 Pro is OpenAI's premium extra-compute deployment of GPT-5.5, released April 23, 2026. It uses the same underlying weights as GPT-5.5 standard with additional parallel test-time compute for harder tasks. Supports text and image inputs, reasoning effort control, tool use, structured outputs, code execution, a 1,050,000-token context window, and 128K max output. Key datapack rows: Terminal-Bench 2.1 78.2%, SWE-bench Pro 58.6%, GPQA Diamond 93.6%, ARC-AGI-2 high effort 83.3%, BrowseComp Pro compute 90.1%, and FrontierMath Tier 4 39.6%. Official pricing is $30/M input, $180/M output, $10/M batch input, and $45/M batch output; native cached input discount is not listed.

    Context: 1.05m
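
The standard and batch rates quoted for GPT-5.5 Pro can be compared with a short calculation. This is a rough sketch: the function name and the job size (1,000,000 input tokens, 100,000 output tokens) are hypothetical, the four rates are the ones quoted in the description above, and any surcharge or discount not stated there is ignored.

```python
from typing import Tuple

Rates = Tuple[float, float]  # (input $/1M, output $/1M)


def batch_savings(input_tokens: int, output_tokens: int,
                  standard: Rates, batch: Rates) -> Tuple[float, float, float]:
    """Return (standard cost, batch cost, fraction saved by using batch)."""
    def cost(rates: Rates) -> float:
        return (input_tokens * rates[0] + output_tokens * rates[1]) / 1_000_000

    standard_cost = cost(standard)
    batch_cost = cost(batch)
    saved = 1 - batch_cost / standard_cost if standard_cost else 0.0
    return standard_cost, batch_cost, saved


# Rates as quoted in the GPT-5.5 Pro description.
STANDARD_RATES = (30.0, 180.0)
BATCH_RATES = (10.0, 45.0)

# Hypothetical offline indexing job.
standard_cost, batch_cost, saved = batch_savings(
    1_000_000, 100_000, STANDARD_RATES, BATCH_RATES
)
print(f"standard ${standard_cost:.2f}, batch ${batch_cost:.2f}, saved {saved:.1%}")
```

For this job size the standard cost is $48.00 and the batch cost is $14.50, a saving of just under 70%. This matters mainly for offline RAG work, such as document summarization or index enrichment, where latency is not a constraint.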