Which LLM is best for RAG?

Nemotron 3 Super-120B-A12B is the current LLMReference top pick for RAG. The verdict rests on its stored category signal, a RULER score of 96.33%. Output pricing starts at $0.45 per 1M tokens. Review the linked model and provider pages before production use because availability and pricing can change.

How does Nemotron 3 Super-120B-A12B compare to Llama 4 Scout 17B-16E Instruct for RAG?

Nemotron 3 Super-120B-A12B is the top pick on a RULER score of 96.33%; Llama 4 Scout 17B-16E Instruct is the runner-up on a 10m-token context window. Output pricing starts at $0.45 per 1M tokens for Nemotron 3 Super-120B-A12B and at $0.22 per 1M tokens for Llama 4 Scout 17B-16E Instruct.

How does LLMReference rank LLMs for RAG?

LLMReference ranks LLMs for RAG from stored model, benchmark, freshness, and pricing data. The current methodology summary is: RAG picks emphasize the strongest sourced long-document benchmark among tracked RAG needles, QA, and retrieval suites, then context window, then recency.

Best LLMs for Retrieval-Augmented Generation (2026)

Last refreshed 2026-06-27. Next refresh: weekly.

Compare models for RAG, document QA, retrieval-heavy assistants, and long-context grounding by context window, document benchmarks, tool support, and pricing.

Verdict

Use Nemotron 3 Super-120B-A12B for RAG today.

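Llama 4 Scout 17B-16E Instruct is the runner-up. Its ranking signal is context window (10m) rather than a RULER score, so weigh the top pick's benchmark result against the runner-up's larger context.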

1st: Top pick

Researched 26d ago

Nemotron 3 Super-120B-A12B

RULER: 96.33%
Output (from): $0.450 / 1M

2nd: Shortlist

Researched 20d ago

Llama 4 Scout 17B-16E Instruct

Context: 10m
Output (from): $0.220 / 1M

3rd: Shortlist

Researched 38d ago

Grok 4.20 Multi-Agent

Context: 2m
Output (from): $2.50 / 1M

How we rank

RAG picks emphasize the strongest sourced long-document benchmark among tracked RAG needles, QA, and retrieval suites, then context window, then recency.
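The ordering described above can be sketched as a single sort key. This is only an illustration, not LLMReference's actual code: the `Candidate` type, the field names, and the rule that models without a sourced benchmark score sort after models that have one are all assumptions. Scores and context sizes for the three named models come from this page; their release dates are placeholders, and the two `hypothetical-*` rows exist only to show the recency tie-break.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    best_score: Optional[float]  # best score across the RAG benchmark bundle (percent), None if unsourced
    context_tokens: int          # declared context window
    release: date                # release date


def rank_key(m: Candidate) -> Tuple[int, float, int, int]:
    """Sort key: sourced score first, then higher score, larger context, newer release."""
    return (
        0 if m.best_score is not None else 1,  # assumption: scored models outrank unscored ones
        -(m.best_score if m.best_score is not None else 0.0),
        -m.context_tokens,
        -m.release.toordinal(),
    )


def rank(models: List[Candidate]) -> List[Candidate]:
    return sorted(models, key=rank_key)


PLACEHOLDER = date(2026, 1, 1)  # placeholder only, not a real release date

candidates = [
    Candidate("Grok 4.20 Multi-Agent", None, 2_000_000, PLACEHOLDER),
    Candidate("hypothetical-older", None, 1_000_000, date(2025, 6, 1)),
    Candidate("Llama 4 Scout 17B-16E Instruct", None, 10_000_000, PLACEHOLDER),
    Candidate("hypothetical-newer", None, 1_000_000, date(2026, 3, 1)),
    Candidate("Nemotron 3 Super-120B-A12B", 96.33, 1_050_000, PLACEHOLDER),
]

ranked_names = [m.name for m in rank(candidates)]
for position, name in enumerate(ranked_names, start=1):
    print(position, name)
```

Under these assumptions the model with a RULER score lands first even though its context window is smaller than the runner-up's, which matches the order of the top three rows on this page.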

Eligibility — Models tagged for the RAG decision task (collections/specialization fit or scores on RULER, ZeroSCROLLs, InfiniteBench, multi-needle, MS MARCO, SQuAD, NaturalQuestions, TriviaQA, etc.).
Primary ranking — Best score across the RAG benchmark bundle, then larger declared context window, then newer release.
Variant collapse — We keep one row per model family (`familySlug` + parameter tier). When headline scores tie within ±0.5 pt (±10 Elo on Chatbot Arena), we pick the canonical SKU by lowest tracked input price, then GA over preview or limited access, then newest `release`. A folded sibling within the benchmark noise band can show a "Tied within margin" chip on that score cell.
Pricing — Lowest tracked provider input/output where present.
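The variant-collapse rule can be sketched roughly as follows. The `Sku` type, its field names, and the sample rows are hypothetical. The sketch also assumes that "tie" means a score within 0.5 pt of the best score in the family, and that a SKU with no tracked input price sorts last; the page does not state either detail.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby
from typing import List, Optional

TIE_MARGIN_PT = 0.5  # benchmark noise band from the rule above


@dataclass
class Sku:
    name: str
    family_slug: str
    param_tier: str
    score: float                  # headline score in percentage points
    input_price: Optional[float]  # USD per 1M input tokens, None if untracked
    is_ga: bool                   # True for GA, False for preview or limited access
    release: date


def canonical(skus: List[Sku]) -> Sku:
    """Pick one SKU for a family: among scores tied with the best, prefer
    lowest input price, then GA, then newest release."""
    top = max(s.score for s in skus)
    tied = [s for s in skus if top - s.score <= TIE_MARGIN_PT]
    return min(
        tied,
        key=lambda s: (
            s.input_price if s.input_price is not None else float("inf"),
            0 if s.is_ga else 1,
            -s.release.toordinal(),
        ),
    )


def collapse(skus: List[Sku]) -> List[Sku]:
    """Keep one row per (family_slug, param_tier)."""
    def family(s: Sku):
        return (s.family_slug, s.param_tier)
    return [canonical(list(group)) for _, group in groupby(sorted(skus, key=family), key=family)]


# Hypothetical rows, for illustration only.
rows = [
    Sku("x-ga", "fam-x", "70b", 90.0, 2.00, True, date(2026, 1, 10)),
    Sku("x-preview-cheap", "fam-x", "70b", 89.7, 1.00, False, date(2026, 2, 1)),
    Sku("x-weak", "fam-x", "70b", 85.0, 0.50, True, date(2025, 9, 1)),
    Sku("y-only", "fam-y", "8b", 70.0, None, True, date(2025, 5, 1)),
]

collapsed_names = [s.name for s in collapse(rows)]
print(collapsed_names)
```

In this reading, price is checked before GA status, so a cheaper preview SKU inside the noise band can become the canonical row; `x-weak` is cheaper still but falls outside the band and is never considered.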

RULER

#	Model	Tags	Signal used	Context	Input $/1M	Output $/1M
1	Nemotron 3 Super-120B-A12B	—	RULER 96.33%	1.05m	$0.09	$0.45
2	Llama 4 Scout 17B-16E Instruct	Vision	Context 10m	10m	$0.08	$0.22
3	Grok 4.20 Multi-Agent	Reasoning, Vision, Tools	Context 2m	2m	$1.25	$2.50
4	Gemini 1.5 Pro	—	Context 2m	2m	$1.25	$5.00
5	GPT-5.5	Reasoning, Vision, Tools	Context 1.05m	1.05m	$5.00	$30.00
6	GPT-5.5 Pro	Reasoning, Vision, Tools	Context 1.05m	1.05m	$30.00	$180.00
7	GPT-5.4	Reasoning, Vision, Tools	Context 1.05m	1.05m	$2.50	$15.00
8	GPT-5.4 Pro	Reasoning, Vision, Tools	Context 1.05m	1.05m	$30.00	$180.00
9	Gemini 3.5 Flash	Reasoning, Vision, Tools	Context 1.05m	1.05m	$1.50	$9.00
10	Antigravity Agent	Preview, Reasoning, Vision	Context 1.05m	1.05m	—	—
11	Gemini 3.1 Flash-Lite	Vision, Tools	Context 1.05m	1.05m	$0.25	$1.50
12	Xiaomi MiMo-V2.5-Pro	Tools	Context 1.05m	1.05m	$0.43	$0.87
13	Xiaomi MiMo-V2.5	Reasoning, Vision, Tools	Context 1.05m	1.05m	$0.14	$0.28
14	Gemini 2.5 Pro Computer Use Preview	Preview, Vision, Tools	Context 1.05m	1.05m	$1.25	$10.00
15	GPT-4.1	Vision, Tools	Context 1.05m	1.05m	$2.00	$8.00
16	GPT-4.1 Mini	Vision, Tools	Context 1.05m	1.05m	$0.40	$1.60
17	Fugu	Reasoning, Vision, Tools	Context 1m	1m	—	—
18	Fugu Ultra	Reasoning, Vision, Tools	Context 1m	1m	$5.00	$30.00
19	GLM-5.2	Reasoning, Tools	Context 1m	1m	$1.40	$4.40
20	Claude Fable 5	Reasoning, Vision, Tools	Context 1m	1m	$10.00	$50.00
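To turn the per-1M prices in the table into a per-query figure, a rough calculation like the one below may help. The prices are the tracked "from" values listed for the top three rows; the workload (8,000 input tokens of retrieved context plus question, 500 output tokens) is an assumed example, not a measurement, and the function name is illustrative. Provider-specific surcharges or discounts are not modeled.

```python
def query_cost_usd(input_tokens: int, output_tokens: int,
                   input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request in USD, given prices in USD per 1M tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# (input $/1M, output $/1M) as listed in the table for the top three rows.
PRICES = {
    "Nemotron 3 Super-120B-A12B": (0.09, 0.45),
    "Llama 4 Scout 17B-16E Instruct": (0.08, 0.22),
    "Grok 4.20 Multi-Agent": (1.25, 2.50),
}

# Assumed RAG workload: retrieved chunks plus question in, short answer out.
INPUT_TOKENS = 8_000
OUTPUT_TOKENS = 500

costs = {
    name: query_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, price_in, price_out)
    for name, (price_in, price_out) in PRICES.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:.6f} per query, ${cost * 1000:.2f} per 1,000 queries")
```

Because RAG requests are usually input-heavy, the input price tends to dominate the total, so it is worth comparing the Input $/1M column as closely as the output price shown on the verdict cards.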

Honorable mentions

Next seats in this ranking. Lines below are from each model's stored description in LLMReference seed data—spot-check the model page before relying on a capability claim.

#4 Gemini 1.5 Pro
Gemini 1.5 Pro, created by Google DeepMind, is a state-of-the-art multimodal large language model that significantly advances over its predecessors in processing and analyzing large datasets across various formats like text, images, audio, and video. It features a highly extended context window of up to 2 million tokens, allowing it to maintain coherence over lengthy interactions. With over 200 billion parameters, the model excels in tasks requiring nuanced language processing, coding assistance, and advanced reasoning. Integrated into Google's platforms such as Vertex AI, Gemini 1.5 Pro also emphasizes ethical considerations, ensuring safety and appropriateness in AI deployment.
Context: 2m
#5 GPT-5.5
GPT-5.5 is OpenAI's fully retrained agentic model, released April 23, 2026. Optimised for agentic coding, computer use, knowledge work, and early scientific research. Achieves 82.7% on Terminal-Bench 2.0 (Codex CLI scaffold), 84.9% on GDPval, 58.6% on SWE-Bench Pro, 93.6% on GPQA Diamond, and 82.6% on SWE-Bench Verified (Vals.ai independent harness). Knowledge cutoff December 2025. Supports reasoning effort levels (none/low/medium/high/xhigh). Context window 1,050,000 tokens with a long-context surcharge above 272K tokens. Model ID: gpt-5.5.
Context: 1.05m
#6 GPT-5.5 Pro
GPT-5.5 Pro is OpenAI's premium extra-compute deployment of GPT-5.5, released April 23, 2026. It uses the same underlying weights as GPT-5.5 standard with additional parallel test-time compute for harder tasks. Supports text and image inputs, reasoning effort control, tool use, structured outputs, code execution, a 1,050,000-token context window, and 128K max output. Key datapack rows: Terminal-Bench 2.1 78.2%, SWE-bench Pro 58.6%, GPQA Diamond 93.6%, ARC-AGI-2 high effort 83.3%, BrowseComp Pro compute 90.1%, and FrontierMath Tier 4 39.6%. Official pricing is $30/M input, $180/M output, $10/M batch input, and $45/M batch output; native cached input discount is not listed.
Context: 1.05m

Compare Top Picks

Side-by-side comparison of the top picks by price, benchmark, and API access.

Nemotron 3 Super-120B-A12B vs Llama 4 Scout 17B-16E Instruct
Nemotron 3 Super-120B-A12B vs Grok 4.20 Multi-Agent
Nemotron 3 Super-120B-A12B vs Gemini 1.5 Pro
Nemotron 3 Super-120B-A12B vs GPT-5.5
Llama 4 Scout 17B-16E Instruct vs Grok 4.20 Multi-Agent
Llama 4 Scout 17B-16E Instruct vs Gemini 1.5 Pro
