Nemotron 3 Super-120B-A12B
- RULER: 96.33%
- Output (from): $0.450 / 1M
Last refreshed 2026-06-27. Next refresh: weekly.
Compare models for RAG, document QA, retrieval-heavy assistants, and long-context grounding by context window, document benchmarks, tool support, and pricing.
Verdict
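Nemotron 3 Super-120B-A12B takes the top spot on its RULER score (96.33%). Llama 4 Scout 17B-16E Instruct is the runner-up, but it is ranked on its 10m context window rather than a benchmark, so the two picks are compared on different signals (RULER against Context).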
RAG picks are ordered first by the strongest sourced long-document benchmark among the tracked needle-in-a-haystack, QA, and retrieval suites, then by context window, then by recency.
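The ordering rule above can be sketched as a sort key. This is a minimal illustration, not the site's actual implementation: the `Candidate` fields, the `rag_sort_key` and `rank` names, and every sample value are hypothetical, and the sketch assumes that any model with a sourced benchmark outranks models that have none.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    benchmark_pct: Optional[float]  # best sourced long-document score, None if missing
    context_tokens: int             # context window size in tokens
    released: date                  # release date, used only as a tie-breaker


def rag_sort_key(c: Candidate):
    # Assumption: a sourced benchmark beats any context-only signal,
    # then larger context wins, then the more recent release wins.
    has_benchmark = c.benchmark_pct is not None
    return (has_benchmark, c.benchmark_pct or 0.0, c.context_tokens, c.released)


def rank(candidates: List[Candidate]) -> List[Candidate]:
    # reverse=True puts the highest key (best pick) first.
    return sorted(candidates, key=rag_sort_key, reverse=True)


# Hypothetical sample data, invented purely to exercise the rule.
CANDIDATES = [
    Candidate("older-2m", None, 2_000_000, date(2024, 5, 1)),
    Candidate("long-context", None, 10_000_000, date(2025, 4, 1)),
    Candidate("bench-model", 96.33, 1_000_000, date(2026, 1, 1)),
    Candidate("newer-2m", None, 2_000_000, date(2026, 2, 1)),
]

for position, candidate in enumerate(rank(CANDIDATES), start=1):
    print(position, candidate.name)
```

With this key, a benchmarked model ranks first regardless of its context window, and two models with the same context window are separated by release date.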
| # | Model | Tags | Signal used | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| 1 | Nemotron 3 Super-120B-A12B | | RULER 96.33% | $0.09 | $0.45 |
| 2 | Llama 4 Scout 17B-16E Instruct | Vision | Context 10m | $0.08 | $0.22 |
| 3 | Grok 4.20 Multi-Agent | Reasoning, Vision, Tools | Context 2m | $1.25 | $2.50 |
| 4 | Gemini 1.5 Pro | | Context 2m | $1.25 | $5.00 |
| 5 | GPT-5.5 | Reasoning, Vision, Tools | Context 1.05m | $5.00 | $30.00 |
| 6 | GPT-5.5 Pro | Reasoning, Vision, Tools | Context 1.05m | $30.00 | $180.00 |
| 7 | GPT-5.4 | Reasoning, Vision, Tools | Context 1.05m | $2.50 | $15.00 |
| 8 | GPT-5.4 Pro | Reasoning, Vision, Tools | Context 1.05m | $30.00 | $180.00 |
| 9 | Gemini 3.5 Flash | Reasoning, Vision, Tools | Context 1.05m | $1.50 | $9.00 |
| 10 | Antigravity Agent | Preview, Reasoning, Vision | Context 1.05m | n/a | n/a |
| 11 | Gemini 3.1 Flash-Lite | Vision, Tools | Context 1.05m | $0.25 | $1.50 |
| 12 | Xiaomi MiMo-V2.5-Pro | Tools | Context 1.05m | $0.43 | $0.87 |
| 13 | Xiaomi MiMo-V2.5 | Reasoning, Vision, Tools | Context 1.05m | $0.14 | $0.28 |
| 14 | Gemini 2.5 Pro Computer Use Preview | Preview, Vision, Tools | Context 1.05m | $1.25 | $10.00 |
| 15 | GPT-4.1 | Vision, Tools | Context 1.05m | $2.00 | $8.00 |
| 16 | GPT-4.1 Mini | Vision, Tools | Context 1.05m | $0.40 | $1.60 |
| 17 | Fugu | Reasoning, Vision, Tools | Context 1m | n/a | n/a |
| 18 | Fugu Ultra | Reasoning, Vision, Tools | Context 1m | $5.00 | $30.00 |
| 19 | GLM-5.2 | Reasoning, Tools | Context 1m | $1.40 | $4.40 |
| 20 | Claude Fable 5 | Reasoning, Vision, Tools | Context 1m | $10.00 | $50.00 |
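Because RAG requests are usually input-heavy, the per-1M prices in the table translate into very different per-request costs. The sketch below is a rough illustration under stated assumptions: the prices are copied from the table, the workload (200,000 input tokens and a 1,000-token answer) is hypothetical, the `request_cost_usd` helper is an illustrative name, and the calculation ignores caching, batch discounts, and any long-context surcharge a provider may apply.

```python
def request_cost_usd(input_tokens, output_tokens, input_per_m, output_per_m):
    """Return the USD cost of one request given list prices per 1M tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# (input $/1M, output $/1M) as listed in the ranking table.
PRICES = {
    "Nemotron 3 Super-120B-A12B": (0.09, 0.45),
    "Llama 4 Scout 17B-16E Instruct": (0.08, 0.22),
    "GPT-5.5": (5.00, 30.00),
}

# Hypothetical workload: a large retrieved context and a short answer.
INPUT_TOKENS = 200_000
OUTPUT_TOKENS = 1_000

costs = {
    name: request_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, in_price, out_price)
    for name, (in_price, out_price) in PRICES.items()
}

# Print the cheapest option first.
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.4f} per request")
```

For this workload the two low-cost models come in at under two cents per request, while GPT-5.5 costs about $1.03, almost all of it input tokens.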
Next in this ranking. The lines below come from each model's stored description in LLMReference seed data; spot-check the model page before relying on a capability claim.
Gemini 1.5 Pro, created by Google DeepMind, is a multimodal large language model that processes and analyzes large inputs across text, images, audio, and video. Its context window extends to 2 million tokens, which lets it stay coherent over lengthy interactions. The stored description cites over 200 billion parameters, a figure Google has not published, and says the model excels at nuanced language processing, coding assistance, and advanced reasoning. Gemini 1.5 Pro is integrated into Google platforms such as Vertex AI, with an emphasis on safety and appropriateness in deployment.
Context: 2m
GPT-5.5 is OpenAI's fully retrained agentic model, released April 23, 2026. Optimised for agentic coding, computer use, knowledge work, and early scientific research. Achieves 82.7% on Terminal-Bench 2.0 (Codex CLI scaffold), 84.9% on GDPval, 58.6% on SWE-Bench Pro, 93.6% on GPQA Diamond, and 82.6% on SWE-Bench Verified (Vals.ai independent harness). Knowledge cutoff December 2025. Supports reasoning effort levels (none/low/medium/high/xhigh). Context window 1,050,000 tokens with a long-context surcharge above 272K tokens. Model ID: gpt-5.5.
Context: 1.05m
GPT-5.5 Pro is OpenAI's premium extra-compute deployment of GPT-5.5, released April 23, 2026. It uses the same underlying weights as GPT-5.5 standard with additional parallel test-time compute for harder tasks. Supports text and image inputs, reasoning effort control, tool use, structured outputs, code execution, a 1,050,000-token context window, and 128K max output. Key datapack rows: Terminal-Bench 2.1 78.2%, SWE-bench Pro 58.6%, GPQA Diamond 93.6%, ARC-AGI-2 high effort 83.3%, BrowseComp Pro compute 90.1%, and FrontierMath Tier 4 39.6%. Official pricing is $30/M input, $180/M output, $10/M batch input, and $45/M batch output; native cached input discount is not listed.
Context: 1.05m
Side-by-side comparison of the top picks by price, benchmark, and API access.