o3

Name: o3
Author: OpenAI

Released: 2025-04-16
Last refreshed: 2026-06-29
Status: Researched 40d ago

Proprietary · Commercial use: conditional · Multimodal · Coding · RAG · Agents · Long context · Vision · JSON / Tool use

o3 is worth evaluating for coding, RAG, and agents when a tracked provider route and the 200k context window fit the workload.

Use it for

Teams evaluating coding, RAG, and agents
Workloads that can use a 200k context window
Buyers comparing 3 tracked provider routes

Do not use it for

Workloads where another current model has stronger sourced task evidence

Specifications

Family: o3
Released: 2025-04-16
Context: 200k
Max output: 100,000
Architecture: Decoder Only
Knowledge cutoff: 2024-06
Specialization: general
Openness: Proprietary
License: Proprietary (commercial use: conditional)
Weights: Not released
Code: Unknown
Training: Fine-tuned

Created by

OpenAI

Cutting-edge research and development.

San Francisco, California, United States

Founded 2015

Pricing

Input / 1M: $2.00
Output / 1M: $8.00

Cheapest of 3 routes · OpenAI API · cache read $0.500

Providers (3)

OpenAI API, OpenRouter, Vercel AI Gateway

About

OpenAI o3 reasoning model with advanced multi-step problem-solving capabilities.

o3 is a proprietary model. The structured metadata tracks a 200k-token context window, multimodal input, reasoning, function calling, tool use, structured outputs, and code execution. This page tracks provider routes through OpenAI API, OpenRouter, and Vercel AI Gateway, with the cheapest tracked route listed at $2 input and $8 output per 1M tokens. Headline tracked benchmarks include HumanEval 96.7, SWE-bench Verified 71.7, and LiveCodeBench 85.5.

Top use-case fit: coding, agents, and build tasks

Coding

Q/$ D

4 relevant benchmarks in the decision map.

RAG

Included by capability and metadata signals in the decision map.

Agents

Q/$ C

1 relevant benchmark in the decision map.

Provider price ladder

Compare API pricing across 3 providers for input and output tokens, batch, and cached reads when available.

Provider	Input / 1M	Output / 1M	Batch in / out	Cache	Route
OpenAI API	$2.00	$8.00	$1.00 / $4.00	read $0.500	Serverless
OpenRouter	$2.00	$8.00	-	-	Serverless
Vercel AI Gateway	$2.00	$8.00	-	read $0.500	Serverless
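The rates in the table above can be turned into a per-request estimate. The following is a minimal sketch, not any provider's billing logic: the `Route` class and `estimate_cost` function are hypothetical names, and it assumes that cached prompt tokens are billed at the cache-read rate in place of the input rate, and that batch requests get no additional cache discount.

```python
from dataclasses import dataclass
from typing import Optional

PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Route:
    """USD rates per 1M tokens, copied from the provider price ladder."""
    name: str
    input_usd: float
    output_usd: float
    cache_read_usd: Optional[float] = None
    batch_input_usd: Optional[float] = None
    batch_output_usd: Optional[float] = None


ROUTES = {
    "OpenAI API": Route("OpenAI API", 2.00, 8.00, 0.50, 1.00, 4.00),
    "OpenRouter": Route("OpenRouter", 2.00, 8.00),
    "Vercel AI Gateway": Route("Vercel AI Gateway", 2.00, 8.00, 0.50),
}


def estimate_cost(route, input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimate the USD cost of one request on a tracked route."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if batch:
        if route.batch_input_usd is None or route.batch_output_usd is None:
            raise ValueError(f"no batch pricing tracked for {route.name}")
        in_rate, out_rate = route.batch_input_usd, route.batch_output_usd
        cached_rate = in_rate  # assumption: no extra cache discount on batch
    else:
        in_rate, out_rate = route.input_usd, route.output_usd
        # Fall back to the normal input rate when no cache rate is tracked.
        cached_rate = route.cache_read_usd if route.cache_read_usd is not None else in_rate
    fresh_tokens = input_tokens - cached_tokens
    total = fresh_tokens * in_rate + cached_tokens * cached_rate + output_tokens * out_rate
    return total / PER_MILLION
```

For example, `estimate_cost(ROUTES["OpenAI API"], 1_000_000, 100_000)` evaluates to 2.80 USD at the listed rates, and the same request with `batch=True` evaluates to 1.40 USD.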

Available via routers & gateways (15)

LiteLLM

Gateway

Open-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.

Free OSS · OpenAI API

OpenRouter

Hybrid

Unified hybrid gateway to 400+ models from 60+ providers via a single OpenAI-compatible API, with optional auto-routing that selects the best model per prompt.

Passthrough · OpenAI API

Portkey

Gateway

Production AI gateway routing to 1,600+ LLMs with failover, load balancing, semantic caching, and guardrails; Apache 2.0 core is fully self-hostable with the complete feature set.

Subscription · OpenAI API

AIRouter

Router

Commercial LLM router that analyzes incoming requests and routes to the optimal model for cost/quality/latency via a drop-in OpenAI-compatible API, with a privacy-preserving embedding mode that avoids sending prompt content.

Passthrough + fee · OpenAI API

Helicone

Gateway

Observability-first AI gateway with routing, caching, rate limiting, and request tracing; Apache 2.0 open-source core with a managed hosted tier for logging and analytics.

Subscription · OpenAI API

Kong AI Gateway

Gateway

Multi-LLM AI gateway built on Kong Gateway 3.x, adding semantic routing, load balancing, guardrails, and MCP traffic analytics as plugins over Kong's existing API management platform.

Subscription · OpenAI API

Capabilities

Vision · Multimodal · Reasoning · Function Calling · Tool Use · Structured Outputs · Code Execution · Prompt Caching · Batch API

Benchmark peer bars for Coding

SWE-bench Verified: Rank 57 of 80

Claude Fable 5: 96.0
Claude Mythos Preview: 93.9
Claude Opus 4.8: 88.6
Claude Opus 4.7: 87.6
o3 (current): 71.7

HumanEval: Rank 2 of 97

Peer bar scores: 98.0, 96.7 (o3, current), 95.0, 94.5, 94.2

LiveCodeBench: Rank 16 of 55

DeepSeek V4 Pro: 93.5
Gemini 3.1 Pro Preview: 91.7
DeepSeek V4 Flash: 91.6
Qwen3.7-Max: 91.6
o3 (current): 85.5
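Because each leaderboard tracks a different number of models, raw ranks are hard to compare across suites. One rough way to normalize them, sketched below with a hypothetical helper name, is the fraction of tracked models that rank above o3 on each suite. This is an illustrative convention, not a metric the page itself reports.

```python
# (rank, number of tracked models) for o3, taken from the peer bars.
O3_RANKS = {
    "SWE-bench Verified": (57, 80),
    "HumanEval": (2, 97),
    "LiveCodeBench": (16, 55),
}


def fraction_ranked_above(rank, total):
    """Share of tracked models placed ahead of the given rank (0.0 = leader)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (rank - 1) / total


for suite, (rank, total) in O3_RANKS.items():
    share = fraction_ranked_above(rank, total)
    print(f"{suite}: {share:.1%} of tracked models rank above o3")
```

On these figures, 70% of the tracked SWE-bench Verified entries rank above o3, against roughly 1% on HumanEval, which illustrates how differently the same model can sit across suites.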

Benchmark scores (12)

Scores are benchmark-specific and are direction-aware: the same numeric gap can mean very different outcomes across suites. Use the leaderboard context and this model's provider route to decide whether the winning margin is meaningful for your workload.

Benchmark	Score	Version	Observed	Evaluation	Source
HumanEval	96.7	2025-04	2026-04-09	-	Source
SWE-bench Verified	71.7	custom agent scaffold, high reasoning	2025-04-16	-	Source
LiveCodeBench	85.5	v6 pass@1, independent Kaggle leaderboard	2025-11-13	-	Source
Aider Polyglot	81.3	2026-04 (high)	2026-04-14	-	Source
Chatbot Arena	1412.0	-	2026-04-15	-	Source
Google-Proof Q&A	87.7	GPQA Diamond, high reasoning	2025-04-16	-	Source
Massive Multi-discipline Multimodal Understanding	82.9	standard MMMU	2025-04-16	-	Source
MathVista	86.8	-	2026-06-03	-	Source
MMMU Pro	76.4	LLM-Stats aggregator (official no-tools or equivalent)	2026-06-07	-	Source
AIME 2024	96.7	-	2025-04-16	-	Source
AIME 2025	88.9	AIME 2025 pass@1, high reasoning	2025-04-16	-	Source
Humanity's Last Exam	20.3	Humanity's Last Exam, no tools, high reasoning	2025-04-16	-	Source

Migration checks

No linked migration route is available for this model yet.

Popular comparisons (53), sorted by 7-day search impressions

o3 vs Kimi K2.6: 219
o3 vs Claude Sonnet 4.5: 166
o3 vs DeepSeek V4 Pro: 148
o3 vs Ling-2.6-1T: 126
o3 vs Llama 3 8B Instruct: 116
o3 vs Xiaomi MiMo-V2.5: 82
o3 vs Grok-3: 79
o3 vs Claude 3.7 Sonnet: 79
o3 vs Step 3.5 Flash: 67
o3 vs Gemini 2.5 Pro Preview 05-06: 64
o3 vs DeepSeek V3.1: 61
o3 vs Mistral Large 2: 60
o3 vs Kimi K2 Thinking: 58
o3 vs DeepSeek V3: 55
o3 vs GLM-5.1: 51
o3 vs DeepSeek V3.2: 51
o3 vs GPT-5.2: 42
o3 vs Kimi K2.5: 38
o3 vs Phi-3 Mini 4k: 37
o3 vs Claude Opus 4.5: 33
o3 vs GLM-5: 32
o3 vs DeepSeek R1 0528: 30
o3 vs Trinity-Large-Thinking: 27
o3 vs Qwen3-Max: 26
o3 vs o3 Mini: 21
o3 vs Qwen3.6-27B: 20
o3 vs Mistral Large 2 (2407): 20
o3 vs Qwen3.6-35B-A3B: 18
o3 vs Llama 3.1 70B Instruct: 17
o3 vs DeepSeek V3 Base: 14
o3 vs Llama 3.2 1B Instruct: 14
o3 vs Qwen3.5-397B-A17B: 13
o3 vs Llama 3 70B Instruct: 13
o3 vs GLM-5V-Turbo: 12
o3 vs Gemini 2.5 Flash: 11
o3 vs GLM-5 9B: 11
o3 vs Qwen2.5-7B-Instruct: 10
o3 vs GLM-5 Turbo: 10
o3 vs Qwen2.5-72B-Instruct: 9
o3 vs Mixtral 8x22B Instruct v0.3: 9
o3 vs Ling-2.6-Flash: 8
o3 vs Gemma 7B Instruct: 7
o3 vs Mixtral 8x7B: 5
o3 vs Qwen3.5-27B: 4
o3 vs DeepSeek R1 Distill Llama 70B: 4
o3 vs Qwen2-7B-Instruct: 3
o3 vs Qwen3.5-122B-A10B: 3
o3 vs Phi-4 Mini Flash Reasoning: 2
o3 vs Llama 2 13B Chat: 1
o3 vs Mixtral 8x7B Instruct v0.1: 1
o3 vs Qwen3.5-35B-A3B: 1
o3 vs Gemini 3.1 Pro Preview Custom Tools: 1
o3 vs Qwen3-235B-A22B: 0

Frequently asked questions

What is the context window of o3?

o3 has a context window of 200k tokens.

What is the max output of o3?

o3 can generate up to 100,000 output tokens.
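The two limits above interact when sizing a request. The sketch below is a rough budget check under two assumptions that the page does not state: that "200k" means 200,000 tokens, and that prompt and generated tokens share the same context window. The function name `output_budget` is hypothetical.

```python
CONTEXT_WINDOW = 200_000  # assumption: "200k" read as 200,000 tokens
MAX_OUTPUT = 100_000      # max output tokens listed for o3


def output_budget(prompt_tokens, context_window=CONTEXT_WINDOW, max_output=MAX_OUTPUT):
    """Return how many output tokens remain available for a given prompt size.

    Assumes the prompt and the generated tokens share one context window,
    so the budget is capped by both the max output and the space left over.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_window - prompt_tokens
    return max(0, min(max_output, remaining))
```

Under these assumptions a 50,000-token prompt leaves the full 100,000-token output budget, while a 150,000-token prompt leaves 50,000.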

How much does o3 cost?

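o3 costs $2.00 per 1M input tokens and $8.00 per 1M output tokens on all 3 tracked provider routes. The OpenAI API route also lists batch pricing of $1.00 input and $4.00 output per 1M tokens.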

When was o3 released?

o3 was released on 2025-04-16.

Which providers offer o3?

o3 is available from 3 providers: OpenAI API, OpenRouter, Vercel AI Gateway.

What benchmarks has o3 been tested on?

o3 has been evaluated on 12 benchmarks, including HumanEval, SWE-bench Verified, LiveCodeBench, Aider Polyglot, and Chatbot Arena.
