LLM Reference
Capability filter · capability · beginner

Batch API

Also known as: batch processing, batch inference, asynchronous inference

offline batch pricing

37 matching active models

15 tracked providers

37 models with routes

model.batch_api · modelProvider.batch_token_in · modelProvider.batch_token_out

Definition

A batch API lets teams submit many model requests for asynchronous processing, often with different pricing from the standard realtime route. It is useful for offline evaluation, backfills, enrichment jobs, and other workloads where immediate latency is not required.
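As a rough illustration of the submit-now, collect-later pattern described above, the following Python sketch serializes many requests into JSON Lines and later matches results back to their requests by ID. The field names (`custom_id`, `model`, `prompt`, `output`), the model name `example-model`, and both helper functions are hypothetical; they do not correspond to any specific provider's batch format, and each provider defines its own request schema and upload flow.

```python
import json


def build_batch_lines(prompts, model):
    """Serialize one request per line (JSONL), each tagged with a caller-chosen ID."""
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"req-{i}",  # hypothetical field used to match results later
            "model": model,           # hypothetical model identifier
            "prompt": prompt,
        }
        lines.append(json.dumps(request))
    return lines


def match_results(result_lines):
    """Index results by ID; this sketch does not assume results arrive in order."""
    parsed = (json.loads(line) for line in result_lines)
    return {r["custom_id"]: r["output"] for r in parsed}


# Build the batch payload that would be uploaded for asynchronous processing.
lines = build_batch_lines(["Summarize A", "Summarize B"], model="example-model")

# Simulated results, deliberately out of order, standing in for a finished job.
simulated_results = [
    json.dumps({"custom_id": "req-1", "output": "B summary"}),
    json.dumps({"custom_id": "req-0", "output": "A summary"}),
]
by_id = match_results(simulated_results)
```

Tagging each request with its own ID is what makes the workload tolerant of delayed or reordered results, which fits the offline evaluation and backfill jobs mentioned in the definition.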

Models With Batch API

Sorted by decision relevance, with model flags and provider-route evidence from seed data.

37 matches
GPT-3.5 Turbo (Instruct)

GPT-3.5 Turbo Instruct by OpenAI is designed to excel in precise instruction following and task completion, focusing on accuracy and clarity over conversational abilities. It offers key enhancements like efficient instruction adherence, reduced hallucination, and lower toxicity compared to previous models. Compatible with legacy completion endpoints, it retains the speed and affordability of the standard GPT-3.5 Turbo model while using a 4K context window and training data up to September 2021. Not specifically built for chat, it still supports diverse tasks like question answering, text completion, and code generation, aiming to enhance AI usability with safer and more accurate interactions.

2023-09-19

Researched 5d ago

4K

4,000 tokens

JSON · Batch
OpenAI API

$1.50 in / $2.00 out / 1M tokens

3 routes · 1 batch

Claude 3.7 Sonnet

Claude 3.7 Sonnet is Anthropic's advanced model with extended thinking capabilities, offering state-of-the-art reasoning for complex tasks.

2024-03-04

Researched 26d ago

200K

200,000 tokens

200K context · Reasoning · Vision · Multimodal · Tool use · Functions
AWS Bedrock

$3.00 in / $15.00 out / 1M tokens

6 routes · 1 batch

o3

OpenAI o3 reasoning model with advanced multi-step problem-solving capabilities.

2025-03-31

Researched 5d ago

200K

200,000 tokens

200K context · Reasoning · JSON · Code exec · Prompt cache · Batch
OpenAI API

$2.00 in / $8.00 out / 1M tokens

2 routes · 1 batch · 1 cache

GPT-5

OpenAI's previous intelligent reasoning model with configurable reasoning effort. Released August 2025. Supports minimal, low, medium, and high reasoning levels. Succeeded by GPT-5.1 and later models.

2025-08-07

Researched 5d ago

400K

400,000 tokens

400K context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$1.25 in / $10.00 out / 1M tokens

3 routes · 1 batch · 1 cache

GPT-5 Mini

Near-frontier intelligence for cost-sensitive, low-latency, high-volume workloads. Released August 2025. Replaces o4-mini (shutting down Oct 2026).

2025-08-07

Researched 5d ago

400K

400,000 tokens

400K context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$0.250 in / $2.00 out / 1M tokens

3 routes · 1 batch · 1 cache

GPT-5 Nano

Fastest, cheapest GPT-5 variant for summarization and classification tasks. Also available via Realtime API.

2025-08-07

Researched 5d ago

400K

400,000 tokens

400K context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$0.050 in / $0.400 out / 1M tokens

3 routes · 1 batch · 1 cache

GPT-5.4 Pro

Premium extended-reasoning GPT-5.4 variant producing smarter and more precise responses. Replacement for o3-deep-research and o4-mini-deep-research. No prompt caching discount.

2026-03-01

Researched 5d ago

1.1M

1,050,000 tokens

1.1M context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$30.00 in / $180.00 out / 1M tokens

2 routes · 1 batch

GPT-4.1

OpenAI's GPT-4.1 model released April 2025, excelling at coding tasks, precise instruction following, and web development. Outperforms GPT-4o in these areas with a 1 million token context window. Available via API and in ChatGPT for Plus, Pro, Team, Enterprise, and Edu users.

2025-04-01

Researched 5d ago

1M

1,047,576 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$2.00 in / $8.00 out / 1M tokens

3 routes · 1 batch · 1 cache

GPT-4.1 Mini

Fast and efficient small model from OpenAI replacing GPT-4o mini. Released April 2025 alongside GPT-4.1. Shows improvements in instruction-following, coding, and intelligence with a 1 million token context window. Available in ChatGPT for paid users.

2025-04-01

Researched 5d ago

1M

1,047,576 tokens

1M context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$0.400 in / $1.60 out / 1M tokens

3 routes · 1 cache

Claude 3.5 Haiku

Claude 3.5 Haiku is Anthropic's latest AI model, known for its speed and efficiency while maintaining high intelligence. It is optimized for applications needing rapid response, like interactive chatbots and real-time content moderation. It is initially text-only, with image input planned for the future. It excels in delivering fast, accurate code suggestions, processing and categorizing information swiftly, and handling large volumes of user interactions. Priced accessibly, it offers advanced coding, tool use, and reasoning abilities. It surpasses Claude 3 Haiku in benchmarks, and its pricing reflects that enhanced performance.

2024-10-22

Researched 26d ago

200k

200,000 tokens

200k context · Reasoning · Vision · JSON · Code exec · Batch
Anthropic

$0.800 in / $4.00 out / 1M tokens

5 routes · 1 batch · 1 cache

Claude Sonnet 4.5

Claude Sonnet 4.5 available on AWS Bedrock

2025-09-29

Researched 26d ago

200K

200,000 tokens

200K context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$3.00 in / $15.00 out / 1M tokens

7 routes · 1 batch

Claude Opus 4.5

Claude Opus 4.5 available on AWS Bedrock

2025-11-01

Researched 26d ago

200K

200,000 tokens

200K context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

5 routes · 1 batch

Mistral Large 2

Advanced reasoning and coding model, multilingual with native function calling and JSON output

2025-11-25

Researched 22d ago

128K

128,000 tokens

128K context · Vision · Multimodal · Tool use · Functions · JSON
OpenRouter

$0.500 in / $1.50 out / 1M tokens

4 routes · 1 batch

GPT-4o

OpenAI GPT-4o: Flagship multimodal model with vision, function calling, and broad capability. $2.50/M input, $10/M output.

2024-05-13

Researched 5d ago

128K

128,000 tokens

128K context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$2.50 in / $10.00 out / 1M tokens

4 routes · 1 batch · 1 cache

Claude Sonnet 4.6

Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens.

2026-02-17

Researched 7d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$3.00 in / $15.00 out / 1M tokens

4 routes · 1 batch · 1 cache

Claude Opus 4.7

Claude Opus 4.7 is Anthropic's generally available flagship model with 1M context, 128K max output, adaptive thinking, and a new tokenizer with roughly 555K words per 1M tokens.

2026-04-16

Researched 1d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

5 routes · 1 batch · 1 cache

Claude Opus 4.6

Claude Opus 4.6 available on AWS Bedrock

2026-02-05

Researched 26d ago

1M

1,000,000 tokens

1M context · Reasoning · Vision · Multimodal · Tool use · Functions
Anthropic

$5.00 in / $25.00 out / 1M tokens

4 routes · 1 batch · 1 cache

Mistral Small 3.1 24B Instruct

Mistral's Small 3.1 24B model with multimodal vision understanding capabilities. Optimized for cost-efficient deployment with 128K token context window. Available on Cloudflare Workers AI.

2025-12-15

Researched 22d ago

128K

128,000 tokens

128K context · Vision · Multimodal · JSON · Batch
Together AI

$0.100 in / $0.300 out / 1M tokens

5 routes · 1 batch

GPT-4o-mini

OpenAI: GPT-4o-mini available via OpenRouter. Pricing: $0.15/1M input, $0.6/1M output.

2024-07-18

Researched 5d ago

128K

128,000 tokens

128K context · JSON · Prompt cache · Batch · Fine-tune
OpenAI API

$0.150 in / $0.600 out / 1M tokens

3 routes · 1 cache

GPT-4o (2024-11-20)

OpenAI: GPT-4o (2024-11-20) available via OpenRouter. Pricing: $2.5/1M input, $10/1M output.

2024-11-20

Researched 5d ago

128K

128,000 tokens

128K context · JSON · Batch
OpenRouter

$2.50 in / $10.00 out / 1M tokens

2 routes · 1 batch

GPT-5.4 Nano

GPT-5.4 Nano is the smallest and fastest variant in the GPT-5.4 family, optimized for edge deployment and low-latency tasks. Model ID: gpt-5.4-nano.

2026-03-05

Researched 5d ago

400K

400,000 tokens

400K context · Multimodal · Tool use · Functions · JSON · Code exec
OpenAI API

$0.200 in / $1.25 out / 1M tokens

2 routes · 1 batch · 1 cache

GPT-5.5 Pro

GPT-5.5 Pro is OpenAI's premium variant of GPT-5.5, released April 23, 2026. Targets large quality gains for business, legal, education, and data science use cases. Scores 39.6% on FrontierMath Tier 4 (postdoctoral-level math problems), compared to 22.9% for Claude Opus 4.7. Priced at 6× the standard GPT-5.5 API rate. Available to ChatGPT subscribers and via API.

2026-04-23

Researched 5d ago

1.1M

1,050,000 tokens

1.1M context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$30.00 in / $180.00 out / 1M tokens

2 routes · 1 batch

GPT-5.4 Mini

GPT-5.4 Mini is a smaller, cost-efficient variant of GPT-5.4 with a 400K token context window. Designed for tasks requiring long-context processing at lower cost. Model ID: gpt-5.4-mini.

2026-03-05

Researched 5d ago

400K

400,000 tokens

400K context · Reasoning · Multimodal · Tool use · Functions · JSON
OpenAI API

$0.750 in / $4.50 out / 1M tokens

2 routes · 1 batch · 1 cache

GPT-5.5

GPT-5.5 is OpenAI's fully retrained agentic model, released April 23, 2026. Optimized for agentic coding, computer use, knowledge work, and early scientific research. Achieves 82.7% on Terminal-Bench 2.0, 84.9% on GDPval, and 58.6% on SWE-Bench Pro. Individual factual claims are 23% more likely to be correct versus GPT-5.4, with factual errors 3% less frequent. Uses fewer tokens than GPT-5.4 for equivalent tasks. Supports text and image inputs. Available to ChatGPT Plus, Business, and Enterprise subscribers; API access coming soon. Model ID: gpt-5.5.

2026-04-23

Researched 5d ago

1.1M

1,050,000 tokens

1.1M context · Reasoning · Vision · Multimodal · Tool use · Functions
OpenAI API

$5.00 in / $30.00 out / 1M tokens

2 routes · 1 batch · 1 cache

GPT-5.4

GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.

2026-03-05

Researched 5d ago

1.1M

1,050,000 tokens

1.1M context · Reasoning · Multimodal · Tool use · Functions · JSON
OpenRouter

$2.50 in / $15.00 out / 1M tokens

2 routes · 1 batch · 1 cache

GPT-5.3-Codex

Most capable agentic coding model from OpenAI. Optimized for long-horizon, agentic coding tasks in the Codex CLI and API. Note: GPT-5.3-Codex-Spark is a distinct ChatGPT Pro research preview (not API-accessible).

2026-02-05

Researched 5d ago

400K

400,000 tokens

400K context · Reasoning · Vision · Tool use · Functions · JSON
OpenAI API

$1.75 in / $14.00 out / 1M tokens

2 routes · 1 cache

Claude Haiku 4.5

Claude Haiku 4.5 available on AWS Bedrock

2025-10-01

Researched 26d ago

200k

200,000 tokens

200k context · Vision · Multimodal · Tool use · Functions · JSON
AWS Bedrock

$0.800 in / $4.00 out / 1M tokens

7 routes · 1 batch

Claude 3.5 Sonnet v2

Advanced reasoning and coding model with improved performance, supports cached context for cost savings

2025-12-01

Researched 26d ago

200k

200,000 tokens

200k context · JSON · Batch
AWS Bedrock

$6.00 in / $30.00 out / 1M tokens

2 routes · 1 batch

Ministral 14B

Efficient lightweight model with strong reasoning and instruction-following capabilities

2025-06-01

Researched 26d ago

32k

32,000 tokens

JSON · Batch
AWS Bedrock

$0.240 in / $0.240 out / 1M tokens

2 routes · 1 batch

Ministral 8B

Compact efficient model optimized for cost-sensitive deployments

2025-06-01

Researched 26d ago

32k

32,000 tokens

JSON · Batch
AWS Bedrock

$0.100 in / $0.100 out / 1M tokens

2 routes · 1 batch

GPT-5.5 Instant

GPT-5.5 Instant is OpenAI's latest Instant model used in ChatGPT, released May 5, 2026 as the new default ChatGPT model and exposed in the API as chat-latest. OpenAI says the update improves factuality, image analysis, STEM answers, web-search decisions, personalization from past chats/files/connected Gmail, and concise conversational style. OpenAI reports 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts and 37.3% fewer inaccurate claims on difficult conversations flagged for factual errors.

2026-05-05

Researched 5d ago

400K

400,000 tokens

400K context · Vision · Multimodal · Tool use · Functions · JSON
OpenAI API

$1.50 in / $6.00 out / 1M tokens

1 route

Llama 4 Maverick 17B

Multimodal Llama 4 with 128 experts, optimized for fast responses with minimal computational cost

2025-10-01

Researched 26d ago

128k

128,000 tokens

128k context · Multimodal · JSON · Code exec · Batch
AWS Bedrock

$0.240 in / $0.970 out / 1M tokens

1 route · 1 batch

Llama 4 Scout 17B

Multimodal Llama 4 with 16 active experts, supports 10M token context window for long-document processing

2025-10-01

Researched 26d ago

10M

10,000,000 tokens

10M context · Multimodal · JSON · Batch
AWS Bedrock

$0.170 in / $0.660 out / 1M tokens

1 route · 1 batch
