Concepts & capability filters

Capability filtercapabilitybeginner

Batch API

Also known as: batch processing, batch inference, asynchronous inference

offline batch pricing

See matching models with benchmark scores and pricing.

48

matching active models

20

tracked providers

47

models with routes

model.batch_apimodelProvider.batch_token_inmodelProvider.batch_token_out

Definition

A batch API lets teams submit many model requests for asynchronous processing, often with different pricing from the standard realtime route. It is useful for offline evaluation, backfills, enrichment jobs, and other workloads where immediate latency is not required.

Models With Batch API

Sorted by decision relevance, with tracked capability and provider-route evidence.

48 matches

ModelReleaseContextCapabilitiesProvider route

Mistral Medium 3 Instruct

Mistral Medium 3 Instruct is MistralAI's Mistral Medium model. It offers a 128K-token context window.

2025-10-01

Researched 23d ago

128k

128,000 tokens

128k contextVisionMultimodalBatch

Mistral AI Studio

$0.400 in / $2.00 out / 1M tokens

2 routes · 1 batch

Mistral Large 3 675B Instruct

Mistral Large 3 675B Instruct is MistralAI's Mistral Large model. It offers a 128K-token context window and scores 70.2 on τ-bench.

2025-12-01

Researched 8d ago

128k

128,000 tokens

128k contextVisionMultimodalJSONBatchPrompt cache

$0.500 in / $1.50 out / 1M tokens

6 routes · 1 batch · 1 cache

Amazon Nova Premier

Amazon Nova Premier is Amazon's most capable standard Bedrock Nova understanding model for complex reasoning, agentic workflows, and model distillation. It supports a 1M-token context window, text/image/video inputs, text output, reasoning, tool calling, and prompt caching; use it as the standard Bedrock Nova frontier pick instead of Nova 2 Omni early-access Forge checkpoints.

2025-03-17

Researched 3d ago

1m

1,000,000 tokens

1m contextReasoningVisionMultimodalTool useFunctions

$2.50 in / $12.50 out / 1M tokens

2 routes · 1 batch · 1 cache

Mistral Large 2.1 (2411)

Mistral Large 2.1 is Mistral AI's November 2024 flagship Large 2 snapshot for high-complexity reasoning, coding, and function-calling workloads, with a 128K context window.

2024-11-18

Researched 33d ago

128k

128,000 tokens

128k contextTool useFunctionsJSONBatch

No tracked provider route

GPT-3.5 Turbo (Instruct)

GPT-3.5 Turbo Instruct by OpenAI is designed to excel in precise instruction following and task completion, focusing on accuracy and clarity over conversational abilities. It offers key enhancements like efficient instruction adherence, reduced hallucination, and lower toxicity compared to previous models. Compatible with legacy completion endpoints, it retains the speed and affordability of the standard GPT-3.5 Turbo model while using a 4K context window and training data up to September 2021. Not specifically built for chat, it still supports diverse tasks like question answering, text completion, and code generation, aiming to enhance AI usability with safer and more accurate interactions.

2023-09-19

Researched 26d ago

4k

4,000 tokens

JSONBatch

$1.50 in / $2.00 out / 1M tokens

4 routes · 1 batch

Claude 3.7 Sonnet

Claude 3.7 Sonnet is Anthropic's advanced model with extended thinking capabilities, offering state-of-the-art reasoning for complex tasks.

2024-03-04

Researched 69d ago

200k

200,000 tokens

200k contextReasoningVisionMultimodalTool useFunctions

$3.00 in / $15.00 out / 1M tokens

6 routes · 1 batch

OpenAI o3 reasoning model with advanced multi-step problem-solving capabilities.

2025-04-16

Researched 19d ago

200k

200,000 tokens

200k contextReasoningVisionMultimodalTool useFunctions

$2.00 in / $8.00 out / 1M tokens

3 routes · 1 batch · 2 cache

OpenAI's previous intelligent reasoning model with configurable reasoning effort. Released August 2025. Supports minimal, low, medium, and high reasoning levels. Succeeded by GPT-5.1 and later models.

2025-08-07

Researched 48d ago

400k

400,000 tokens

400k contextReasoningVisionMultimodalTool useFunctions

$1.25 in / $10.00 out / 1M tokens

4 routes · 1 batch · 2 cache

Near-frontier intelligence for cost-sensitive, low-latency, high-volume workloads. Released August 2025. Replaces o4-mini (shutting down Oct 2026).

2025-08-07

Researched 48d ago

400k

400,000 tokens

400k contextReasoningVisionMultimodalTool useFunctions

$0.250 in / $2.00 out / 1M tokens

4 routes · 1 batch · 2 cache

Fastest, cheapest GPT-5 variant for summarization and classification tasks. Also available via Realtime API.

2025-08-07

Researched 48d ago

400k

400,000 tokens

400k contextReasoningVisionMultimodalTool useFunctions

$0.050 in / $0.400 out / 1M tokens

4 routes · 1 batch · 2 cache

Premium extended-reasoning GPT-5.4 variant producing smarter and more precise responses. Replacement for o3-deep-research and o4-mini-deep-research. No prompt caching discount.

2026-03-01

Researched 48d ago

1.05m

1,050,000 tokens

1.05m contextReasoningVisionMultimodalTool useFunctions

$30.00 in / $180.00 out / 1M tokens

3 routes · 1 batch

OpenAI's GPT-4.1 model released April 2025, excelling at coding tasks, precise instruction following, and web development. Outperforms GPT-4o in these areas with a 1 million token context window. Available via API and in ChatGPT for Plus, Pro, Team, Enterprise, and Edu users.

2025-04-01

Researched 48d ago

1.05m

1,047,576 tokens

1.05m contextVisionMultimodalTool useFunctionsJSON

$2.00 in / $8.00 out / 1M tokens

4 routes · 1 batch · 2 cache

Fast and efficient small model from OpenAI replacing GPT-4o mini. Released April 2025 alongside GPT-4.1. Shows improvements in instruction-following, coding, and intelligence with a 1 million token context window. Available in ChatGPT for paid users.

2025-04-01

Researched 48d ago

1.05m

1,047,576 tokens

1.05m contextVisionMultimodalTool useFunctionsJSON

$0.400 in / $1.60 out / 1M tokens

4 routes · 2 cache

Claude 3.5 Haiku

Claude 3.5 Haiku is Anthropic's latest AI model, known for its speed and efficiency while maintaining high intelligence. It is optimized for applications needing rapid response, like interactive chatbots and real-time content moderation. Initially text-only, future plans include image input capabilities. It excels in delivering fast, accurate code suggestions, processing and categorizing information swiftly, and handling large volumes of user interactions. Priced accessibly, it offers advanced coding, tool use, and reasoning abilities. Though initially surpassing Claude 3 Haiku in benchmarks, its pricing reflects its enhanced performance 123457.

2024-10-22

Researched 26d ago

200k

200,000 tokens

200k contextReasoningVisionJSONCode execBatch

$0.800 in / $4.00 out / 1M tokens

6 routes · 1 batch · 2 cache

Llama 4 Maverick 17B Instruct FP8

Meta's Llama 4 Maverick 17B with 128 experts, FP8-optimized for cost-efficient inference. Supports native Model Router integration on Microsoft Foundry.

2025-04-05

Researched 20d ago

1m

1,000,000 tokens

1m contextVisionMultimodalJSONBatch

$0.150 in / $0.600 out / 1M tokens

10 routes · 1 batch

Claude Sonnet 4.5

Claude Sonnet 4.5 is Anthropic's Claude 4.5 model with multimodal text and image input and an optional reasoning mode. It offers a 200K-token context window and scores 86 on MMLU PRO.

2025-09-29

Researched 39d ago

200k

200,000 tokens

200k contextReasoningVisionMultimodalTool useFunctions

$3.00 in / $15.00 out / 1M tokens

8 routes · 1 batch · 2 cache

Claude Opus 4.5

Claude Opus 4.5 is Anthropic's Claude 4.5 model with multimodal text and image input and an optional reasoning mode. It offers a 200K-token context window and scores 80.7 on MMMU.

2025-11-01

Researched 39d ago

200k

200,000 tokens

200k contextReasoningVisionMultimodalTool useFunctions

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 2 cache

OpenAI GPT-4o: Flagship multimodal model with vision, function calling, and broad capability. $2.50/M input, $10/M output.

2024-05-13

Researched 48d ago

128k

128,000 tokens

128k contextVisionMultimodalTool useFunctionsJSON

$2.50 in / $10.00 out / 1M tokens

5 routes · 1 batch · 2 cache

Mistral Large 2

Advanced reasoning and coding model, multilingual with native function calling and JSON output

2025-11-25

Researched 65d ago

128k

128,000 tokens

128k contextVisionMultimodalTool useFunctionsJSON

$0.480 in / $2.40 out / 1M tokens

3 routes · 1 batch

Claude Sonnet 4.6

Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence. Proprietary decoder-only model with 1M-token context, 64K max output, multimodal vision, extended thinking, and function calling. Available via Anthropic API, AWS Bedrock, GCP Vertex AI, and OpenRouter at $3/1M input and $15/1M output tokens.

2026-02-17

Researched 15d ago

1m

1,000,000 tokens

1m contextReasoningVisionMultimodalTool useFunctions

$3.00 in / $15.00 out / 1M tokens

6 routes · 1 batch · 3 cache

Claude Opus 4.7

Claude Opus 4.7 is Anthropic's generally available flagship model with 1M context, 128K max output, adaptive thinking, and a new tokenizer with roughly 555K words per 1M tokens.

2026-04-16

Researched today

1m

1,000,000 tokens

1m contextReasoningVisionMultimodalTool useFunctions

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 3 cache

Claude Opus 4.6

Claude Opus 4.6 is Anthropic's Claude 4.6 model with multimodal text and image input and an optional reasoning mode. It offers a 1M-token context window and scores 80.8 on SWE-bench Verified.

2026-02-05

Researched 39d ago

1m

1,000,000 tokens

1m contextReasoningVisionMultimodalTool useFunctions

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 4 cache

Claude Opus 4.8

Claude Opus 4.8 is Anthropic's flagship Claude 4.8 model, released May 28, 2026 for agentic coding, long-horizon reasoning, computer use, and professional knowledge work. It supports text and image inputs, adaptive reasoning, tool use, structured outputs, computer-use tools, prompt caching, Batch API, Dynamic Workflows parallel subagents, a 1M-token context window on Anthropic API/Bedrock/Vertex, and 128K max output. Key datapack rows: SWE-bench Pro 69.2%, SWE-bench Verified 88.6%, Terminal-Bench 2.1 74.6%, HLE with tools 57.9%, OSWorld-Verified 83.4%, GDPval-AA 1890 Elo, and MCP-Atlas 82.2%. Standard Anthropic API pricing is $5/M input and $25/M output.

2026-05-28

Researched 3d ago

1m

1,000,000 tokens

1m contextReasoningVisionMultimodalTool useFunctions

$5.00 in / $25.00 out / 1M tokens

6 routes · 1 batch · 1 cache

Mistral Small 3.1 24B Instruct

Mistral's Small 3.1 24B model with multimodal vision understanding capabilities. Optimized for cost-efficient deployment with 128K token context window. Available on Cloudflare Workers AI.

2025-12-15

Researched 65d ago

128k

128,000 tokens

128k contextVisionMultimodalJSONBatch

$0.100 in / $0.300 out / 1M tokens

6 routes · 1 batch

Nano Banana 2 (Gemini 3.1 Flash Image)

Nano Banana 2 is the GA Gemini 3.1 Flash Image model for image generation and editing through the Gemini API. It accepts text, image, PDF, and video inputs, adds video-to-image generation for thumbnails, posters, and infographics, returns text and images, supports search grounding and thinking, and replaces the gemini-3.1-flash-image-preview model retiring on 2026-06-25.

2026-05-28

Researched 22d ago

131k

131,072 tokens

131k contextReasoningVisionMultimodalBatch

Google AI Studio

$0.500 in / $60.00 out / 1M tokens

1 route · 1 batch

Nano Banana Pro (Gemini 3 Pro Image)

Nano Banana Pro is the GA Gemini 3 Pro Image model for high-fidelity image creation through the Gemini API. It is aimed at complex graphic design, product mockups, data visualizations, and accurate text rendering, and replaces gemini-3-pro-image-preview retiring on 2026-06-25.

2026-05-28

Researched 22d ago

66k

65,536 tokens

ReasoningVisionMultimodalTool useJSONBatch

Google AI Studio

$2.00 in / $120.00 out / 1M tokens

1 route · 1 batch

Gemini 3.5 Flash

Gemini 3.5 Flash is Google DeepMind's generally available Flash model for sustained frontier-level performance on agentic and coding tasks. It supports multimodal inputs, native thinking, tool and function calling, structured outputs, code execution, search grounding, batch processing, and long contexts up to 1M tokens.

2026-05-19

Researched 15d ago

1.05m

1,048,576 tokens

1.05m contextReasoningVisionMultimodalAudioTool use

$1.50 in / $9.00 out / 1M tokens

4 routes · 2 batch · 3 cache

Google DeepMind's most capable Gemini 2.5 model with native thinking/reasoning support. Features a 1M-token context window, multimodal inputs (text, image, audio, video), function calling, and strong performance across coding, mathematics, and scientific reasoning tasks.

2025-06-17

Researched 22d ago

1m

1,000,000 tokens

1m contextReasoningVisionMultimodalTool useFunctions

$1.25 in / $10.00 out / 1M tokens

4 routes · 2 batch · 3 cache

OpenAI: GPT-4o-mini available via OpenRouter. Pricing: $0.15/1M input, $0.6/1M output.

2024-07-18

Researched 48d ago

128k

128,000 tokens

128k contextJSONPrompt cacheBatchFine-tune

$0.150 in / $0.600 out / 1M tokens

4 routes · 2 cache

GPT-4o (2024-11-20)

OpenAI: GPT-4o (2024-11-20) available via OpenRouter. Pricing: $2.5/1M input, $10/1M output.

2024-11-20

Researched 48d ago

128k

128,000 tokens

128k contextJSONBatch

$2.50 in / $10.00 out / 1M tokens

2 routes · 1 batch

GPT-5.4 Nano is the smallest and fastest variant in the GPT-5.4 family, optimized for edge deployment and low-latency tasks. Model ID: gpt-5.4-nano.

2026-03-05

Researched 23d ago

400k

400,000 tokens

400k contextVisionMultimodalTool useFunctionsJSON

$0.200 in / $1.25 out / 1M tokens

3 routes · 1 batch · 3 cache

GPT-5.5 Pro is OpenAI's premium extra-compute deployment of GPT-5.5, released April 23, 2026. It uses the same underlying weights as GPT-5.5 standard with additional parallel test-time compute for harder tasks. Supports text and image inputs, reasoning effort control, tool use, structured outputs, code execution, a 1,050,000-token context window, and 128K max output. Key datapack rows: Terminal-Bench 2.1 78.2%, SWE-bench Pro 58.6%, GPQA Diamond 93.6%, ARC-AGI-2 high effort 83.3%, BrowseComp Pro compute 90.1%, and FrontierMath Tier 4 39.6%. Official pricing is $30/M input, $180/M output, $10/M batch input, and $45/M batch output; native cached input discount is not listed.

2026-04-23

Researched 1d ago

1.05m

1,050,000 tokens

1.05m contextReasoningVisionMultimodalTool useFunctions

$30.00 in / $180.00 out / 1M tokens

3 routes · 1 batch

GPT-5.4 Mini is a smaller, cost-efficient variant of GPT-5.4 with a 400K token context window. Designed for tasks requiring long-context processing at lower cost. Model ID: gpt-5.4-mini.

2026-03-05

Researched 14d ago

400k

400,000 tokens

400k contextReasoningVisionMultimodalTool useFunctions

$0.750 in / $4.50 out / 1M tokens

3 routes · 1 batch · 3 cache

GPT-4o Mini Transcribe

GPT-4o Mini Transcribe is OpenAI's cost-efficient speech-to-text model based on GPT-4o mini, released March 20, 2025. Offers substantially better accuracy than Whisper at roughly half the price of gpt-4o-transcribe. Supports batch, Realtime transcription, and Assistants endpoints. Input: $1.25/1M audio tokens. Output: $5.00/1M text tokens. Practical: ~$0.003/min. API ID: gpt-4o-mini-transcribe.

2025-03-20

Researched 20d ago

16k

16,000 tokens

MultimodalAudioBatch

- in / $5.00 out / 1M tokens

1 route

GPT-5.5 is OpenAI's fully retrained agentic model, released April 23, 2026. Optimised for agentic coding, computer use, knowledge work, and early scientific research. Achieves 82.7% on Terminal-Bench 2.0 (Codex CLI scaffold), 84.9% on GDPval, 58.6% on SWE-Bench Pro, 93.6% on GPQA Diamond, and 82.6% on SWE-Bench Verified (Vals.ai independent harness). Knowledge cutoff December 2025. Supports reasoning effort levels (none/low/medium/high/xhigh). Context window 1,050,000 tokens with a long-context surcharge above 272K tokens. Model ID: gpt-5.5.

2026-04-23

Researched 14d ago

1.05m

1,050,000 tokens

1.05m contextReasoningVisionMultimodalTool useFunctions

$5.00 in / $30.00 out / 1M tokens

4 routes · 1 batch · 2 cache

GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.

2026-03-05

Researched 14d ago

1.05m

1,050,000 tokens

1.05m contextReasoningVisionMultimodalTool useFunctions

$2.50 in / $15.00 out / 1M tokens

4 routes · 1 batch · 3 cache

GPT-4o Transcribe

GPT-4o Transcribe is OpenAI's flagship speech-to-text model based on GPT-4o, released March 20, 2025. Delivers substantially better word error rates than Whisper — especially for accented speech, background noise, and variable speaking rates. Supports batch, streaming (Realtime API), and Assistants endpoints. Input: $2.50/1M audio tokens. Output: $10.00/1M text tokens. Practical: ~$0.006/min. API ID: gpt-4o-transcribe.

2025-03-20

Researched 20d ago

16k

16,000 tokens

MultimodalAudioBatch

- in / $10.00 out / 1M tokens

1 route

Most capable agentic coding model from OpenAI. Optimized for long-horizon, agentic coding tasks in the Codex CLI and API. Note: GPT-5.3-Codex-Spark is a distinct ChatGPT Pro research preview (not API-accessible).

2026-02-05

Researched 17d ago

400k

400,000 tokens

400k contextReasoningVisionTool useFunctionsJSON

$1.75 in / $14.00 out / 1M tokens

3 routes · 2 cache

Anthropic's most capable widely released model, built for demanding reasoning and long-horizon agentic work. Claude Fable 5 is the generally available Mythos-class Claude model, supports vision, tool use, structured outputs, prompt caching, Batch API processing, adaptive thinking that is always on, a 1M-token context window, and up to 128k output tokens. It launched on the Claude API, AWS Bedrock, Vertex AI, and Microsoft Foundry on June 9, 2026 with first-party pricing at $10 per 1M input tokens and $50 per 1M output tokens, but Anthropic disabled Fable 5 access for all customers on June 12, 2026 after a US export control directive and says it is working to restore access.

2026-06-09

Researched 12d ago

1m

1,000,000 tokens

1m contextReasoningVisionMultimodalTool useFunctions

$10.00 in / $50.00 out / 1M tokens

5 routes · 1 batch · 2 cache

Claude Mythos 5

Anthropic's access-gated frontier model for approved Project Glasswing cybersecurity defenders and biomedical research organizations. Shares the same underlying architecture as Claude Fable 5 but operates with safety classifiers lifted in specific domains: cybersecurity safeguards are removed for all Glasswing participants, while biology safeguards are additionally removed for approved biology-track participants. Succeeds Claude Mythos Preview with significantly reduced pricing ($10/$50 per MTok input/output vs. $25/$125 for Mythos Preview), a 1M-token context window, 128k max output tokens, adaptive thinking always on (raw chain of thought never returned), vision, tool use, structured outputs, and the effort parameter for controlling thinking depth. Extended thinking with manual budget_tokens is not supported. Anthropic disabled Mythos 5 access for all customers on June 12, 2026 after the same US export control directive affecting Claude Fable 5 and says it is working to restore access.

2026-06-09

Researched 12d ago

1m

1,000,000 tokens

1m contextReasoningVisionMultimodalTool useFunctions

$10.00 in / $50.00 out / 1M tokens

3 routes · 1 batch · 1 cache

Claude Haiku 4.5

Claude Haiku 4.5 is Anthropic's Claude 4.5 model with multimodal text and image input. It offers a 200K-token context window and scores 73.3 on SWE-bench Verified.

2025-10-01

Researched 33d ago

200k

200,000 tokens

200k contextVisionMultimodalTool useFunctionsJSON

$0.800 in / $4.00 out / 1M tokens

8 routes · 1 batch · 2 cache

Efficient lightweight model with strong reasoning and instruction-following capabilities

2025-06-01

Researched 69d ago

32k

32,000 tokens

JSONBatch

Mistral AI Studio

$0.200 in / $0.200 out / 1M tokens

3 routes · 1 batch

Ministral 8B is MistralAI's Ministral model. It offers a 32K-token context window.

2025-06-01

Researched 39d ago

32k

32,000 tokens

JSONBatch

$0.100 in / $0.100 out / 1M tokens

3 routes · 1 batch

Claude 3.5 Sonnet v2

Advanced reasoning and coding model with improved performance, supports cached context for cost savings

2025-12-01

Researched 23d ago

200k

200,000 tokens

200k contextVisionMultimodalJSONBatch

$6.00 in / $30.00 out / 1M tokens

2 routes · 1 batch

GPT-5.5 Instant

GPT-5.5 Instant is OpenAI's latest Instant model used in ChatGPT, released May 5, 2026 as the new default ChatGPT model and exposed in the API as chat-latest. OpenAI says the update improves factuality, image analysis, STEM answers, web-search decisions, personalization from past chats/files/connected Gmail, and concise conversational style. OpenAI reports 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts and 37.3% fewer inaccurate claims on difficult conversations flagged for factual errors.

2026-05-05

Researched 8d ago

400k

400,000 tokens

400k contextVisionMultimodalTool useFunctionsJSON

$5.00 in / $30.00 out / 1M tokens

1 route · 1 cache

Gemini Robotics-ER 1.6 Preview

Gemini Robotics-ER 1.6 Preview is Google DeepMind's enhanced embodied reasoning model for physical AI and robotics tasks. Pointing and counting accuracy improved from 61% (ER 1.5) to 80%; multi-view success detection reaches 84% across simultaneous camera feeds. New instrument-reading capability enables robots to interpret complex gauges and sight glasses. Context window is 131,072 input / 65,536 output tokens. Available via Gemini API and Google AI Studio. Replaces gemini-robotics-er-1.5-preview, which was deprecated April 30, 2026.

2026-04-14

Researched 57d ago

128k

128,000 tokens

128k contextReasoningVisionMultimodalTool useFunctions

Google AI Studio

$1.00 in / $5.00 out / 1M tokens

1 route · 1 batch

Gemini 2.5 Flash TTS Preview

Gemini 2.5 Flash TTS Preview is Google DeepMind's Gemini 2.5 model focused on audio understanding and generation. It offers a 128K-token context window.

2025-04-01

Researched 39d ago

128k

128,000 tokens

128k contextAudioTool useFunctionsBatch

Google AI Studio

$0.500 in / - out / 1M tokens

1 route · 1 batch

Gemini 2.5 Pro TTS Preview

Gemini 2.5 Pro TTS Preview is Google DeepMind's Gemini 2.5 model focused on audio understanding and generation. It offers a 128K-token context window.

2025-04-01

Researched 39d ago

128k

128,000 tokens

128k contextAudioTool useFunctionsBatch

Google AI Studio

$1.00 in / - out / 1M tokens

1 route · 1 batch