DeepInfra
Researched 3d ago | Inference Platform | Tier 2
DeepInfra exposes 58 tracked models, 54 of which have output token pricing in seed data. Task coverage across this catalog includes coding, RAG, and agents; open any model detail page for benchmarks, batch tiers, and migration prompts.
Portfolio context: 7 decision-task tags, 58 catalog rows, latest research stamp 2026-06-01.
Use this portfolio page for
- Teams comparing token and batch economics for this provider
- Operators routing coding, RAG, and agents workloads through this API
Do not stop here for
- Final benchmark picks without opening the relevant model detail page
| Metric | Value | Note |
|---|---|---|
| Catalog rows | 58 | Models linked to this provider in seed data |
| Priced output routes | 54 | Rows with token_out in seed data |
| Cheapest output | $0.030 | Qwen2.5-7B-Instruct on this route |
| Batch-ready SKUs | 0 | No batch pricing tracked |
| Latest catalog ship | 2026-03-11 | 85d since dated release field |
| Freshness | 2026-06-01 | Researched 3d ago |
Catalog release signal
Latest ISO-dated model.release in this catalog is 2026-03-11 (85d ago).
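The 85-day figure can be reproduced from the dates on this page. The sketch below assumes the page was rendered 3 days after the 2026-06-01 research stamp (inferred from "Researched 3d ago"); the variable names are illustrative and not part of any seed schema.

```python
from datetime import date, timedelta

# Dates quoted on this page.
research_stamp = date(2026, 6, 1)
latest_release = date(2026, 3, 11)

# Assumption: "Researched 3d ago" means the page was rendered
# 3 days after the research stamp.
as_of = research_stamp + timedelta(days=3)

# Age of the newest dated release, in whole days.
days_since_release = (as_of - latest_release).days
print(days_since_release)
```

Measured against the research stamp itself rather than the render date, the gap would be 3 days shorter.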
Where this host wins
- Coding: 17 tracked models with SWE-bench / HumanEval-style scores.
- RAG: 17 tracked models with RULER / needle-in-a-haystack retrieval benchmarks.
- Agentic: 6 tracked models with BFCL, tau-bench, and SWE-bench tool-use coverage.
- Long-context: 18 tracked models with context-token or InfiniteBench-class signal.
Getting started
Official entry points from seed metadata — confirm quotas and regions in vendor docs.
Compliance notes (verbatim seed excerpts)
Not yet verified from seed copy — no SOC/ISO/HIPAA-class sentences detected to quote verbatim.
Platform Overview
DeepInfra offers serverless AI inference with a simple API, supporting hundreds of models across text generation, embeddings, and more. Pay-per-token pricing with no upfront commitments.
Compare per-model pricing, input and output token costs, batch availability, and benchmark coverage.
Available Models (58)
All models available as Serverless
| Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) |
|---|---|---|
| Nemotron 3 Super-120B-A12B | $0.10 | $0.50 |
| Qwen3.5-27B | $0.26 | $2.60 |
| Qwen3-9B | $0.04 | $0.20 |
| Llama 4 Maverick 17B Instruct FP8 | $0.15 | $0.60 |
| Llama 4 Scout 17B-16E Instruct | $0.08 | $0.30 |
| Nemotron 4 340B | $4.20 | $4.20 |
| DeepSeek R1 | | |
| DeepSeek R1 Distill Llama 70B | $0.70 | $0.80 |
| DeepSeek V3 | $0.32 | $0.89 |
| Qwen2.5-Coder-32B | $0.20 | $0.20 |
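To turn the per-1M prices above into a workload estimate, multiply each token count by its rate and divide by one million. The following is a minimal sketch that copies three rows from the table; the `PRICES` mapping and the `request_cost` helper are hypothetical names for illustration, and actual bills may differ from this simple arithmetic.

```python
from decimal import Decimal

# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "Qwen3-9B": (Decimal("0.04"), Decimal("0.20")),
    "Llama 4 Scout 17B-16E Instruct": (Decimal("0.08"), Decimal("0.30")),
    "DeepSeek V3": (Decimal("0.32"), Decimal("0.89")),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost for one workload at the listed per-1M rates."""
    price_in, price_out = PRICES[model]
    total = price_in * input_tokens + price_out * output_tokens
    return total / Decimal(1_000_000)

# Example: 2M input tokens and 500k output tokens on each listed model.
for name in PRICES:
    print(name, request_cost(name, 2_000_000, 500_000))
```

`Decimal` is used instead of floats so that small per-token prices add up without binary rounding noise.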
Platform Details
Organization
DeepInfra is a cloud inference platform offering cost-effective access to open-source AI models. It provides serverless inference for leading models from Meta, Mistral, Alibaba, and others with competitive token-based pricing.