DeepInfra
Platform Overview
DeepInfra offers serverless AI inference with a simple API, supporting hundreds of models across text generation, embeddings, and more. Pay-per-token pricing with no upfront commitments.
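As a sketch of what "simple API" means in practice: DeepInfra exposes an OpenAI-compatible chat-completions endpoint, so a request is just an authorized JSON POST. The endpoint URL, model identifier, and `DEEPINFRA_API_KEY` environment variable below are illustrative assumptions; check the platform's own documentation for the exact values.

```python
# Sketch: building a chat-completion request for DeepInfra's
# OpenAI-compatible endpoint (URL and model id are assumptions).
import json
import os

BASE_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_request(model: str, prompt: str, api_key: str):
    """Return (headers, body) for a minimal chat-completion POST."""
    headers = {
        "Authorization": f"Bearer {api_key}",   # pay-per-token API key
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return headers, body

# Hypothetical model id for illustration only.
headers, body = build_request(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "Hello!",
    os.environ.get("DEEPINFRA_API_KEY", ""),
)
```

The same payload works with any HTTP client (e.g. `requests.post(BASE_URL, headers=headers, data=body)`), or with an OpenAI SDK pointed at the compatible base URL.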
Available Models (58)
All models are available as serverless.
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Qwen3 9B | $0.04 | $0.20 |
| NVIDIA Nemotron 3 Super 120B | $0.10 | $0.50 |
| Qwen3 27B | $0.26 | $2.60 |
| Llama 4 Maverick 17B Instruct FP8 | ||
| Llama 4 Scout 17B-16E Instruct | ||
| Nemotron 4 340B | $4.20 | $4.20 |
| DeepSeek R1 | ||
| DeepSeek R1 Distill Llama 70B | $70 | $80 |
| DeepSeek V3 | $32 | $89 |
| Qwen2.5 Coder 32B | $20 | $20 |
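Since pricing is per million tokens with separate input and output rates, the cost of a request can be estimated directly from the table above. A minimal sketch:

```python
# Estimate request cost from per-1M-token rates, as listed in the
# pricing table (rates in USD per 1M tokens).
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Return total USD cost for one request at the given rates."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Example: Qwen3 9B at $0.04 in / $0.20 out,
# for 1M input tokens and 500k output tokens.
cost = estimate_cost(1_000_000, 500_000, 0.04, 0.20)
# 0.04 + 0.10 = $0.14
```

With no upfront commitments, this per-request figure is the full cost; there is no reserved-capacity component to amortize.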
Platform Details
Type: Inference Platform
Tier: Tier 2
Models: 58
Organization
DeepInfra
Founded: 2023
San Francisco, California, United States
DeepInfra is a cloud inference platform offering cost-effective access to open-source AI models. It provides serverless inference for leading models from Meta, Mistral, Alibaba, and others with competitive token-based pricing.