DeepInfra
Platform Overview
DeepInfra offers serverless AI inference with a simple API, supporting hundreds of models across text generation, embeddings, and more. Pay-per-token pricing with no upfront commitments.
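As a sketch of what "simple API" means in practice: DeepInfra exposes an OpenAI-compatible chat-completions endpoint, so a request is just an authorized JSON POST. The endpoint URL, model identifier, and `DEEPINFRA_API_KEY` environment variable below are illustrative assumptions; check the platform's own documentation for the exact values.

```python
# Sketch: building a chat-completion request for DeepInfra's
# OpenAI-compatible endpoint (URL and model id are assumptions).
import json
import os

BASE_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_request(model: str, prompt: str, api_key: str):
    """Return (headers, body) for a minimal chat-completion POST."""
    headers = {
        "Authorization": f"Bearer {api_key}",   # pay-per-token API key
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return headers, body

# Hypothetical model id for illustration only.
headers, body = build_request(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "Hello!",
    os.environ.get("DEEPINFRA_API_KEY", ""),
)
```

The same payload works with any HTTP client (e.g. `requests.post(BASE_URL, headers=headers, data=body)`), or with an OpenAI SDK pointed at the compatible base URL.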
Available Models (58)
All models are available as serverless.
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Qwen3 9B | $0.04 | $0.20 |
| NVIDIA Nemotron 3 Super 120B | $0.10 | $0.50 |
| Qwen3 27B | $0.26 | $2.60 |
| Llama 4 Maverick 17B Instruct FP8 | ||
| Llama 4 Scout 17B-16E Instruct | ||
| Nemotron 4 340B | $4.20 | $4.20 |
| DeepSeek R1 | ||
| DeepSeek R1 Distill Llama 70B | $70 | $80 |
| DeepSeek V3 | $32 | $89 |
| Qwen2.5 Coder 32B | $20 | $20 |
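Since pricing is per million tokens with separate input and output rates, the cost of a request can be estimated directly from the table above. A minimal sketch:

```python
# Estimate request cost from per-1M-token rates, as listed in the
# pricing table (rates in USD per 1M tokens).
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Return total USD cost for one request at the given rates."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Example: Qwen3 9B at $0.04 in / $0.20 out,
# for 1M input tokens and 500k output tokens.
cost = estimate_cost(1_000_000, 500_000, 0.04, 0.20)
# 0.04 + 0.10 = $0.14
```

With no upfront commitments, this per-request figure is the full cost; there is no reserved-capacity component to amortize.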
Platform Details
Type: Inference Platform
Tier: Tier 2
Models: 58
Organization
DeepInfra
Founded: 2023
San Francisco, California, United States
DeepInfra is a cloud inference platform offering cost-effective access to open-source AI models. It provides serverless inference for leading models from Meta, Mistral, Alibaba, and others with competitive token-based pricing.