LLM Reference
DeepInfra

DeepInfra

Researched 3d agoInference PlatformTier 2

DeepInfra

CodingRAGAgentsLong contextVisionClassificationJSON / Tool useAI

DeepInfra exposes 58 tracked models (54 with output token pricing in seed data). Task coverage across this catalog includes coding, rag, and agents; open any model detail page for benchmarks, batch tiers, and migration prompts.

Portfolio context: 7 decision-task tags, 58 catalog rows, latest research stamp 2026-06-01.

Use this portfolio page for

  • Teams comparing token and batch economics on this surface
  • Operators routing coding, rag, and agents workloads through this API

Do not stop here for

  • Final benchmark picks without opening the relevant model detail page

Catalog rows

58

Models linked to this provider in seed data

Priced output routes

54

Rows with token_out in seed data

Cheapest output

$0.030

Qwen2.5-7B-Instruct on this route

Batch-ready SKUs

0

No batch pricing tracked

Latest catalog ship

2026-03-11

85d since dated release field

Freshness

2026-06-01

Researched 3d ago

fresh

Catalog release signal

Latest ISO-dated model.release in this catalog is 2026-03-11 (85d ago).

Where this host wins

  • Coding: 17 tracked models with SWE-bench / HumanEval-style scores.
  • RAG: 17 tracked models with ruler / needle retrieval benchmarks.
  • Agentic: 6 tracked models with BFCL, tau-bench, and SWE-bench tool-use coverage.
  • Long-context: 18 tracked models with context-token or InfiniteBench-class signal.

Getting started

Official entry points from seed metadata — confirm quotas and regions in vendor docs.

Compliance notes (verbatim seed excerpts)

Not yet verified from seed copy — no SOC/ISO/HIPAA-class sentences detected to quote verbatim.

Platform Overview

DeepInfra offers serverless AI inference with a simple API, supporting hundreds of models across text generation, embeddings, and more. Pay-per-token pricing with no upfront commitments.

Compare per-model pricing, input and output token costs, batch availability, and benchmark coverage.

Available Models(58)

View all →

All models available as Serverless

View full catalog →

Platform Details

TypeInference Platform
TierTier 2
Models58

Organization

DeepInfra
Founded2023
San Francisco, California, United States

DeepInfra is a cloud inference platform offering cost-effective access to open-source AI models. It provides serverless inference for leading models from Meta, Mistral, Alibaba, and others with competitive token-based pricing.