LLM Reference

Chutes AI

Researched today

Chutes AI

CodingRAGLong contextVisionClassificationJSON / Tool use

Chutes AI exposes 5 tracked models (5 with output token pricing in seed data). Task coverage across this catalog includes coding, rag, and long context; open any model detail page for benchmarks, batch tiers, and migration prompts.

Portfolio context: 6 decision-task tags, 5 catalog rows, latest research stamp 2026-05-22.

Use this portfolio page for

  • Teams comparing token and batch economics on this surface
  • Operators routing coding, rag, and long context workloads through this API

Do not stop here for

  • Final benchmark picks without opening the relevant model detail page

Catalog rows

5

Models linked to this provider in seed data

Priced output routes

5

Rows with token_out in seed data

Cheapest output

$0.300

Gemma 2 9B Instruct on this route

Batch-ready SKUs

0

No batch pricing tracked

Latest catalog ship

2024-12-06

532d since dated release field

Freshness

2026-05-22

Researched today

fresh

Catalog release signal

Latest ISO-dated model.release in this catalog is 2024-12-06 (532d ago).

Where this host wins

  • Coding: 1 tracked model with SWE-bench / HumanEval-style scores.
  • RAG: 2 tracked models with ruler / needle retrieval benchmarks.
  • Long-context: 2 tracked models with context-token or InfiniteBench-class signal.
  • Vision: 1 tracked model with multimodal benchmark coverage.

Compliance notes (verbatim seed excerpts)

Not yet verified from seed copy — no SOC/ISO/HIPAA-class sentences detected to quote verbatim.

Platform Overview

Chutes AI is a decentralized GPU compute network providing inference API access to open-source AI models.

Available Models(5)

View all →

All models available as Serverless

ModelInput (per 1M)Output (per 1M)
Grok-3$0.8$2.4
Llama 3.3 70B Instruct (free)$0.22$0.66
Mistral Large 2 (2407)$0.5$1.5
Gemma 2 9B Instruct$0.1$0.3
Qwen2.5-72B-Instruct$0.18$0.54

Platform Details

Models5

Organization

Chutes AI

Chutes AI is a decentralized GPU compute network providing inference API access to open-source AI models.