Which OctoAI API (Deprecated) model is cheapest?

The lowest listed price for an OctoAI API (Deprecated) model in this catalog is $0.15/1M input tokens. Seven models share that price, including Hermes 2 Pro Llama 3 8B.

What is the context window for OctoAI API (Deprecated) models?

OctoAI API (Deprecated) models listed here range from 8k to 128k tokens of context.

How does OctoAI API (Deprecated) compare to Fireworks AI?

OctoAI API (Deprecated) lists 13 models here, while Fireworks AI lists 224. Compare pricing availability, context windows, and benchmark coverage before choosing a host.

OctoAI API (Deprecated) Models — Pricing & Benchmarks

13 models available · OctoAI

OctoAI API (Deprecated) hosts 13 AI models in this catalog. The lowest listed input price is $0.15/1M tokens, shared by seven models including Hermes 2 Pro Llama 3 8B. LLM Reference lets you compare these models across all 80 providers without switching tabs.

Model	Input (per 1M)	Output (per 1M)	Context
Hermes 2 Pro Llama 3 8B	$0.15	$0.15	8k
Llama 3 8B Instruct	$0.15	$0.15	8k
Llama 3.1 8B Instruct	$0.15	$0.15	128k
Llama Guard 2 8B	$0.15	$0.15	8k
Mistral 7B v0.1	$0.15	$0.15	8k
Nous Hermes 2 Mixtral 8x7B	$0.15	$0.15	32k
Qwen2-7B	$0.15	$0.15	128k
Mixtral 8x7B	$0.45	$0.45	32k
Llama 3 70B Instruct	$0.90	$0.90	8k
Llama 3.1 70B Instruct	$0.90	$0.90	128k
Mixtral 8x22B v0.1	$1.20	$1.20	64k
WizardLM-2 8x22B	$1.20	$1.20	—
Llama 3.1 405B Instruct	$3.00	$9.00	128k
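
The per-1M-token prices above translate into a per-request cost by scaling each token count by its rate. The following is a minimal sketch, not an official calculator: the `PRICES` dictionary copies a few rows from the table, while the `estimate_cost` helper and the sample token counts are hypothetical names and values chosen for illustration.

```python
# Per-1M-token prices in USD, copied from the table above: (input, output).
PRICES = {
    "Llama 3.1 8B Instruct": (0.15, 0.15),
    "Mixtral 8x7B": (0.45, 0.45),
    "Llama 3.1 70B Instruct": (0.90, 0.90),
    "Llama 3.1 405B Instruct": (3.00, 9.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one request at the listed rates."""
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2,000 prompt tokens and 500 completion tokens.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 2_000, 500):.6f}")
```

Because Llama 3.1 405B Instruct is the only listed model whose output rate differs from its input rate, the input/output split only affects the estimate for that model.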

Where else to run these models

Llama Guard 2 8B on OctoAI API (Deprecated): provider setup and pricing
Llama 3 8B Instruct on OctoAI API (Deprecated): provider setup and pricing
Mixtral 8x7B on OctoAI API (Deprecated): provider setup and pricing
Llama Guard 2 8B on Fireworks AI: alternative host
Llama 3 8B Instruct on AWS Bedrock: alternative host
Mixtral 8x7B on Databricks Foundation Model Serving: alternative host

Pricing Overview

Cheapest: $0.15/1M input tokens

Most expensive: $3.00/1M input tokens

About OctoAI API (Deprecated)

OctoAI's generative AI platform offers a versatile and scalable solution for running, tuning, and scaling various AI models. The platform's core feature, OctoStack, provides a turnkey production stack that enables model deployment in cloud or on-premises environments, ensuring data control and privacy. Users can access a library of pre-built templates for popular open-source models, facilitating quick development and integration into existing workflows. The platform also incorporates advanced performance optimizations, significantly improving GPU utilization and reducing operational costs, making it suitable for high-demand applications.

The platform emphasizes user experience through easy-to-use APIs and customizable features. It employs automated hardware selection to optimize price-performance trade-offs, enabling efficient scaling of applications. With capabilities such as intelligent request routing, efficient auto-scaling, and reduced cold start times, the platform can handle millions of daily image generations seamlessly. Additionally, it offers fine-tuning options and dynamic customizations, allowing users to create unique, high-quality outputs tailored to their specific needs, thereby enhancing overall application performance and user satisfaction.
