OctoAI API (Deprecated) Models — Pricing & Benchmarks

13 models available · OctoAI

OctoAI API (Deprecated) hosts 13 AI models in this catalog. The lowest listed input price is $0.15 per 1M input tokens, shared by seven models including Hermes 2 Pro Llama 3 8B. LLM Reference lets you compare these models across all 63 providers without switching tabs.

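| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|
| Hermes 2 Pro Llama 3 8B | $0.15 | $0.15 | not listed |
| Llama 3 8B Instruct | $0.15 | $0.15 | 8K |
| Llama 3.1 8B Instruct | $0.15 | $0.15 | 128K |
| Llama Guard 2 8B | $0.15 | $0.15 | 8K |
| Mistral 7B v0.1 | $0.15 | $0.15 | 8K |
| Nous Hermes 2 Mixtral 8x7B | $0.15 | $0.15 | not listed |
| Qwen2-7B | $0.15 | $0.15 | 128K |
| Mixtral 8x7B | $0.45 | $0.45 | 32K |
| Llama 3 70B Instruct | $0.90 | $0.90 | 8K |
| Llama 3.1 70B Instruct | $0.90 | $0.90 | 128K |
| Mixtral 8x22B v0.1 | $1.20 | $1.20 | 64K |
| WizardLM-2 8x22B | $1.20 | $1.20 | not listed |
| Llama 3.1 405B Instruct | $3.00 | $9.00 | 128K |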

Pricing Overview

Cheapest: $0.15/1M tokens (input and output)
Most expensive: $3.00/1M input tokens, $9.00/1M output tokens (Llama 3.1 405B Instruct)
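
To make the per-million pricing concrete, the sketch below estimates the cost of a single request from its token counts. The prices are copied from the listing above; the function name `estimate_cost`, the `PRICES_PER_MILLION` dictionary, and the token counts are illustrative assumptions, not part of any OctoAI or LLM Reference API.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the listing above.
# Only a few models are included here for brevity.
PRICES_PER_MILLION = {
    "Llama 3.1 8B Instruct": (Decimal("0.15"), Decimal("0.15")),
    "Mixtral 8x7B": (Decimal("0.45"), Decimal("0.45")),
    "Llama 3.1 70B Instruct": (Decimal("0.90"), Decimal("0.90")),
    "Llama 3.1 405B Instruct": (Decimal("3.00"), Decimal("9.00")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the listed rates.

    Hypothetical helper: input and output tokens are billed separately,
    each at its own per-1M-token price.
    """
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # Example: a 2,000-token prompt with a 500-token reply.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 2_000, 500)}")
```

Because Llama 3.1 405B Instruct is the only listed model whose output price differs from its input price, its cost depends on the split between prompt and reply: 2,000 input tokens plus 500 output tokens comes to $0.006 + $0.0045 = $0.0105. `Decimal` is used instead of floats so that small per-token amounts are not distorted by binary rounding.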

About OctoAI API (Deprecated)

OctoAI's generative AI platform is designed for running, tuning, and scaling AI models. Its core feature, OctoStack, is a turnkey production stack that deploys models in cloud or on-premises environments, keeping data under the customer's control. A library of pre-built templates for popular open-source models speeds up development and integration into existing workflows. Performance optimizations improve GPU utilization and reduce operating costs, which suits the platform to high-demand applications.

The platform exposes easy-to-use APIs and customizable features. Automated hardware selection balances price against performance, and intelligent request routing, auto-scaling, and reduced cold start times let it handle millions of image generations per day. Fine-tuning options and dynamic customizations let users tailor outputs to their specific needs.
