Which Lepton AI API model is cheapest?

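The lowest listed price in this catalog is $0.07/1M input tokens. Six models share it: Gemma 7B Instruct, Llama 2 7B Chat, Llama 3 8B Instruct, Mistral 7B v0.1, OpenChat 3.5 (0106), and WizardLM-2 7B.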

What is the context window for Lepton AI API models?

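Where a context window is listed, Lepton AI API models here range from 4k to 32k tokens. Three models (WizardLM-2 7B, Nous Hermes 13B, and WizardLM-2 8x22B) have no context length listed.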

How does Lepton AI API compare to Fireworks AI?

Lepton AI API lists 14 models here, while Fireworks AI lists 224. Compare pricing, availability, context windows, and benchmark coverage before choosing a host.

Lepton AI API Models — Pricing & Benchmarks

14 models available · Lepton AI

Lepton AI API hosts 14 AI models in this catalog. The lowest listed input price is $0.07/1M tokens, shared by six models including Gemma 7B Instruct. LLM Reference lets you compare these models across all 80 providers without switching tabs.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context
Gemma 7B Instruct	$0.07	$0.07	8k
Llama 2 7B Chat	$0.07	$0.07	4k
Llama 3 8B Instruct	$0.07	$0.07	8k
Mistral 7B v0.1	$0.07	$0.07	8k
OpenChat 3.5 (0106)	$0.07	$0.07	8k
WizardLM-2 7B	$0.07	$0.07	—
Llama 2 13B Chat	$0.13	$0.13	4k
MythoMax L2 13B	$0.13	$0.13	4k
Nous Hermes 13B	$0.13	$0.13	—
Dolphin 2.6 Mixtral 8x7B	$0.30	$0.30	32k
Mixtral 8x7B	$0.30	$0.30	32k
Llama 2 70B Chat	$0.50	$0.50	4k
WizardLM-2 8x22B	$0.50	$0.50	—
Llama 3 70B Instruct	$0.80	$0.80	8k
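To turn the per-1M prices in the table into a per-request estimate, multiply each token count by its rate and divide by one million. The following is a minimal sketch, assuming prices are in USD and billing is linear in token count. The `PRICES` dictionary and the `estimate_cost` helper are illustrative names, not part of any Lepton AI SDK.

```python
# Prices in USD per 1M tokens (input, output), copied from a subset of the table.
PRICES = {
    "Gemma 7B Instruct": (0.07, 0.07),
    "Mixtral 8x7B": (0.30, 0.30),
    "Llama 3 70B Instruct": (0.80, 0.80),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: 2,000 prompt tokens and 500 completion tokens.
    for name in PRICES:
        cost = estimate_cost(name, input_tokens=2_000, output_tokens=500)
        print(f"{name}: ${cost:.6f}")
```

Because input and output are priced identically for every model listed here, the estimate depends only on the total token count, but keeping the two rates separate lets the same helper work for hosts that price them differently.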

Where else to run this

Llama 2 7B Chat on Lepton AI API (provider setup and pricing)
Llama 2 13B Chat on Lepton AI API (provider setup and pricing)
Llama 2 70B Chat on Lepton AI API (provider setup and pricing)
Llama 2 7B Chat on Alibaba Cloud PAI-EAS (alternative host)
Llama 2 13B Chat on Alibaba Cloud PAI-EAS (alternative host)
Llama 2 70B Chat on Databricks Foundation Model Serving (alternative host)

Pricing Overview

Cheapest: $0.07/1M

Most expensive: $0.80/1M
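The two figures above can be reproduced from the model table, and doing so also shows which models sit at each extreme. This is a small sketch, assuming the input prices are transcribed from this page. `INPUT_PRICES` and `models_at` are illustrative names only.

```python
# Input prices in USD per 1M tokens, transcribed from the model table.
INPUT_PRICES = {
    "Gemma 7B Instruct": 0.07,
    "Llama 2 7B Chat": 0.07,
    "Llama 3 8B Instruct": 0.07,
    "Mistral 7B v0.1": 0.07,
    "OpenChat 3.5 (0106)": 0.07,
    "WizardLM-2 7B": 0.07,
    "Llama 2 13B Chat": 0.13,
    "MythoMax L2 13B": 0.13,
    "Nous Hermes 13B": 0.13,
    "Dolphin 2.6 Mixtral 8x7B": 0.30,
    "Mixtral 8x7B": 0.30,
    "Llama 2 70B Chat": 0.50,
    "WizardLM-2 8x22B": 0.50,
    "Llama 3 70B Instruct": 0.80,
}


def models_at(price: float) -> list[str]:
    """Return every model listed at exactly this price, sorted by name."""
    return sorted(name for name, p in INPUT_PRICES.items() if p == price)


cheapest = min(INPUT_PRICES.values())
most_expensive = max(INPUT_PRICES.values())

if __name__ == "__main__":
    print(f"Cheapest: ${cheapest:.2f}/1M ->", ", ".join(models_at(cheapest)))
    print(f"Most expensive: ${most_expensive:.2f}/1M ->", ", ".join(models_at(most_expensive)))
```

Listing every model at the minimum price, rather than a single name, makes the tie at the cheapest tier visible.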

About Lepton AI API

Lepton AI is a cloud-native platform for developing and deploying AI applications. Developers build models natively in Python, without needing containerization or Kubernetes expertise, and can debug locally, testing a model with a single command before deploying it. The platform exposes an API for integration into applications and supports heterogeneous hardware, so workloads can be matched to the hardware they need and scaled up to 1TB of memory.

On the infrastructure side, Lepton AI provides smart scheduling and dynamic batching for high-performance workloads, continuous deployment through GitHub integration, and built-in monitoring, logging, and autoscaling for production environments. Together these cover the AI development process from model creation through deployment and maintenance, for organizations of various sizes.
