Llama 3.1 8B Instruct — Available Providers
Last refreshed 2026-05-16. Refreshed weekly.
Compare pricing and deployment options across 12 providers.
Monthly cost ranking
Ranked for 1M input and 0.2M output tokens per month. Cache and batch discounts are applied only when the provider row has sourced prices.
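The profile cost can be checked by hand. The sketch below assumes the listed input and output prices are USD per 1M tokens, which is consistent with the figures on this page but not stated on it. It also assumes half-up rounding to cents and ignores cache and batch discounts. The function name `profile_cost` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def profile_cost(input_price_per_m, output_price_per_m,
                 input_m=Decimal("1"), output_m=Decimal("0.2")):
    """Monthly cost in USD for a traffic profile given in millions of tokens.

    Assumption: prices are USD per 1M tokens; no cache or batch discount.
    """
    total = (Decimal(input_price_per_m) * input_m
             + Decimal(output_price_per_m) * output_m)
    # Assumption: the page rounds half-up to whole cents.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# OpenRouter row: $0.020 input, $0.050 output
print(profile_cost("0.020", "0.050"))  # 0.03
```

Prices are passed as strings so that `Decimal` keeps them exact; binary floats would work for a rough check but can drift at the rounding boundary.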
Traffic profile: US default
Cheapest: OpenRouter, $0.03 at the selected traffic profile.
Recipe-ready: Together AI. Install, auth, and call snippets are available from curated provider snippet data.
Provider matrix
Region, SLA, and compliance rows are hidden until curated source fields exist.

| Provider | Type | Profile cost | Input / Output (USD per 1M tokens) | Deploy | Recipe |
|---|---|---|---|---|---|
| OpenRouter | aggregator | $0.03 | $0.020 / $0.050 | Serverless | Docs only |
| GroqCloud | inference | $0.07 | $0.050 / $0.080 | Serverless | Docs only |
| Hyperbolic AI Inference | inference | $0.12 | $0.10 / $0.10 | Serverless | Docs only |
| OctoAI API (Deprecated) | inference | $0.18 | $0.15 / $0.15 | Serverless | Docs only |
| Together AI | inference | $0.22 | $0.18 / $0.18 | Serverless | Snippets |
| Fireworks AI | inference | $0.24 | $0.20 / $0.20 | Serverless | Snippets |
| IBM watsonx | platform | $0.25 | $0.15 / $0.50 | Serverless | Docs only |
| AWS Bedrock | hyperscaler | $0.26 | $0.22 / $0.22 | Serverless | Snippets |
| Replicate API | marketplace | $0.30 | $0.25 / $0.25 | Serverless | Snippets |
| Microsoft Foundry | hyperscaler | $0.42 | $0.30 / $0.61 | Provisioned | Docs only |
| Databricks Foundation Model Serving | platform | Not enough pricing | - / - | Provisioned | Docs only |
| NVIDIA NIM | inference | Not enough pricing | - / - | Provisioned | Docs only |
Llama 3.1 8B Instruct operational data note: this page ranks sourced token, cache, and batch fields only. Region, SLA, compliance, and latency claims are intentionally omitted until a curated matrix is added.
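The ordering of the provider matrix can be approximated with a short script. This is a sketch under stated assumptions, not the page's actual ranking logic: prices are taken as USD per 1M tokens, providers without sourced prices are placed last and unranked, and discounts are ignored. The names `ROWS` and `rank_providers` are hypothetical, and only a few rows are included for brevity.

```python
# (provider, input USD per 1M tokens, output USD per 1M tokens).
# None marks a row with no sourced price ("Not enough pricing").
ROWS = [
    ("Together AI", 0.18, 0.18),
    ("NVIDIA NIM", None, None),
    ("OpenRouter", 0.020, 0.050),
    ("IBM watsonx", 0.15, 0.50),
]


def rank_providers(rows, input_m=1.0, output_m=0.2):
    """Return (name, cost) pairs, cheapest first; unpriced rows go last."""
    priced, unpriced = [], []
    for name, input_price, output_price in rows:
        if input_price is None or output_price is None:
            # Assumption: rows without prices are listed but not ranked.
            unpriced.append((name, None))
        else:
            cost = input_price * input_m + output_price * output_m
            priced.append((name, cost))
    priced.sort(key=lambda pair: pair[1])
    return priced + unpriced


for name, cost in rank_providers(ROWS):
    label = "Not enough pricing" if cost is None else f"${cost:.2f}"
    print(f"{name}: {label}")
```

Changing `input_m` and `output_m` shows how the order shifts with the traffic mix. For example, an output-heavy profile penalizes IBM watsonx more than Together AI, because its output price is higher than its input price.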