How many models does Baseten API offer?

Baseten currently offers 14 models through its API.

What are Baseten API's most popular models?

Baseten API's top models include Llama 3 70B Instruct, Llama 3 8B Instruct, Llama 2 7B Chat, CodeLlama 7B, and Mixtral 8x22B v0.1.

Baseten API

Researched 3d ago · Inference Platform · Tier 2

Baseten

Coding · Long context · Classification · JSON / Tool use · AI

Baseten API exposes 14 tracked models (0 with output token pricing in seed data). Task coverage across this catalog includes coding, long context, and classification; open any model detail page for benchmarks, batch tiers, and migration prompts.
Portfolio context: 4 decision-task tags, 14 catalog rows, latest research stamp 2026-06-01.

Use this portfolio page for

Operators routing coding, long context, and classification workloads through this API

Do not stop here for

Final benchmark picks without opening the relevant model detail page
Strict price-per-token comparisons until output pricing is sourced

Catalog rows: 14 (models linked to this provider in seed data)

Priced output routes: 0 (add output pricing to unlock comparisons)

Cheapest output: Unknown (need positive token_out rows)

Batch-ready SKUs: none (no batch pricing tracked)

Latest catalog ship: 2024-04-23 (772d since dated release field)

Freshness: 2026-06-01 (researched 3d ago; fresh)

Catalog release signal

Latest ISO-dated model.release in this catalog is 2024-04-23 (772d ago).
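
The 772-day figure can be reproduced from the dates quoted on this page. The sketch below assumes the page was rendered three days after the 2026-06-01 research stamp (inferred from "Researched 3d ago"). The variable names are illustrative and do not come from the catalog's own tooling.

```python
from datetime import date, timedelta

# Dates quoted on this page.
latest_release = date(2024, 4, 23)  # latest ISO-dated model release in the catalog
research_stamp = date(2026, 6, 1)   # latest research stamp

# Assumption: the page was rendered three days after the research stamp,
# as implied by the "Researched 3d ago" label.
page_date = research_stamp + timedelta(days=3)

# Age of the newest dated release, in whole days, as of the page date.
days_since_release = (page_date - latest_release).days
print(f"{days_since_release}d since dated release field")
```

Under that assumption the result matches the 772d shown above. Measured against the research stamp itself, the gap would be three days shorter.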

Where this host wins

Coding: 6 tracked models with SWE-bench / HumanEval-style scores.
Long-context: 1 tracked model with context-token or InfiniteBench-class signal.
Classification: 8 tracked models with MMLU-class moderation/safety coverage.
JSON/tool-use: 4 tracked models with BFCL / Nexus strict-JSON routing coverage.

Getting started

Official entry points from seed metadata — confirm quotas and regions in vendor docs.

Product Docs Portal Pricing

Compliance notes (verbatim seed excerpts)

Not yet verified from seed copy — no SOC/ISO/HIPAA-class sentences detected to quote verbatim.

Platform Overview

The platform is built to streamline the development and deployment of machine learning models. It supports open-source models, so developers can reuse existing frameworks and tools, and it emphasizes rapid deployment so that models reach production quickly. The architecture is designed to scale with fluctuating workloads and user demand without degrading performance, and its high-speed inference targets applications that need real-time data processing and decision-making.

Pricing follows a pay-as-you-go model, which keeps initial investment low and ties cost to resource use. Deployment is flexible: models can run in the cloud, on-premises, or on edge devices, so organizations can fit their deployment strategy to their needs and existing infrastructure.

Compare per-model pricing, input and output token costs, batch availability, and benchmark coverage.

Available Models (14)

All models available as Serverless

Contact provider for pricing

Model
Phi-3 Mini 128K
Phi-3 Mini 4k
Llama 3 70B Instruct
Llama 3 8B Instruct
Mixtral 8x22B v0.1
NSQL 350M
Mixtral 8x7B
Zephyr 7B Alpha
Mistral 7B v0.1
CodeLlama 7B

Platform Details

Type: Inference Platform

Tier: Tier 2

Models: 14

Organization

Baseten

Founded: 2019

San Francisco, California, United States

Baseten is an AI infrastructure platform that provides comprehensive tools for deploying and serving machine learning models efficiently and cost-effectively. The platform offers:

1. Rapid deployment: Users can deploy models in minutes, avoiding complex processes.
2. Support for open-source models: Baseten allows deployment of best-in-class open-source models.
3. Optimized serving: The platform provides optimized serving for custom models.
4. Scalability: Horizontally scalable services enable smooth transition from prototype to production.
5. High-speed inference: Baseten offers fast inference on infrastructure that automatically scales with traffic.
6. Cost-efficiency: The platform includes a scaled-to-zero feature to optimize costs.
7. Flexible deployment options: Models can be run on Baseten's cloud or the user's infrastructure.

Baseten aims to simplify the ML deployment process while ensuring performance, scalability, and cost-efficiency for AI builders and developers.

Links

Website · X / Twitter · LinkedIn · Crunchbase