vLLM

Researched 18d agoInference RuntimeTier 3

vllm-project

InferenceOpen SourceSelf-Hosted

vLLM is a self-hosted runtime, not a hosted model catalog. You bring compatible open-weight models, so there is no per-token pricing to track.
Covers 0 workload areas across 0 tracked models; last verified 2026-06-29.

Use it for

Getting oriented before committing to a specific model

Do not use it for

Final benchmark picks without opening the relevant model detail page

Tracked models

Models available through this provider

Priced output routes

Output pricing not yet tracked

Cheapest output

Unknown

Output pricing not yet tracked

Batch-ready models

No batch pricing tracked

Latest model release

Unknown

Release date of the newest tracked model

Freshness

2026-06-29

Researched 18d ago

fresh

Information

TypeInference Runtime

TierTier 3

Models0

Companyvllm-project

vLLM is a high-throughput, memory-efficient open-source inference engine and serving library for large language models. It is self-hosted rather than a managed token-priced API, and it ships an OpenAI-compatible HTTP server for local or private deployments. The vllm-project also maintains vLLM-Omni, a sibling runtime for serving omnimodal workloads.

Links

Website X / Twitter LinkedIn

Catalog freshness

No confirmed release dates yet for the models tracked on this provider.

Where this host wins

Not enough capability or benchmark coverage yet to call strengths for this provider.

Getting started

Official product, docs, and pricing links — confirm quotas and regions in the vendor docs.

Product Docs

SDKs & libraries

Python

Compliance notes

No verified compliance claims (SOC 2, ISO, HIPAA) tracked for this provider yet — check the vendor's trust center for current certifications.

Platform Overview

vLLM does not offer a fixed hosted model catalog. Operators bring their own open-weight Hugging Face model weights, run vLLM on their own compute, and serve compatible architectures through a local or private OpenAI-compatible endpoint. vLLM-Omni (github.com/vllm-project/vllm-omni) follows the same runtime pattern for omnimodal models across text, image, audio, video, and robotics inputs or outputs; the models it serves remain separate model-family and researcher decisions. LLMReference intentionally does not attach token-priced modelProvider rows to vLLM because the runtime has no per-model commercial offer or published token pricing; users pay their own GPU or infrastructure costs.

Compare per-model pricing, input and output token costs, batch availability, and benchmark coverage.

Where else to run this

Liquid AI model catalog

1 tracked model

OpenRouter model catalog

251 tracked models