Router profile

vLLM Semantic Router

Red Hat / vLLM Project

Router · Aging · 2026-06-08

Open-source Mixture-of-Models router that semantically classifies each request and routes it to the best backend (local, private, or frontier) by cost, latency, privacy, or safety, deployed as an Envoy External Processor.
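As a rough illustration of the routing idea in the summary above, the sketch below picks a backend for a request that has already been classified. It is not the project's implementation: the backend names, tiers, prices, latencies, category labels, and the `pick_backend` helper are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    tier: str           # "local", "private", or "frontier"
    cost_per_1k: float  # hypothetical price per 1k tokens
    latency_ms: int     # hypothetical typical latency


# Hypothetical backend pool; a real deployment would load this from config.
BACKENDS = [
    Backend("local-small", "local", 0.0, 120),
    Backend("private-medium", "private", 0.4, 300),
    Backend("frontier-large", "frontier", 3.0, 900),
]


def pick_backend(category, optimize="cost"):
    """Choose a backend name for an already-classified request."""
    candidates = BACKENDS
    if category == "sensitive":
        # Privacy rule (assumed): sensitive prompts stay off frontier hosts.
        candidates = [b for b in BACKENDS if b.tier != "frontier"]
    elif category == "hard":
        # Quality rule (assumed): hard tasks skip the smallest model.
        candidates = [b for b in BACKENDS if b.tier != "local"]
    if optimize == "cost":
        return min(candidates, key=lambda b: b.cost_per_1k).name
    return min(candidates, key=lambda b: b.latency_ms).name
```

Under these assumed values, a "sensitive" request optimized for cost stays on the local backend, while a "hard" request falls through to the cheapest non-local one.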

Type

Router

Lead directory segment

Pricing model

Free OSS

Model count pending

Hosting

Self-hosted

Self-host option available

Data retention

Zero retention

Verify for production policy

At a glance

Decision mechanism: Classifier, Semantic k-NN
Optimizes for: Cost, Latency, Privacy
Routing scope: Cross-provider, Cross-host
Decision timing: Pre-generation
Deployment path: Proxy in path
Openness: Open source
API compatibility: OpenAI
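The "Semantic k-NN" decision mechanism listed above can be sketched in a few lines. This is a minimal illustration, not the router's actual classifier: the labelled examples are invented, `embed` is a toy bag-of-words stand-in for a real embedding model, and the similarity-weighted vote is one common k-NN variant chosen here for simplicity.

```python
import math
from collections import Counter

# Hypothetical labelled reference prompts; a real system would use a
# much larger set and a trained embedding model.
EXAMPLES = [
    ("solve this integral step by step", "math"),
    ("prove the equation has two roots", "math"),
    ("fix the bug in this python function", "code"),
    ("write a function that parses json", "code"),
    ("summarize my medical record", "private"),
    ("draft a reply about my bank account", "private"),
]


def embed(text):
    """Toy bag-of-words vector standing in for a semantic embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(prompt, k=3, default="general"):
    """Label a prompt by similarity-weighted vote of its k nearest examples."""
    query = embed(prompt)
    scored = sorted(
        ((cosine(query, embed(text)), label) for text, label in EXAMPLES),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    best_label, weight = votes.most_common(1)[0]
    # Fall back to a default label when nothing in the set is similar.
    return best_label if weight > 0 else default
```

Because the decision is made before generation, the resulting label can be used to choose a backend before any tokens are produced. Weighting votes by similarity keeps unrelated neighbours (similarity 0) from outvoting a single strong match.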

Pricing & data handling

Open-source (Apache 2.0). Deployed as an Envoy ext_proc plugin. Native Kubernetes/OpenShift support. Latest release: v0.2 Athena (March 2026).

Retention: Zero retention
Self-host: Available
Last checked: 2026-06-08

Sources & freshness

homepage, status, type, openness · checked 2026-06-08
github, license · checked 2026-06-08
architecture, v0.2_release · checked 2026-06-08
initial_announcement · checked 2026-06-08

Last reviewed 2026-06-08.

Compare & related routers

AIRouter

Commercial LLM router that analyzes incoming requests and routes to the optimal model for cost/quality/latency via a drop-in OpenAI-compatible API, with a privacy-preserving embedding mode that avoids sending prompt content.

Amazon Bedrock Intelligent Prompt Routing

AWS Bedrock's native intelligent prompt router that routes prompts between Anthropic Claude model tiers (Haiku/Sonnet) based on predicted task complexity, with no extra per-routing charge.

Azure AI Foundry Model Router

Microsoft Azure AI Foundry's native model router that uses a trained ML model to route each prompt in real time to the optimal Azure-hosted model, with Balanced/Cost/Quality mode selection and automatic failover.

Martian

AI-powered LLM router that analyzes each prompt in real time to select the optimal model, targeting 20–97% cost reduction while maintaining quality. The San Francisco startup is reportedly nearing a $1.3B valuation.