Router profile
vLLM Semantic Router
Red Hat / vLLM Project
Open-source Mixture-of-Models router that semantically classifies each request and routes it to the best backend (local, private, or frontier) by cost, latency, privacy, or safety, deployed as an Envoy External Processor.
Type
Router
Lead directory segment
Pricing model
Free OSS
Model count pending
Hosting
Self-hosted
Self-host option available
Data retention
Zero retention
Verify for production policy
At a glance
- Decision mechanism
- ClassifierSemantic k-NN
- Optimizes for
- CostLatencyPrivacy
- Routing scope
- Cross-providerCross-host
- Decision timing
- Pre-generation
- Deployment path
- Proxy in path
- Openness
- Open source
- API compatibility
- OpenAI
Pricing & data handling
Open-source (Apache 2.0). Deployed as an Envoy ext_proc plugin. Native Kubernetes/OpenShift support. Latest release: v0.2 Athena (March 2026).
- Retention
- Zero retention
- Self-host
- Available
- Last checked
- 2026-06-08
Sources & freshness
- homepage, status, type, openness · checked 2026-06-08
- github, license · checked 2026-06-08
- architecture, v0.2_release · checked 2026-06-08
- initial_announcement · checked 2026-06-08
Last reviewed 2026-06-08.
Compare & related routers
Compare vLLM Semantic Router against another router without mixing model rows into the same view.
Compare with AIRouterCommercial LLM router that analyzes incoming requests and routes to the optimal model for cost/quality/latency via a drop-in OpenAI-compatible API, with a privacy-preserving embedding mode that avoids sending prompt content.
AWS Bedrock's native intelligent prompt router that routes prompts between Anthropic Claude model tiers (Haiku/Sonnet) based on predicted task complexity, with no extra per-routing charge.
Microsoft Azure AI Foundry's native model router that uses a trained ML model to route each prompt in real time to the optimal Azure-hosted model, with Balanced/Cost/Quality mode selection and automatic failover.
AI-powered LLM router that analyzes each prompt in real-time to select the optimal model, targeting 20–97% cost reduction while maintaining quality; San Francisco startup reportedly nearing $1.3B valuation.