LLM Reference

Router profile

vLLM Semantic Router

Red Hat / vLLM Project

Router · Fresh · 2026-06-08
Open-source Mixture-of-Models router that semantically classifies each request and routes it to the best-suited backend (local, private, or frontier) according to cost, latency, privacy, or safety. It is deployed as an Envoy External Processor.
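
As a rough illustration of the routing idea described above, the sketch below picks a backend for a request that has already been classified. Everything in it is hypothetical: the `Backend` fields, the backend names, the cost and latency figures, and the `pick_backend` helper are invented for illustration and are not taken from the vLLM Semantic Router codebase or its configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    # All fields and values are hypothetical, for illustration only.
    name: str
    tier: str                  # "local", "private", or "frontier"
    cost_per_1k_tokens: float  # made-up USD figure
    p50_latency_ms: int        # made-up latency figure
    categories: frozenset      # request categories this backend handles


BACKENDS = [
    Backend("local-small", "local", 0.0, 120, frozenset({"chat", "code"})),
    Backend("private-medium", "private", 0.4, 300, frozenset({"chat", "code", "math"})),
    Backend("frontier-large", "frontier", 3.0, 900, frozenset({"chat", "code", "math"})),
]


def pick_backend(category, sensitive, optimize_for="cost"):
    """Choose a backend before generation starts.

    Privacy acts as a hard filter (sensitive requests never reach the
    frontier tier); cost or latency then decides among the survivors.
    """
    candidates = [b for b in BACKENDS if category in b.categories]
    if sensitive:
        candidates = [b for b in candidates if b.tier != "frontier"]
    if not candidates:
        raise LookupError(f"no backend can serve category {category!r}")
    if optimize_for == "latency":
        return min(candidates, key=lambda b: b.p50_latency_ms)
    return min(candidates, key=lambda b: b.cost_per_1k_tokens)
```

For example, `pick_backend("math", sensitive=True)` skips the frontier backend entirely and returns the cheapest remaining backend that handles math requests. Treating privacy as a filter rather than a weighted score is a design choice of this sketch, not a documented behavior of the router.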

Type: Router (lead directory segment)

Pricing model: Free OSS (model count pending)

Hosting: Self-hosted

Data retention: Zero retention (verify for production policy)

At a glance

Decision mechanism: Classifier, Semantic k-NN
Optimizes for: Cost, Latency, Privacy
Routing scope: Cross-provider, Cross-host
Decision timing: Pre-generation
Deployment path: Proxy in path
Openness: Open source
API compatibility: OpenAI
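
To make the "Semantic k-NN" decision mechanism more concrete, here is a minimal sketch of k-nearest-neighbor classification over prompt embeddings, run before any generation happens. It rests on loud assumptions: the labelled examples and category names are invented, and `embed` is a toy word-count stand-in for a real sentence-embedding model. It does not reproduce how vLLM Semantic Router implements its classifiers.

```python
import math
from collections import Counter

# Hypothetical labelled examples; a real deployment would use far more.
EXAMPLES = [
    ("write a python function to sort a list", "code"),
    ("fix the bug in this javascript function", "code"),
    ("solve the integral of x squared", "math"),
    ("what is the derivative of sin x", "math"),
    ("tell me a story about a dragon", "chat"),
    ("recommend a good book to read", "chat"),
]


def embed(text):
    # Toy stand-in for a sentence-embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(prompt, k=3):
    """Label a prompt by majority vote among its k most similar examples."""
    query = embed(prompt)
    ranked = sorted(
        EXAMPLES,
        key=lambda ex: cosine(query, embed(ex[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

The predicted label would then feed a backend-selection step. Word-count vectors are sensitive to stopwords and exact wording, which is one reason a production classifier would rely on learned embeddings instead.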

Pricing & data handling

Open-source (Apache 2.0). Deployed as an Envoy ext_proc plugin. Native Kubernetes/OpenShift support. Latest release: v0.2 Athena (March 2026).

Retention: Zero retention
Self-host: Available
Last checked: 2026-06-08

Sources & freshness

Last reviewed 2026-06-08.

Compare & related routers

AIRouter

Commercial LLM router that analyzes incoming requests and routes to the optimal model for cost/quality/latency via a drop-in OpenAI-compatible API, with a privacy-preserving embedding mode that avoids sending prompt content.

Amazon Bedrock Intelligent Prompt Routing

AWS Bedrock's native intelligent prompt router that routes prompts between Anthropic Claude model tiers (Haiku/Sonnet) based on predicted task complexity, with no extra per-routing charge.

Azure AI Foundry Model Router

Microsoft Azure AI Foundry's native model router that uses a trained ML model to route each prompt in real time to the optimal Azure-hosted model, with Balanced/Cost/Quality mode selection and automatic failover.

Martian

AI-powered LLM router that analyzes each prompt in real time to select the optimal model, targeting 20–97% cost reduction while maintaining quality. The San Francisco startup is reportedly nearing a $1.3B valuation.