LLM Reference

Speculative decoding

Definition

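Speculative decoding accelerates large language model inference by having a small draft model propose several candidate tokens quickly, which the larger target model then verifies in a single parallel forward pass. Drafted tokens that are consistent with the target model's distribution are accepted, and generation resumes from the first rejected position, so one target-model pass can emit several tokens instead of one. With the standard rejection-sampling acceptance rule, the output distribution matches that of the target model exactly, so the latency reduction in autoregressive generation does not cost accuracy; the price is the extra compute spent on drafting and on rejected tokens. Some variants relax the acceptance rule and accept a small change in output quality in exchange for higher acceptance rates.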
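The draft-then-verify loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `draft_dist` and `target_dist` are hypothetical toy stand-ins for the two models (simple functions that return a next-token distribution), `VOCAB` is a four-token vocabulary, and the acceptance rule is the commonly described rejection-sampling scheme (accept a drafted token with probability min(1, p/q), otherwise resample from the residual max(0, p - q)). It is not the API of any particular library.

```python
import random

VOCAB = 4  # toy vocabulary size (assumption for illustration)


def _toy_dist(context, sharpness):
    """Hypothetical next-token distribution that depends only on the last token."""
    last = context[-1] if context else 0
    weights = [1.0 + sharpness * ((last + t) % VOCAB) for t in range(VOCAB)]
    total = sum(weights)
    return [w / total for w in weights]


def draft_dist(context):
    """Stand-in for the small, fast draft model (hypothetical)."""
    return _toy_dist(context, 0.5)


def target_dist(context):
    """Stand-in for the large target model (hypothetical)."""
    return _toy_dist(context, 2.0)


def sample(weights, rng):
    """Draw one token index in proportion to non-negative weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def speculative_step(context, k, rng):
    """Draft k tokens, verify them, and return the tokens emitted (1 to k + 1)."""
    # 1. Draft phase: the small model proposes k tokens autoregressively.
    drafted, q_dists = [], []
    ctx = list(context)
    for _ in range(k):
        q = draft_dist(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2. Verify phase: target distributions at every drafted position plus one.
    #    A real system would obtain all of these from one batched forward pass;
    #    this toy version simply calls the function k + 1 times.
    p_dists = [target_dist(list(context) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept each drafted token with probability min(1, p / q).
    emitted = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            emitted.append(tok)
        else:
            # First rejection: resample from the residual max(0, p - q), then stop.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            emitted.append(sample(residual, rng))
            return emitted

    # All k drafts accepted: take one extra token from the target distribution.
    emitted.append(sample(p_dists[k], rng))
    return emitted


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [0]
    while len(tokens) < 20:
        tokens.extend(speculative_step(tokens, k=3, rng=rng))
    print(tokens)
```

Each call emits at least one token and at most k + 1, which is where the speedup comes from when the draft model agrees with the target often. Because rejected positions are resampled from the residual, the emitted tokens follow the target model's distribution rather than the draft model's.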
