Speculative decoding

Definition

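Speculative decoding accelerates autoregressive inference in large language models. A small draft model quickly proposes several candidate tokens, and the larger target (verifier) model checks them in a single parallel pass, keeping the longest prefix it agrees with and discarding the rest. Because one target-model pass can commit several tokens, latency drops. With the standard acceptance rule the output follows the same distribution as decoding with the target model alone, so the speedup costs no accuracy; only variants that relax the acceptance rule trade some accuracy for extra speed.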
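The draft-then-verify loop can be sketched with toy models. This is a minimal illustration of the greedy variant only, not a production implementation: `target_next` and `draft_next` are hypothetical stand-ins for real models, the 10-token vocabulary is invented, and the verification step is written as a Python loop where a real system would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List

# A "model" here maps a token context to its greedy next token.
Model = Callable[[List[int]], int]


def target_next(context: List[int]) -> int:
    # Hypothetical large model: a fixed deterministic rule over 10 tokens.
    return (sum(context) * 7 + len(context)) % 10


def draft_next(context: List[int]) -> int:
    # Hypothetical small model: matches the target except when the
    # context length is a multiple of 4, where it guesses wrong.
    if len(context) % 4 == 0:
        return (target_next(context) + 1) % 10
    return target_next(context)


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. Draft k candidate tokens autoregressively with the small model.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. Verify: compare each candidate with the target's own choice.
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] != expected:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
            accepted.append(proposal[i])
        else:
            # Every candidate matched: the target adds one bonus token.
            accepted.append(target(tokens + proposal))

        tokens.extend(accepted)
    return tokens[:goal]


prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 20)
speculative = speculative_decode(target_next, draft_next, prompt, 20)
print(speculative == baseline)
```

Every token that is kept is one the target model would have chosen itself, so the result is identical to decoding with the target alone. The draft model only changes how many tokens each verification round commits, which is between 1 and `k + 1` in this sketch.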
