AI Glossary
inference_optimization
Speculative decoding
Definition
Speculative decoding accelerates large language model inference by using a small draft model to generate candidate tokens quickly, which a larger verify model checks in parallel, accepting correct ones to reduce latency. It trades minimal accuracy loss for significant speedups in autoregressive generation.