Concepts & capability filters
inference_optimization
Speculative decoding
See matching models with benchmark scores and pricing.
Definition
Speculative decoding speeds up inference by having a small draft model propose several tokens that the target model verifies in parallel, keeping the ones it would have produced anyway. Whether output is identical or only near-identical to standard decoding depends on the verifier's acceptance scheme; it is a latency optimization, transparent to how you call the API.