
Speculative decoding


Definition

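Speculative decoding speeds up inference by having a small draft model propose several tokens ahead, which the larger target model then verifies in a single parallel pass. The target keeps the longest prefix of proposed tokens that it would have produced anyway and discards the rest. Whether the output is identical or only near-identical to standard decoding depends on the verifier's acceptance scheme: exact-match or rejection-sampling verification preserves the target model's output distribution, while relaxed acceptance criteria trade some fidelity for speed. It is a latency optimization and is transparent to how you call the API.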
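The propose-and-verify loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any library's implementation: the "models" are hypothetical toy functions (`toy_target`, `toy_draft`) that map a token list to the next token, decoding is greedy, acceptance is exact-match, and the verification loop only emulates what a real system would do in one batched forward pass. The usual bonus token emitted when every draft token is accepted is left out for brevity.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: token sequence -> next token (greedy).
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Standard decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding with exact-match acceptance."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks each proposed position. A real system scores
        #    all positions in one parallel pass; this loop emulates that.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)       # the target would have produced it
            else:
                accepted.append(expected)  # replace the first mismatch
                break                      # discard the rest of the draft
        tokens.extend(accepted)
    return tokens


# Toy models (assumptions for illustration only).
def toy_target(tokens: List[int]) -> int:
    return (tokens[-1] * 3 + 1) % 17


def toy_draft(tokens: List[int]) -> int:
    # Agrees with the target except when the context length is a multiple of 3.
    guess = toy_target(tokens)
    return guess if len(tokens) % 3 else (guess + 1) % 17


baseline = greedy_decode(toy_target, [1], 10)
speculative = speculative_decode(toy_target, toy_draft, [1], 10)
assert speculative == baseline  # exact-match acceptance reproduces greedy output
```

With exact-match acceptance, every emitted token is one the target itself would have chosen, so the result matches plain greedy decoding. The speedup comes from the target checking several draft tokens per pass rather than generating one token per pass.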

Models mentioning speculative decoding (2)

Granite 4.0 1B Speech (2026-03)
Xiaomi MiMo-V2-Flash (2025-12)