training_technique

Distillation


Definition

Distillation transfers knowledge from a large, complex teacher model to a smaller student model: the student is trained to mimic the teacher's outputs or intermediate representations. The result is a smaller model that is cheaper to run at inference time while retaining much of the teacher's performance.
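
As a rough illustration of "mimicking the teacher's outputs", the sketch below computes one common form of distillation loss: a KL-divergence term between temperature-softened teacher and student distributions, blended with ordinary cross-entropy on the true label. The function names, the temperature and alpha defaults, and the example logits are all assumptions chosen for illustration; the models listed on this page may use different objectives.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target term (match the teacher) with a hard-label term.

    temperature and alpha are illustrative hyperparameters, not values
    taken from any particular model.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scaling by temperature**2 keeps the soft term's gradient magnitude
    # comparable to the hard term as the temperature changes.
    soft_term = (temperature ** 2) * kl_divergence(soft_teacher, soft_student)
    # Standard cross-entropy of the student against the true class index.
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * soft_term + (1.0 - alpha) * hard_term

# Hypothetical logits for a 3-class problem where class 0 is correct.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

print(distillation_loss(close_student, teacher, label=0))
print(distillation_loss(far_student, teacher, label=0))
```

The student that tracks the teacher's distribution receives the lower loss. In a real training loop this quantity would be averaged over a batch and minimized with respect to the student's parameters only, with the teacher held fixed.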

Models Mentioning Distillation (12)

Unisound U2 (2026-06)
MAI-Thinking-1 (2026-06)
Aion 1.0 (2026-01)
ERNIE X1.1 (2025-09)
Cogito v2 Preview Llama 70B (2025-07)
Cogito v2 Preview Llama 109B MoE (2025-07)
Cogito v2 Preview Llama 405B (2025-07)
Cogito v2 Preview DeepSeek 671B MoE (2025-07)
Cogito v1 Preview Llama 3B (2025-04)
Cogito v1 Preview Llama 70B (2025-04)
Cogito v1 Preview Llama 8B (2025-04)
Cogito v1 Preview Qwen-14B (2025-04)