Category: training technique
Distillation
Definition
Distillation transfers knowledge from a large, complex teacher model to a smaller student model by training the student to mimic the teacher's outputs or intermediate representations. The result is a more efficient model that is easier to deploy: it is smaller and cheaper to run at inference time while retaining much of the teacher's performance.
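As a rough illustration, the sketch below implements the common soft-target formulation of output-based distillation in plain Python. A temperature-scaled softmax softens both models' output distributions. The student is penalized by the KL divergence from the teacher's softened distribution, which is scaled by T² so that gradient magnitudes stay comparable across temperatures. This term is blended with ordinary cross-entropy on the true label. A mean-squared-error term stands in for matching intermediate representations. All function names, and the `temperature` and `alpha` defaults, are hypothetical choices for illustration. A real training loop would compute these losses with a deep-learning framework's autograd rather than with Python lists.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,  # illustrative value, not a recommendation
    alpha: float = 0.5,        # weight on the soft (teacher) term; illustrative
) -> float:
    """Blend of soft-target KL (student mimics teacher) and hard-label cross-entropy."""
    # Soft-target term: match the teacher's softened output distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student_soft) * temperature ** 2
    # Hard-target term: ordinary cross-entropy against the true label at T = 1.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])
    return alpha * soft + (1 - alpha) * hard


def feature_matching_loss(
    student_feats: Sequence[float], teacher_feats: Sequence[float]
) -> float:
    """MSE between intermediate representations (assumes equal dimensions)."""
    return sum((s - t) ** 2 for s, t in zip(student_feats, teacher_feats)) / len(student_feats)


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [3.5, 1.2, 0.3]
    print(distillation_loss(student, teacher, label=0))
    print(feature_matching_loss([0.1, 0.9], [0.0, 1.0]))
```

With `alpha = 1.0`, the student learns only from the teacher's soft targets. With `alpha = 0.0`, the loss reduces to standard supervised training, and no knowledge is transferred from the teacher.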
Models Mentioning Distillation (12)
- Unisound U2 (2026-06)
- MAI-Thinking-1 (2026-06)
- Aion 1.0 (2026-01)
- ERNIE X1.1 (2025-09)
- Cogito v2 Preview Llama 70B (2025-07)
- Cogito v2 Preview Llama 109B MoE (2025-07)
- Cogito v2 Preview Llama 405B (2025-07)
- Cogito v2 Preview DeepSeek 671B MoE (2025-07)
- Cogito v1 Preview Llama 3B (2025-04)
- Cogito v1 Preview Llama 70B (2025-04)
- Cogito v1 Preview Llama 8B (2025-04)
- Cogito v1 Preview Qwen-14B (2025-04)