training_technique
Distillation
Definition
Distillation transfers knowledge from a large, complex teacher model to a smaller student model by training the student to mimic the teacher's outputs or intermediate representations. The resulting student is smaller and cheaper to run at inference time while retaining much of the teacher's performance, which makes it easier to deploy.
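The definition does not fix a particular training objective. As a rough illustration, the sketch below shows one common formulation of output-level distillation, in which the student matches the teacher's temperature-softened probabilities and also fits the ground-truth label. The function names, the `temperature` and `alpha` parameters, and the example logits are all assumptions made for illustration and are not taken from any model listed on this page.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    Hypothetical helper: `temperature` and `alpha` are illustrative
    hyperparameters, not values reported by any specific model.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # Soft-target term: KL(teacher || student) on softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Hard-label term: ordinary cross-entropy at temperature 1.
    hard = -log_softmax(student_logits)[true_label]

    # The temperature**2 factor is a common convention that keeps the
    # soft-term gradients on a comparable scale as temperature changes.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]          # made-up teacher logits
    close_student = [0.9, 2.1, 2.8]    # student that roughly agrees
    far_student = [3.0, 2.0, 1.0]      # student that disagrees
    print(distillation_loss(close_student, teacher, true_label=2))
    print(distillation_loss(far_student, teacher, true_label=2))
```

In this sketch a student whose logits track the teacher's receives a lower loss than one that disagrees, and with `alpha=1.0` the loss is zero when the two sets of logits are identical. Distilling intermediate representations would need a different term (for example, a distance between hidden states), which is not shown here.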
Models Mentioning Distillation (12)
Aion 1.0 (2026-01)
ERNIE X1.1 (2025-09)
Cogito v2 Preview Llama 70B (2025-07)
Cogito v2 Preview Llama 109B MoE (2025-07)
Cogito v2 Preview Llama 405B (2025-07)
Cogito v2 Preview DeepSeek 671B MoE (2025-07)
Cogito v1 Preview Llama 3B (2025-04)
Cogito v1 Preview Llama 70B (2025-04)
Cogito v1 Preview Llama 8B (2025-04)
Cogito v1 Preview Qwen-14B (2025-04)
Cogito v1 Preview Qwen-32B (2025-04)
Amazon Nova Premier (2025-03)