LLM Reference
AI Glossary

quantization

Definition

Quantization reduces the numeric precision of an LLM by mapping high-bit weights and activations (e.g., FP16) to lower-bit representations (e.g., INT8 or INT4), shrinking the memory footprint and reducing inference latency. Techniques like post-training quantization preserve accuracy by running a small calibration set through the model to choose scaling factors that minimize rounding error, enabling deployment on resource-constrained hardware.
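
The mapping can be sketched with a minimal symmetric INT8 scheme: a single scale factor maps the largest absolute weight to 127, and each weight is rounded to the nearest integer step. This is an illustrative toy, not any specific library's implementation; real quantizers operate per-channel or per-group on tensors.

```python
def quantize_int8(weights):
    # Symmetric quantization: one scale maps max|w| onto the INT8 range [-127, 127].
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate FP values; error per weight is at most scale / 2.
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.08, 0.95]
q, scale = quantize_int8(weights)
recon = dequantize(q, scale)
```

Each stored value now fits in one byte instead of two (FP16) or four (FP32), and the worst-case rounding error is half a quantization step; calibration in real post-training quantization amounts to choosing scales like this one from representative data rather than a single maximum.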