
Generative Pre-trained Transformer Quantization

GPTQ

Definition

GPTQ (Generative Pre-trained Transformer Quantization) is a post-training quantization method that compresses a model's weights to low precision, typically 3-4 bits, while preserving accuracy. Rather than simply rounding each weight to the nearest grid point, GPTQ quantizes each layer's weights one at a time and uses approximate second-order (Hessian) information, estimated from a small calibration set, to adjust the remaining unquantized weights and compensate for each rounding error. This keeps accuracy loss minimal and enables efficient deployment of large models on consumer hardware.
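The quantize-then-compensate idea behind GPTQ can be illustrated with a simplified NumPy sketch. This is a pedagogical toy, not the real implementation: it quantizes a single weight row column-by-column with 4-bit symmetric round-to-nearest, and spreads each rounding error over the not-yet-quantized weights via the inverse Hessian (the production algorithm adds Cholesky-based updates, lazy batching, and per-group scales). The function names (`quantize_rtn`, `gptq_row`) and the damping constant are illustrative assumptions.

```python
import numpy as np

def quantize_rtn(w, scale):
    # 4-bit symmetric round-to-nearest: integer levels in [-8, 7],
    # then dequantize back to the float grid.
    return np.clip(np.round(w / scale), -8, 7) * scale

def gptq_row(w, H_inv, scale):
    # Quantize one weight row column-by-column. After quantizing
    # weight i, distribute its rounding error over the remaining
    # unquantized weights using the inverse Hessian (simplified
    # GPTQ/OBQ-style compensation).
    w = w.astype(float).copy()
    q = np.empty_like(w)
    for i in range(len(w)):
        q[i] = quantize_rtn(w[i], scale)
        err = (w[i] - q[i]) / H_inv[i, i]
        w[i + 1:] -= err * H_inv[i, i + 1:]
    return q

# Toy calibration data: the Hessian of the layer-wise squared
# reconstruction error for output X @ w is proportional to X^T X;
# a small damping term keeps it invertible.
rng = np.random.default_rng(0)
X = rng.standard_normal((128, 16))   # 128 calibration samples, 16 inputs
w = rng.standard_normal(16)          # one weight row
H_inv = np.linalg.inv(X.T @ X + 0.01 * np.eye(16))
scale = np.max(np.abs(w)) / 7        # symmetric 4-bit scale
q = gptq_row(w, H_inv, scale)
```

Plain round-to-nearest would quantize every weight independently; the compensation step is what lets GPTQ hold up at such low bit widths, since later weights absorb the error introduced by earlier ones.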