OptRot: Mitigating Weight Outliers via Data-Free Rotations for Post-Training Quantization
- URL: http://arxiv.org/abs/2512.24124v1
- Date: Tue, 30 Dec 2025 10:13:50 GMT
- Title: OptRot: Mitigating Weight Outliers via Data-Free Rotations for Post-Training Quantization
- Authors: Advait Gadhikar, Riccardo Grazzi, James Hensman,
- Abstract summary: We propose methods that learn rotations by minimizing principled and cheap proxy objectives for the weight quantization error. Our main method is OptRot, which reduces weight outliers simply by minimizing the element-wise fourth power of the rotated weights.
- Score: 9.958479437486288
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The presence of outliers in the weights and activations of Large Language Models (LLMs) makes them difficult to quantize. Recent work has leveraged rotations to mitigate these outliers. In this work, we propose methods that learn fusible rotations by minimizing principled and cheap proxy objectives for the weight quantization error. We primarily focus on GPTQ as the quantization method. Our main method is OptRot, which reduces weight outliers simply by minimizing the element-wise fourth power of the rotated weights. We show that OptRot outperforms both Hadamard rotations and more expensive, data-dependent methods like SpinQuant and OSTQuant for weight quantization. It also improves activation quantization in the W4A8 setting. We also propose a data-dependent method, OptRot$^{+}$, that further improves performance by incorporating information on the activation covariance. In the W4A4 setting, we see that both OptRot and OptRot$^{+}$ perform worse, highlighting a trade-off between weight and activation quantization.
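As a rough illustration of the data-free objective described above, the sketch below learns an orthogonal rotation that minimizes the element-wise fourth power of a rotated weight matrix. This is a minimal toy sketch in PyTorch, not the authors' implementation: the single random weight matrix, the injected outlier column, the Adam optimizer, and the use of torch.nn.utils.parametrizations.orthogonal are assumptions made here for illustration.
```python
# Illustrative sketch (not the paper's code): learn a fusible rotation R that
# minimizes the element-wise fourth power of the rotated weight W @ R.
# Because R is orthogonal, replacing W by W @ R (and inputs x by R.T @ x) leaves
# the layer output W @ x unchanged, so the rotation can be fused away at inference.
import torch

torch.manual_seed(0)
W = torch.randn(256, 256)            # toy weight standing in for a real layer
W[:, 7] *= 20.0                      # inject an outlier column

# Parameterize an orthogonal matrix so every update stays a pure rotation.
rot = torch.nn.utils.parametrizations.orthogonal(torch.nn.Linear(256, 256, bias=False))
opt = torch.optim.Adam(rot.parameters(), lr=1e-2)

for step in range(200):
    R = rot.weight                    # current orthogonal matrix
    loss = ((W @ R) ** 4).sum()       # data-free proxy: penalize heavy-tailed entries
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    print("max |W|     :", W.abs().max().item())
    print("max |W @ R| :", (W @ rot.weight).abs().max().item())  # typically much smaller
```
Keeping the transform orthogonal is what makes it "fusible": the learned rotation can be folded into the weights offline (and its transpose into the preceding activations), so no extra computation is paid at inference time.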
Related papers
- SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs [0.0]
We present SmoothRot, a novel post-training quantization technique to enhance the efficiency of 4-bit quantization in Large Language Models (LLMs). Our technique effectively transforms extreme outliers into quantization-friendly activations, significantly improving quantization accuracy.
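For intuition, here is a minimal sketch of what channel-wise scaling followed by a rotation can look like for a single linear layer with y = x @ W.T. The SmoothQuant-style scaling rule, the random orthogonal matrix standing in for a Hadamard transform, and the function name smooth_and_rotate are assumptions for illustration, not SmoothRot's actual procedure.
```python
# Rough sketch of scaling-then-rotation for one linear layer (not SmoothRot's code).
# Step 1: per-channel scaling migrates activation outliers into the weight.
# Step 2: an orthogonal rotation spreads the remaining outliers across channels.
# Both steps preserve the layer output y = x @ W.T exactly.
import torch

def smooth_and_rotate(x, W, Q, alpha=0.5):
    # SmoothQuant-style per-input-channel scales (an assumed convention here)
    s = (x.abs().amax(dim=0).clamp(min=1e-5) ** alpha
         / W.abs().amax(dim=0).clamp(min=1e-5) ** (1 - alpha))
    x_s, W_s = x / s, W * s        # (x / s) @ (W * s).T == x @ W.T
    return x_s @ Q, W_s @ Q        # Q orthogonal: (x_s Q) @ (W_s Q).T == x_s @ W_s.T

torch.manual_seed(0)
x = torch.randn(32, 64); x[:, 3] *= 50.0       # one outlier activation channel
W = torch.randn(128, 64)
Q, _ = torch.linalg.qr(torch.randn(64, 64))    # random orthogonal stand-in for Hadamard
x_r, W_r = smooth_and_rotate(x, W, Q)
print(torch.allclose(x @ W.T, x_r @ W_r.T, atol=1e-3))  # output preserved
```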
arXiv Detail & Related papers (2025-06-04T19:07:45Z)
- KurTail: Kurtosis-based LLM Quantization [51.24081396305435]
KurTail is a new post-training quantization scheme that mitigates outliers in the activations of large language models. It offers a 13.3% boost in MMLU accuracy and a 15.5% drop in Wiki perplexity compared to QuaRot. It also outperforms SpinQuant with a 2.6% MMLU gain and reduces perplexity by 2.9%, all while reducing the training cost.
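For reference, kurtosis is the normalized fourth moment of a distribution, which makes it a natural measure of the heavy tails that outliers create. The toy check below (the helper name and data are assumptions, not KurTail's code) shows how sharply it separates an outlier-prone channel from a roughly Gaussian one.
```python
# Kurtosis = E[(x - mu)^4] / sigma^4; a Gaussian channel sits near 3, while a channel
# with a few extreme outliers scores far higher, so driving kurtosis down flattens tails.
import torch

def kurtosis(x):
    x = x - x.mean()
    return (x ** 4).mean() / (x.var(unbiased=False) ** 2 + 1e-12)

torch.manual_seed(0)
gaussian = torch.randn(10_000)
heavy = torch.randn(10_000)
heavy[::1000] *= 40.0                      # ten extreme values
print(kurtosis(gaussian).item())           # ~3
print(kurtosis(heavy).item())              # far larger
```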
arXiv Detail & Related papers (2025-03-03T12:43:06Z)
- DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation [5.174900115018253]
We find substantial improvement in eliminating outliers for common tokens and achieve similar quantization error. Due to the extreme rarity of these tokens and their critical impact on model accuracy, we construct a simple yet effective method: a weighted loss function. Our method enhances rotated LLMs by making them both Outlier-Free and Massive Activation-Free, dubbed DFRot.
arXiv Detail & Related papers (2024-12-01T02:55:08Z)
- SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models [61.474101404805545]
Diffusion models can generate high-quality images, but as they scale, rising memory demands and higher latency pose deployment challenges. We propose SVDQuant, a new 4-bit quantization paradigm to overcome this limitation. We reduce the memory usage for the 12B FLUX.1 models by 3.5$\times$, achieving a 3.0$\times$ speedup over the 4-bit weight-only quantization (W4A16) baseline.
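A minimal sketch of the low-rank-plus-residual idea, assuming a plain weight matrix: keep a rank-r SVD component in high precision and quantize only the residual to 4 bits. The rank, the symmetric round-to-nearest quantizer, and the injected outlier below are illustrative assumptions, not SVDQuant's actual kernel or scaling scheme.
```python
# Sketch of the "absorb outliers into a low-rank branch" idea (not SVDQuant's code):
# keep the top-r singular directions of W in 16-bit precision and quantize only the
# residual to 4 bits, so most of the hard-to-quantize energy never hits the quantizer.
import torch

def rtn4(M):
    # symmetric round-to-nearest 4-bit quantizer (an assumed stand-in)
    scale = M.abs().amax() / 7.0
    return torch.clamp((M / scale).round(), -8, 7) * scale

def lowrank_plus_int4(W, rank=16):
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]   # high-precision low-rank branch
    return L, rtn4(W - L)                                 # 4-bit residual

torch.manual_seed(0)
W = torch.randn(512, 512)
W[:, 0] += 30.0                                           # an outlier direction
L, R_q = lowrank_plus_int4(W)
print((W - rtn4(W)).norm().item())      # direct 4-bit error: large, set by the outlier
print((W - (L + R_q)).norm().item())    # low-rank + 4-bit residual: much smaller error
```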
arXiv Detail & Related papers (2024-11-07T18:59:58Z)
- FlatQuant: Flatness Matters for LLM Quantization [58.28221892035609]
We propose FlatQuant, a new post-training quantization approach that enhances the flatness of weights and activations. Our approach identifies optimal affine transformations for each linear layer, calibrated in hours via a lightweight objective. It achieves less than 1% accuracy drop for W4A4 quantization on the LLaMA-3-70B model, surpassing SpinQuant by 7.5%.
arXiv Detail & Related papers (2024-10-12T08:10:28Z)
- Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference [54.2589824716527]
Large language models incur substantial computation and memory movement costs due to their large scale.
Existing approaches separate outliers and normal values into two matrices or migrate outliers from activations to weights, suffering from high latency or accuracy degradation.
We propose Rotated Runtime Smooth (RRS), a plug-and-play activation smoother for quantization, consisting of a runtime smoothing operation and a rotation operation.
The proposed method outperforms the state-of-the-art method in the LLaMA and Qwen families and improves WikiText-2 perplexity from 57.33 to 6.66 for INT4 inference.
arXiv Detail & Related papers (2024-09-30T14:59:22Z)
- DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs [40.48697728884967]
Quantization of large language models (LLMs) faces significant challenges, particularly due to the presence of outlier activations.
Traditional approaches predominantly address Normal Outliers, which are activations across all tokens with relatively large magnitudes.
We introduce DuQuant, a novel approach that utilizes rotation and permutation transformations to more effectively mitigate both massive and normal outliers.
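A toy sketch of the permute-then-rotate idea: a round-robin permutation spreads the largest channels across blocks, and a per-block rotation then shares each outlier's energy among the channels inside its block. The block size, the magnitude-sorted round-robin permutation, and the single random block rotation are assumptions for illustration, not DuQuant's exact construction.
```python
# Toy sketch (not DuQuant's code): permute channels so outliers land in different
# blocks, then rotate within each block so no single channel keeps an extreme value.
import torch

def spread_outliers(x, block=16):
    d = x.shape[-1]
    nb = d // block
    order = x.abs().amax(dim=0).argsort(descending=True)   # channels, largest first
    dest = (torch.arange(d) % nb) * block + torch.arange(d) // nb
    src = torch.empty(d, dtype=torch.long)
    src[dest] = order                                       # round-robin into blocks
    x = x[:, src]
    Q, _ = torch.linalg.qr(torch.randn(block, block))       # one shared block rotation
    x = (x.reshape(-1, nb, block) @ Q).reshape(-1, d)
    # the inverse permutation and Q.T would be folded into the next weight so the
    # network's output is unchanged
    return x

torch.manual_seed(0)
x = torch.randn(64, 128)
x[:, [5, 9]] *= 40.0                                        # two outlier channels
print(x.abs().max().item(), spread_outliers(x).abs().max().item())
```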
arXiv Detail & Related papers (2024-06-03T18:27:44Z)
- SpinQuant: LLM quantization with learned rotations [49.07335692298487]
Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs). We identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures while enhancing quantization accuracy. We propose SpinQuant, a novel approach that incorporates learned rotation matrices for optimal quantized network accuracy.
arXiv Detail & Related papers (2024-05-26T02:15:49Z)
- QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs [72.26197676852958]
We introduce QuaRot, a new Quantization scheme based on Rotations.
QuaRot quantizes end-to-end, including all weights, activations, and KV cache in 4 bits.
Our 4-bit quantized LLaMa2-70B model incurs a WikiText-2 perplexity degradation of at most 0.47 and retains 99% of the zero-shot performance.
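To see why rotating into a Hadamard basis helps low-bit rounding, the toy comparison below quantizes a weight matrix directly and in a rotated basis. The Sylvester Hadamard construction and the round-to-nearest 4-bit quantizer are assumptions for illustration; QuaRot's full pipeline additionally rotates activations and the KV cache online.
```python
# Toy comparison: 4-bit round-to-nearest error with and without a Hadamard rotation.
# Since H is orthogonal, the error measured in the rotated basis equals the error of
# the de-quantized, rotated-back weight, so the two numbers are directly comparable.
import torch

def hadamard(n):
    # Sylvester construction; valid when n is a power of two
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], 1), torch.cat([H, -H], 1)], 0)
    return H / H.shape[0] ** 0.5                  # orthonormal columns

def rtn4(M):
    scale = M.abs().amax() / 7.0                  # symmetric 4-bit quantizer (assumed)
    return torch.clamp((M / scale).round(), -8, 7) * scale

torch.manual_seed(0)
W = torch.randn(256, 256)
W[0, 0] = 60.0                                    # a single extreme weight
H = hadamard(256)
print((W - rtn4(W)).norm().item())                # plain 4-bit error: set by the outlier
print((W @ H - rtn4(W @ H)).norm().item())        # rotated-basis error: far smaller
```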
arXiv Detail & Related papers (2024-03-30T19:20:06Z)