Rethinking the Outlier Distribution in Large Language Models: An In-depth Study
- URL: http://arxiv.org/abs/2505.21670v1
- Date: Tue, 27 May 2025 18:48:40 GMT
- Title: Rethinking the Outlier Distribution in Large Language Models: An In-depth Study
- Authors: Rahul Raman, Khushi Sharma, Sai Qian Zhang
- Abstract summary: Outliers often cause considerable quantization errors, leading to degraded model performance. Recent studies have identified two common types of outliers in large language models: massive activations and channel-wise outliers.
- Score: 4.740962650068888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Investigating outliers in large language models (LLMs) is crucial due to their significant impact on various aspects of LLM performance, including quantization and compression. Outliers often cause considerable quantization errors, leading to degraded model performance. Identifying and addressing these outliers can enhance the accuracy and efficiency of the quantization process, enabling smoother deployment on edge devices or specialized hardware. Recent studies have identified two common types of outliers in LLMs: massive activations and channel-wise outliers. While numerous quantization algorithms have been proposed to mitigate their effects and maintain satisfactory accuracy, few have thoroughly explored the root causes of these outliers in depth. In this paper, we conduct a comprehensive investigation into the formation mechanisms of these outliers and propose potential strategies to mitigate their occurrence. Ultimately, we introduce some efficient approaches to eliminate most massive activations and channel-wise outliers with minimal impact on accuracy.
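To make the quantization-error claim concrete, below is a minimal sketch (not taken from the paper; the tensor shape, outlier magnitudes, and the `quantize_int8` helper are illustrative assumptions) showing how a single massive activation and a channel-wise outlier inflate the error of naive symmetric per-tensor int8 quantization:

```python
# Sketch: why massive activations and channel-wise outliers hurt low-bit quantization.
# All shapes and magnitudes are made up for illustration.
import numpy as np

def quantize_int8(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor int8 fake quantization (quantize, then dequantize)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(128, 64))   # toy activations: tokens x channels

acts_outlier = acts.copy()
acts_outlier[:, 7] *= 50.0     # channel-wise outlier: one channel is consistently large
acts_outlier[3, 20] = 2000.0   # massive activation: a single extremely large value

for name, x in [("no outliers", acts), ("with outliers", acts_outlier)]:
    mse = np.mean((x - quantize_int8(x)) ** 2)
    print(f"{name:13s} -> per-tensor int8 MSE: {mse:.6f}")

# The outlier case shows a far larger error: the single huge value stretches the
# quantization scale, so ordinary activations collapse onto a few int8 levels.
```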
Related papers
- Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models [1.4999444543328293]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various natural language processing tasks. This paper investigates the quantization of LLMs, focusing on the LLaMA architecture and its derivatives. We propose a novel mixed-precision quantization approach tailored for LLaMA-like models.
arXiv Detail & Related papers (2025-04-30T11:52:18Z) - R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference [77.47238561728459]
R-Sparse is a training-free activation sparsity approach capable of achieving high sparsity levels in advanced LLMs. Experiments on Llama-2/3 and Mistral models across ten diverse tasks demonstrate that R-Sparse achieves comparable performance at 50% model-level sparsity.
arXiv Detail & Related papers (2025-04-28T03:30:32Z) - Systematic Outliers in Large Language Models [41.2150163753952]
Outliers have been widely observed in Large Language Models (LLMs). We provide a detailed analysis of the formation process, underlying causes, and functions of outliers in LLMs.
arXiv Detail & Related papers (2025-02-10T12:54:17Z) - Regularized Multi-LLMs Collaboration for Enhanced Score-based Causal Discovery [13.654021365091305]
We explore the potential of using large language models (LLMs) to enhance causal discovery approaches. We propose a general framework to utilise the capacity of not only one but multiple LLMs to augment the discovery process.
arXiv Detail & Related papers (2024-11-27T01:56:21Z) - DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs [40.48697728884967]
Quantization of large language models (LLMs) faces significant challenges, particularly due to the presence of outlier activations.
Traditional approaches predominantly address Normal Outliers, which are activations across all tokens with relatively large magnitudes.
We introduce DuQuant, a novel approach that utilizes rotation and permutation transformations to more effectively mitigate both massive and normal outliers.
arXiv Detail & Related papers (2024-06-03T18:27:44Z) - Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs [27.38239289662178]
Post-Training Quantization (PTQ) enhances the efficiency of Large Language Models (LLMs).
We explore the role of calibration sets in PTQ, specifically their effect on hidden activations.
Our analysis reveals a marked contrast in quantization effectiveness across accessible models.
arXiv Detail & Related papers (2024-05-31T14:24:33Z) - Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study [90.34226812493083]
This work aims to investigate the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models.
Our empirical experiments show that these emergent abilities still exist in 4-bit quantization models, while 2-bit models encounter severe performance degradation.
To improve the performance of low-bit models, we conduct two special experiments: (1) fine-grained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning.
arXiv Detail & Related papers (2023-07-16T15:11:01Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We then examine the factors contributing to multi-epoch degradation, finding that dataset size, model parameters, and training objectives are among the most significant.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z) - Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models [57.933500846742234]
Recent work recognizes that structured outliers are the critical bottleneck for quantization performance.
We propose an outlier suppression framework including two components: Gamma Migration and Token-Wise Clipping.
This framework effectively suppresses the outliers and can be used in a plug-and-play mode.
arXiv Detail & Related papers (2022-09-27T12:05:59Z) - Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
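As a companion to the DuQuant entry above, here is a minimal sketch of the general rotation idea rather than the DuQuant algorithm itself (the random orthogonal matrix, shapes, and helper names are illustrative assumptions): an orthogonal rotation spreads outliers across channels before per-tensor int8 quantization, and the inverse rotation restores the original basis.

```python
# Sketch of rotation-based outlier mitigation (not DuQuant itself): an orthogonal
# rotation spreads a large channel/value across many channels, shrinking the
# per-tensor quantization scale. Values are illustrative.
import numpy as np

def fake_quant_int8(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor int8 quantize-then-dequantize."""
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(128, 64))
acts[:, 7] *= 50.0          # channel-wise outlier
acts[3, 20] = 2000.0        # massive activation

# Random orthogonal rotation (structured rotations such as Hadamard matrices are common in practice).
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))

direct = fake_quant_int8(acts)                 # quantize in the original basis
rotated = fake_quant_int8(acts @ q) @ q.T      # rotate, quantize, rotate back

print("MSE, direct :", np.mean((acts - direct) ** 2))
print("MSE, rotated:", np.mean((acts - rotated) ** 2))

# The rotated path typically shows a much smaller error because no single channel
# dominates the per-tensor scale. In a real pipeline the inverse rotation is
# usually folded into the next layer's weights rather than applied explicitly.
```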