Related papers: On the Impact of Calibration Data in Post-training Quantization and Pruning

URL: http://arxiv.org/abs/2311.09755v2
Date: Mon, 12 Aug 2024 17:57:00 GMT
Title: On the Impact of Calibration Data in Post-training Quantization and Pruning
Authors: Miles Williams, Nikolaos Aletras
Abstract summary: Quantization and pruning form the foundation of compression for neural networks. We present the first empirical study on the effect of calibration data upon model compression methods.
Score: 36.1039389951318
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Quantization and pruning form the foundation of compression for neural networks, enabling efficient inference for large language models (LLMs). Recently, various quantization and pruning techniques have demonstrated remarkable performance in a post-training setting. They rely upon calibration data, a small set of unlabeled examples that are used to generate layer activations. However, no prior work has systematically investigated how the calibration data impacts the effectiveness of model compression methods. In this paper, we present the first extensive empirical study on the effect of calibration data upon LLM performance. We trial a variety of quantization and pruning methods, datasets, tasks, and models. Surprisingly, we find substantial variations in downstream task performance, contrasting existing work that suggests a greater level of robustness to the calibration data. Finally, we make a series of recommendations for the effective use of calibration data in LLM quantization and pruning.
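The abstract notes that calibration data is used to generate layer activations. A minimal sketch of why that choice can matter is given below. It is not the paper's method: the toy layer, the max-absolute-value scale rule, and all names (`linear`, `calibrate_scale`, `fake_quantize`, `mean_abs_error`) are assumptions chosen for illustration, using only the Python standard library.

```python
# Illustrative only: a toy linear layer whose int8 activation scale is
# derived from calibration inputs. All names and data here are hypothetical.

def linear(x, weights):
    """Toy layer: one output per weight row (dot product with the input)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def calibrate_scale(calibration_set, weights, num_bits=8):
    """Symmetric scale taken from the largest activation magnitude observed."""
    max_abs = max(abs(a) for x in calibration_set for a in linear(x, weights))
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    return max_abs / qmax

def fake_quantize(values, scale, num_bits=8):
    """Round to the integer grid, clip to the representable range, rescale."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def mean_abs_error(inputs, weights, scale):
    """Average gap between full-precision and quantized activations."""
    errors = []
    for x in inputs:
        acts = linear(x, weights)
        errors.extend(abs(a - q) for a, q in zip(acts, fake_quantize(acts, scale)))
    return sum(errors) / len(errors)

weights = [[0.5, -1.0], [1.5, 0.25]]
eval_inputs = [[1.0, 2.0], [-2.0, 1.0], [0.5, -1.5]]

# Two candidate calibration sets: one matches the evaluation inputs,
# the other covers only a tenth of their range.
wide_calibration = eval_inputs
narrow_calibration = [[0.1 * v for v in x] for x in eval_inputs]

scale_wide = calibrate_scale(wide_calibration, weights)
scale_narrow = calibrate_scale(narrow_calibration, weights)

error_wide = mean_abs_error(eval_inputs, weights, scale_wide)
error_narrow = mean_abs_error(eval_inputs, weights, scale_narrow)

print(f"scale (wide calibration):   {scale_wide:.5f}, error: {error_wide:.5f}")
print(f"scale (narrow calibration): {scale_narrow:.5f}, error: {error_narrow:.5f}")
```

In this toy setting, the narrow calibration set yields a smaller scale, so larger activations at evaluation time are clipped and the error grows. Real post-training quantization and pruning methods use more elaborate statistics, so this only conveys the general dependence on calibration data.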

Related papers

CalibrateMix: Guided-Mixup Calibration of Image Semi-Supervised Models [49.588973929678765]
CalibrateMix is a mixup-based approach that aims to improve the calibration of SSL models. Our method achieves lower expected calibration error (ECE) and superior accuracy compared to existing SSL approaches.
arXiv Detail & Related papers (2025-11-17T04:43:53Z)
Preserving LLM Capabilities through Calibration Data Curation: From Analysis to Optimization [29.190967827961142]
Post-training compression is used to scale down large language models (LLMs). Calibration data plays a vital role by informing the weight importance and activation dynamic ranges. We explore the calibration data's impact on high-level complex reasoning capabilities, such as math problem solving and code generation.
arXiv Detail & Related papers (2025-10-12T14:00:23Z)
Efficient Data Selection at Scale via Influence Distillation [53.03573620682107]
This paper introduces Influence Distillation, a mathematically justified framework for data selection. By distilling each sample's influence on a target distribution, our method assigns model-specific weights that are used to select training data. Experiments show that Influence Distillation matches or outperforms state-of-the-art performance while achieving up to $3.5\times$ faster selection.
arXiv Detail & Related papers (2025-05-25T09:08:00Z)
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [53.571195477043496]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE). RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers. Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z)
Beware of Calibration Data for Pruning Large Language Models [41.1689082093302]
Post-training pruning is a promising method that does not require resource-intensive iterative training. We show that the choice of calibration data can matter even more than the design of advanced pruning strategies. Our preliminary exploration also reveals that using calibration data similar to the training data can yield better performance.
arXiv Detail & Related papers (2024-10-23T09:36:21Z)
Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models [0.0]
Large language models (LLMs) offer powerful capabilities but incur substantial computational costs. This study evaluates the impact of popular compression methods on the LLaMA-2-7B model. We show that while SparseGPT and Wanda preserve perplexity even at 50% sparsity, they suffer significant degradation on downstream tasks.
arXiv Detail & Related papers (2024-09-17T14:34:11Z)
MetaAug: Meta-Data Augmentation for Post-Training Quantization [32.02377559968568]
Post-Training Quantization (PTQ) has received significant attention because it requires only a small set of calibration data to quantize a full-precision model. We propose a novel meta-learning based approach to enhance the performance of post-training quantization.
arXiv Detail & Related papers (2024-07-20T02:18:51Z)
On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning [71.44986275228747]
In-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs). However, both paradigms are prone to suffer from the critical problem of overconfidence (i.e., miscalibration).
arXiv Detail & Related papers (2023-12-21T11:55:10Z)
CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration [59.48235003469116]
We show that data augmentation consistently enhances OOD performance. We also show that CF augmented models which are easier to calibrate also exhibit much lower entropy when assigning importance.
arXiv Detail & Related papers (2023-09-14T16:16:40Z)
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs. We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting. We also examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
Post-training Model Quantization Using GANs for Synthetic Data Generation [57.40733249681334]
We investigate the use of synthetic data as a substitute for real calibration data in the quantization method. We compare the performance of models quantized using data generated by StyleGAN2-ADA and our pre-trained DiStyleGAN with quantization using real data and with an alternative data generation method based on fractal images.
arXiv Detail & Related papers (2023-05-10T11:10:09Z)
On the Importance of Calibration in Semi-supervised Learning [13.859032326378188]
State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data. We introduce a family of new SSL models that optimize for calibration and demonstrate their effectiveness across standard vision benchmarks.
arXiv Detail & Related papers (2022-10-10T15:41:44Z)
How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance. We formulate a quality measure for the data set, which we refer to as the $\rho$-gap. We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.