What do Compressed Large Language Models Forget? Robustness Challenges
in Model Compression
- URL: http://arxiv.org/abs/2110.08419v1
- Date: Sat, 16 Oct 2021 00:20:04 GMT
- Title: What do Compressed Large Language Models Forget? Robustness Challenges
in Model Compression
- Authors: Mengnan Du, Subhabrata Mukherjee, Yu Cheng, Milad Shokouhi, Xia Hu,
Ahmed Hassan Awadallah
- Abstract summary: We study two popular model compression techniques, knowledge distillation and pruning.
We show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
- Score: 68.82486784654817
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works have focused on compressing pre-trained language models (PLMs)
like BERT where the major focus has been to improve the compressed model
performance for downstream tasks. However, there has been no study analyzing
the impact of compression on the generalizability and robustness of these
models. Towards this end, we study two popular model compression techniques,
knowledge distillation and pruning, and show that compressed models are
significantly less robust than their PLM counterparts on adversarial test
sets, although they obtain similar performance on in-distribution development
sets for a task. Further analysis indicates that the compressed models overfit
on the easy samples and generalize poorly on the hard ones. We further leverage
this observation to develop a regularization strategy for model compression
based on sample uncertainty. Experimental results on several natural language
understanding tasks demonstrate that our mitigation framework improves both the
adversarial generalization and the in-distribution task performance of the
compressed models.
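The abstract does not spell out the exact form of the uncertainty-based regularizer, so the following is only a minimal sketch of one plausible instantiation in PyTorch: each sample's loss is weighted by the teacher's predictive entropy, so that hard, uncertain examples are emphasized over the easy ones that compressed models tend to overfit. The function name, the entropy-based weighting, and the hyperparameters are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch (assumed formulation, not the paper's exact recipe):
# weight each sample's loss by the teacher's predictive entropy so that
# hard, uncertain samples count more during compression.
import torch
import torch.nn.functional as F

def uncertainty_weighted_distillation_loss(student_logits, teacher_logits,
                                           labels, temperature=2.0, alpha=0.5):
    # Teacher distribution and its entropy (a simple per-sample uncertainty proxy).
    with torch.no_grad():
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        entropy = -(p_teacher * torch.log(p_teacher + 1e-12)).sum(dim=-1)
        weights = entropy / (entropy.max() + 1e-12)  # normalize to [0, 1]

    # Per-sample distillation term: KL(teacher || student) at temperature T.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=-1)

    # Per-sample supervised cross-entropy on the task labels.
    ce = F.cross_entropy(student_logits, labels, reduction="none")

    # Upweight uncertain (hard) samples in the combined objective.
    per_sample = alpha * ce + (1.0 - alpha) * (temperature ** 2) * kd
    return (weights * per_sample).mean()
```

Under this weighting, a low-entropy (easy) example contributes less to the gradient, which is consistent with the paper's observation that compressed models overfit the easy samples and generalize poorly on the hard ones.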
Related papers
- Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models [0.0]
Large language models (LLMs) offer powerful capabilities but incur substantial computational costs.
This study evaluates the impact of popular compression methods on the LLaMA-2-7B model.
We show that while SparseGPT and Wanda preserve perplexity even at 50% sparsity, they suffer significant degradation on downstream tasks.
arXiv Detail & Related papers (2024-09-17T14:34:11Z) - Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments [20.360936113552597]
To deploy machine learning models on-device, practitioners use compression algorithms to shrink and speed up models while maintaining their high-quality output.
Existing compression tools poorly support comparison, leading to tedious and, sometimes, incomplete analyses spread across disjoint tools.
To support real-world comparative workflows, we develop an interactive visual system called Compress and Compare.
Within a single interface, Compress and Compare surfaces promising compression strategies by visualizing provenance relationships between compressed models and reveals compression-induced behavior changes by comparing models' predictions, weights, and activations.
arXiv Detail & Related papers (2024-08-06T16:17:51Z) - Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference; a minimal sketch of TopK sparsification appears after this list.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces.
We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme
Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and
Quantization [75.72231742114951]
Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks.
These models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency.
We propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision teacher model to the quantized and distilled low-precision student model.
arXiv Detail & Related papers (2022-03-21T18:04:25Z) - Compression of Generative Pre-trained Language Models via Quantization [62.80110048377957]
We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings.
We propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules.
arXiv Detail & Related papers (2022-03-21T02:11:35Z) - Can Model Compression Improve NLP Fairness [3.172761915061083]
This is the first paper to examine the effect of distillation and pruning on the toxicity and bias of generative language models.
We test Knowledge Distillation and Pruning methods on the GPT2 model and find a consistent pattern of toxicity and bias reduction.
arXiv Detail & Related papers (2022-01-21T05:14:51Z) - Model Compression for Dynamic Forecast Combination [9.281199058905017]
We show that compressing dynamic forecasting ensembles into an individual model leads to comparable predictive performance.
We also show that the compressed individual model with best average rank is a rule-based regression model.
arXiv Detail & Related papers (2021-04-05T09:55:35Z) - Self-Supervised GAN Compression [32.21713098893454]
We show that a standard model compression technique, weight pruning, cannot be applied to GANs using existing methods.
We then develop a self-supervised compression technique which uses the trained discriminator to supervise the training of a compressed generator.
We show that this framework maintains compelling performance at high degrees of sparsity, can be easily applied to new tasks and models, and enables meaningful comparisons between different pruning granularities; a minimal sketch of discriminator-supervised compression appears after this list.
arXiv Detail & Related papers (2020-07-03T04:18:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.