Can Model Compression Improve NLP Fairness
- URL: http://arxiv.org/abs/2201.08542v1
- Date: Fri, 21 Jan 2022 05:14:51 GMT
- Title: Can Model Compression Improve NLP Fairness
- Authors: Guangxuan Xu, Qingyuan Hu
- Abstract summary: This is the first paper to examine the effect of distillation and pruning on the toxicity and bias of generative language models.
We test Knowledge Distillation and Pruning methods on the GPT2 model and find a consistent pattern of toxicity and bias reduction.
- Score: 3.172761915061083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model compression techniques are receiving increasing attention; however, the
effect of compression on model fairness is still underexplored. This is the
first paper to examine the effect of distillation and pruning on the toxicity
and bias of generative language models. We test Knowledge Distillation and
Pruning methods on the GPT2 model and find a consistent pattern of toxicity
and bias reduction after model distillation. This result can potentially be
interpreted through an existing line of research that describes model compression
as a regularization technique. Our work not only serves as a reference for the
safe deployment of compressed models, but also extends the discussion of
"compression as regularization" into the setting of neural LMs and hints at
the possibility of using compression to develop fairer models.
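The paper's experimental code is not reproduced in this summary; the sketch below is only a rough, assumption-laden illustration of the kind of comparison it describes: compressing GPT-2 and scoring its generations for toxicity. It uses Hugging Face transformers, unstructured magnitude pruning via torch.nn.utils.prune, and an off-the-shelf toxicity classifier as stand-ins. The checkpoint names ("gpt2", "unitary/toxic-bert"), pruning ratio, decoding settings, and prompt are illustrative assumptions, not the authors' exact setup (which also covers knowledge-distilled models and bias metrics).

```python
# Minimal sketch: magnitude-prune GPT-2 and compare the toxicity of its generations
# against the uncompressed baseline. Model names, pruning ratio, prompts, and the
# toxicity classifier are illustrative stand-ins, not the paper's exact setup.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

def magnitude_prune(model, amount=0.3):
    """Zero out the smallest `amount` fraction of weights in every 2-D weight matrix."""
    for module in model.modules():
        weight = getattr(module, "weight", None)
        if isinstance(weight, torch.nn.Parameter) and weight.dim() > 1:
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # make the pruning permanent
    return model

def mean_toxicity(model, tokenizer, prompts, toxicity_clf, max_new_tokens=30):
    """Generate one continuation per prompt and average a crude toxicity score."""
    scores = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             do_sample=True, top_p=0.9,
                             pad_token_id=tokenizer.eos_token_id)
        text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)
        result = toxicity_clf(text)[0]  # label/score semantics depend on the classifier
        # Crude proxy: take the score only when the top label indicates toxicity.
        scores.append(result["score"] if "toxic" in result["label"].lower() else 0.0)
    return sum(scores) / len(scores)

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    baseline = AutoModelForCausalLM.from_pretrained("gpt2")
    pruned = magnitude_prune(AutoModelForCausalLM.from_pretrained("gpt2"), amount=0.3)
    toxicity_clf = pipeline("text-classification", model="unitary/toxic-bert")
    prompts = ["So I'm starting to think she's full of"]  # RealToxicityPrompts-style prompt
    print("baseline toxicity:", mean_toxicity(baseline, tokenizer, prompts, toxicity_clf))
    print("pruned   toxicity:", mean_toxicity(pruned, tokenizer, prompts, toxicity_clf))
```

Swapping the pruned model for a distilled checkpoint such as distilgpt2 gives the distillation side of the comparison under the same assumptions.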
Related papers
- Choose Your Model Size: Any Compression by a Single Gradient Descent [9.074689052563878]
We present Any Compression via Iterative Pruning (ACIP).
ACIP is an algorithmic approach to determine a compression-performance trade-off from a single gradient descent run.
We show that ACIP seamlessly complements common quantization-based compression techniques.
arXiv Detail & Related papers (2025-02-03T18:40:58Z)
- Progressive Compression with Universally Quantized Diffusion Models [35.199627388957566]
We explore the potential of diffusion models for progressive coding, resulting in a sequence of bits that can be incrementally transmitted and decoded.
Unlike prior work based on Gaussian diffusion or conditional diffusion models, we propose a new form of diffusion model with uniform noise in the forward process.
We obtain promising first results on image compression, achieving competitive rate-distortion and rate-realism results on a wide range of bit-rates with a single model.
arXiv Detail & Related papers (2024-12-14T19:06:01Z)
- Accuracy is Not All You Need [9.371810162601623]
We conduct a detailed study of metrics across multiple compression techniques, models and datasets.
We show that the behavior of compressed models as visible to end-users is significantly different from the baseline model, even when accuracy is similar.
We propose two such metrics, KL-Divergence and flips, and show that they are well correlated (a minimal illustrative sketch of both follows the related-papers list below).
arXiv Detail & Related papers (2024-07-12T10:19:02Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models [9.91972450276408]
This paper introduces an innovative approach for the parametric and practical compression of Large Language Models (LLMs) based on reduced order modelling.
Our method represents a significant advancement in model compression by leveraging matrix decomposition, demonstrating superior efficacy compared to the prevailing state-of-the-art structured pruning method.
arXiv Detail & Related papers (2023-12-12T07:56:57Z)
- Knowledge Distillation Performs Partial Variance Reduction [93.6365393721122]
Knowledge distillation is a popular approach for enhancing the performance of "student" models.
The underlying mechanics behind knowledge distillation (KD) are still not fully understood.
We show that KD can be interpreted as a novel type of variance reduction mechanism.
arXiv Detail & Related papers (2023-05-27T21:25:55Z)
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression.
Despite existing evidence that pruning can induce or exacerbate bias, the relationship between neural network pruning and induced bias is not well understood.
arXiv Detail & Related papers (2023-04-25T07:42:06Z)
- Compression of Generative Pre-trained Language Models via Quantization [62.80110048377957]
We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings.
We propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules.
arXiv Detail & Related papers (2022-03-21T02:11:35Z)
- What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression [68.82486784654817]
We study two popular model compression techniques including knowledge distillation and pruning.
We show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
arXiv Detail & Related papers (2021-10-16T00:20:04Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
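To make the divergence-style evaluation highlighted in "Accuracy is Not All You Need" above more concrete, here is a minimal sketch that computes token-level KL divergence between a baseline and a compressed model, plus a simplified "flips" proxy. Assumptions: distilgpt2 stands in for "a compressed model" because it shares GPT-2's vocabulary, and flips are approximated as changes in the greedy next-token prediction rather than the paper's answer-level correct/incorrect changes.

```python
# Minimal sketch of divergence-style metrics: mean token-level KL(P || Q) between a
# baseline and a compressed model, and the fraction of positions where the greedy
# next-token prediction "flips". The argmax-based flip proxy and the distilgpt2
# stand-in are assumptions, not the paper's exact protocol.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def kl_and_flips(baseline, compressed, tokenizer, texts):
    kls, flips, total = [], 0, 0
    for text in texts:
        ids = tokenizer(text, return_tensors="pt").input_ids
        p_logits = baseline(ids).logits      # shape: (1, seq_len, vocab)
        q_logits = compressed(ids).logits
        log_p = F.log_softmax(p_logits, dim=-1)
        log_q = F.log_softmax(q_logits, dim=-1)
        # KL(P || Q) summed over the vocabulary, averaged over token positions
        kls.append(torch.sum(log_p.exp() * (log_p - log_q), dim=-1).mean().item())
        flips += (p_logits.argmax(-1) != q_logits.argmax(-1)).sum().item()
        total += ids.shape[1]
    return sum(kls) / len(kls), flips / total

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    baseline = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    compressed = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()
    kl, flip_rate = kl_and_flips(baseline, compressed, tokenizer,
                                 ["Model compression can change more than accuracy."])
    print(f"mean token-level KL: {kl:.4f}, prediction flip rate: {flip_rate:.2%}")
```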