Can Model Compression Improve NLP Fairness
- URL: http://arxiv.org/abs/2201.08542v1
- Date: Fri, 21 Jan 2022 05:14:51 GMT
- Title: Can Model Compression Improve NLP Fairness
- Authors: Guangxuan Xu, Qingyuan Hu
- Abstract summary: This is the first paper to examine the effect of distillation and pruning on the toxicity and bias of generative language models.
We test Knowledge Distillation and Pruning methods on the GPT2 model and find a consistent pattern of toxicity and bias reduction after model distillation.
- Score: 3.172761915061083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model compression techniques are receiving increasing attention;
however, the effect of compression on model fairness is still underexplored.
This is the first paper to examine the effect of distillation and pruning on
the toxicity and bias of generative language models. We test Knowledge
Distillation and Pruning methods on the GPT2 model and find a consistent
pattern of toxicity and bias reduction after model distillation. This result
can potentially be explained by an existing line of research that describes
model compression as a regularization technique. Our work not only serves as a
reference for the safe deployment of compressed models, but also extends the
discussion of "compression as regularization" into the setting of neural LMs,
and hints at the possibility of using compression to develop fairer models.
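To make the distillation setup concrete, below is a minimal sketch of knowledge distillation from GPT2 into a smaller student using the Hugging Face transformers and torch libraries. It illustrates only the generic KD loss (student mimicking the teacher's next-token distribution); it is not the authors' training code, data, or hyperparameters.

```python
# Minimal knowledge-distillation sketch: a small student LM mimics GPT-2's
# next-token distribution via a KL-divergence loss. Illustrative only.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained("gpt2")        # teacher
student = AutoModelForCausalLM.from_pretrained("distilgpt2")  # smaller student
tokenizer = AutoTokenizer.from_pretrained("gpt2")
teacher.eval()

T = 2.0  # softmax temperature commonly used in distillation
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

batch = tokenizer(["Model compression can affect model behaviour."],
                  return_tensors="pt")

with torch.no_grad():
    t_logits = teacher(**batch).logits        # [batch, seq, vocab]
s_logits = student(**batch).logits

# KL(teacher || student) over the vocabulary at every position,
# scaled by T^2 as in standard distillation.
kd_loss = F.kl_div(
    F.log_softmax(s_logits / T, dim=-1),
    F.softmax(t_logits / T, dim=-1),
    reduction="batchmean",
) * (T ** 2)

optimizer.zero_grad()
kd_loss.backward()
optimizer.step()
```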
Related papers
- Accuracy is Not All You Need [9.371810162601623]
We conduct a detailed study of metrics across multiple compression techniques, models and datasets.
We show that the behavior of compressed models as visible to end-users is significantly different from the baseline model, even when accuracy is similar.
We propose two such metrics, KL-Divergence and flips, and show that they are well correlated.
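As an illustration of the two metrics named above, the sketch below computes a per-example KL divergence between a baseline model's and a compressed model's output distributions, plus the rate of prediction "flips" (examples whose argmax label changes). The tensors are random placeholders, not the paper's evaluation pipeline.

```python
import torch
import torch.nn.functional as F

baseline_logits = torch.randn(128, 10)      # stand-in for the baseline model's outputs
compressed_logits = baseline_logits + 0.3 * torch.randn(128, 10)  # perturbed stand-in

# KL(baseline || compressed) per example.
kl_per_example = F.kl_div(
    F.log_softmax(compressed_logits, dim=-1),
    F.softmax(baseline_logits, dim=-1),
    reduction="none",
).sum(dim=-1)

# "Flips" = fraction of examples whose predicted label changes after compression.
flips = (compressed_logits.argmax(dim=-1)
         != baseline_logits.argmax(dim=-1)).float().mean()

print(f"mean KL: {kl_per_example.mean():.4f}, flip rate: {flips:.4f}")
```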
arXiv Detail & Related papers (2024-07-12T10:19:02Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
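A minimal sketch of the TopK sparsification referred to above: keep only the largest-magnitude entries of a tensor (a gradient or an activation) and zero the rest. This is the generic operator, not the paper's model-parallel communication pipeline.

```python
import torch

def topk_compress(x: torch.Tensor, ratio: float = 0.1) -> torch.Tensor:
    """Keep the top `ratio` fraction of entries by magnitude; zero the rest."""
    flat = x.flatten()
    k = max(1, int(ratio * flat.numel()))
    _, idx = torch.topk(flat.abs(), k)
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(x)

grad = torch.randn(256, 256)
sparse_grad = topk_compress(grad, ratio=0.05)  # gradients tolerate milder rates
# In distributed training, the nonzero values and their indices are what get
# communicated between workers instead of the dense tensor.
```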
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models [9.91972450276408]
This paper introduces an innovative approach for the parametric and practical compression of Large Language Models (LLMs) based on reduced order modelling.
Our method represents a significant advancement in model compression by leveraging matrix decomposition, demonstrating superior efficacy compared to the prevailing state-of-the-art structured pruning method.
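To make the matrix-decomposition idea concrete, here is a generic truncated-SVD low-rank factorization of a single weight matrix; the paper's reduced order modelling of latent features is more involved, so treat this purely as background.

```python
import torch

def low_rank_approx(weight: torch.Tensor, rank: int):
    """Factor a weight matrix W (out x in) into two thin matrices A @ B."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out, rank)
    B = Vh[:rank, :]             # (rank, in)
    return A, B

W = torch.randn(768, 3072)
A, B = low_rank_approx(W, rank=64)
# Storing (A, B) takes 64*(768+3072) values instead of 768*3072 for W.
print(torch.linalg.norm(W - A @ B) / torch.linalg.norm(W))  # relative error
```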
arXiv Detail & Related papers (2023-12-12T07:56:57Z)
- Knowledge Distillation Performs Partial Variance Reduction [93.6365393721122]
Knowledge distillation is a popular approach for enhancing the performance of "student" models.
The underlying mechanics behind knowledge distillation (KD) are still not fully understood.
We show that KD can be interpreted as a novel type of variance reduction mechanism.
arXiv Detail & Related papers (2023-05-27T21:25:55Z)
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression.
Despite existing evidence that pruning can induce bias, the relationship between neural network pruning and induced bias is not well understood.
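For context, the sketch below applies generic global magnitude pruning to a toy network with torch's pruning utilities; the bias analysis and countermeasures studied in the paper are separate from this basic operation.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Zero out the 70% smallest-magnitude weights across both linear layers.
prune.global_unstructured(
    [(model[0], "weight"), (model[2], "weight")],
    pruning_method=prune.L1Unstructured,
    amount=0.7,
)

zeros = sum(int((m.weight == 0).sum()) for m in (model[0], model[2]))
total = sum(m.weight.numel() for m in (model[0], model[2]))
print(f"global sparsity: {zeros / total:.2f}")
```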
arXiv Detail & Related papers (2023-04-25T07:42:06Z)
- Compression of Generative Pre-trained Language Models via Quantization [62.80110048377957]
We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings.
We propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules.
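For readers unfamiliar with the terminology, below is a sketch of plain symmetric per-tensor int8 weight quantization and dequantization. The token-level contrastive distillation and module-wise dynamic scaling proposed in the paper are refinements on top of primitives like this and are not shown here.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(768, 768)
q, scale = quantize_int8(w)
err = torch.linalg.norm(w - dequantize(q, scale)) / torch.linalg.norm(w)
print(f"relative quantization error: {err.item():.4f}")
```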
arXiv Detail & Related papers (2022-03-21T02:11:35Z)
- A Short Study on Compressing Decoder-Based Language Models [9.090064110056224]
Pre-trained Language Models (PLMs) have been successful for a wide range of natural language processing (NLP) tasks.
State-of-the-art PLMs, however, are too large to be deployed on edge devices.
The topic of model compression has attracted increasing attention in the NLP community.
arXiv Detail & Related papers (2021-10-16T03:37:08Z)
- What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression [68.82486784654817]
We study two popular model compression techniques: knowledge distillation and pruning.
We show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
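The summary only names the idea of uncertainty-based regularization; the sketch below is one hypothetical instantiation (up-weighting the loss on samples where a teacher model is uncertain, using predictive entropy as the uncertainty signal) and is not necessarily the paper's formulation.

```python
import torch
import torch.nn.functional as F

def entropy_weights(teacher_logits: torch.Tensor) -> torch.Tensor:
    """Per-sample predictive entropy of the teacher, normalized to [0, 1]."""
    p = F.softmax(teacher_logits, dim=-1)
    ent = -(p * p.clamp_min(1e-12).log()).sum(dim=-1)
    return ent / torch.log(torch.tensor(float(teacher_logits.size(-1))))

teacher_logits = torch.randn(32, 10)
student_logits = torch.randn(32, 10, requires_grad=True)
labels = torch.randint(0, 10, (32,))

# Hypothetical weighting scheme: uncertain samples get up to 2x weight.
w = 1.0 + entropy_weights(teacher_logits)
loss = (w * F.cross_entropy(student_logits, labels, reduction="none")).mean()
loss.backward()
```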
arXiv Detail & Related papers (2021-10-16T00:20:04Z)
- KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation [5.8287955127529365]
We push the limits of state-of-the-art Transformer-based pre-trained language model compression using Kronecker decomposition.
We present our KroneckerBERT, a compressed version of the BERT_BASE model obtained using this framework.
Our experiments indicate that the proposed model has promising out-of-distribution robustness and is superior to the state-of-the-art compression methods on SQuAD.
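As background, the sketch below shows the parameter-count arithmetic of expressing a weight matrix as a Kronecker product of two small factors; the paper's framework, which learns such factors via knowledge distillation, is not reproduced here.

```python
import torch

# A (p*r) x (q*s) weight matrix expressed as a Kronecker product of two small
# factors: W = kron(A, B), with A of shape (p, q) and B of shape (r, s).
p, q, r, s = 16, 16, 48, 48
A = torch.randn(p, q)
B = torch.randn(r, s)
W = torch.kron(A, B)                     # shape (768, 768)

dense_params = W.numel()                 # 589,824
factored_params = A.numel() + B.numel()  # 2,560
print(W.shape, dense_params / factored_params)  # roughly 230x fewer parameters
```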
arXiv Detail & Related papers (2021-09-13T18:19:30Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.