Fairness-Aware Structured Pruning in Transformers
- URL: http://arxiv.org/abs/2312.15398v1
- Date: Sun, 24 Dec 2023 03:57:52 GMT
- Title: Fairness-Aware Structured Pruning in Transformers
- Authors: Abdelrahman Zayed, Goncalo Mordido, Samira Shabanian, Ioana Baldini,
Sarath Chandar
- Abstract summary: We investigate how attention heads impact fairness and performance in pre-trained language models.
We propose a novel method to prune the attention heads that negatively impact fairness while retaining the heads critical for performance.
Our findings demonstrate a reduction in gender bias by 19%, 19.5%, 39.5%, 34.7%, 23%, and 8% for DistilGPT-2, GPT-2, two GPT-Neo models of different sizes, GPT-J, and Llama 2, respectively.
- Score: 14.439885480035324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing size of large language models (LLMs) has introduced challenges
in their training and inference. Removing model components is perceived as a
solution to tackle the large model sizes; however, existing pruning methods
focus solely on performance, without considering an essential aspect for the
responsible use of LLMs: model fairness. It is crucial to address the fairness
of LLMs towards diverse groups, such as women, Black people, LGBTQ+, Jewish
communities, among others, as they are being deployed and available to a wide
audience. In this work, first, we investigate how attention heads impact
fairness and performance in pre-trained transformer-based language models. We
then propose a novel method to prune the attention heads that negatively impact
fairness while retaining the heads critical for performance, i.e., language
modeling capabilities. Our approach is practical in terms of time and
resources, as it does not require fine-tuning the final pruned (and fairer)
model. Our findings demonstrate a reduction in gender bias by 19%, 19.5%,
39.5%, 34.7%, 23%, and 8% for DistilGPT-2, GPT-2, GPT-Neo of two different
sizes, GPT-J, and Llama 2 models, respectively, in comparison to the biased
model, with only a slight decrease in performance.
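To make the selection criterion the abstract describes more concrete, here is a minimal sketch: rank attention heads by their estimated contribution to a bias metric, shield the heads that matter most for language modeling, and prune from what remains. The function names, score definitions, and ratios below are illustrative assumptions, not the authors' released implementation; per-head scores are assumed to have been estimated beforehand (e.g., by masking one head at a time and measuring the change in a bias metric and in perplexity).

```python
# Minimal sketch (assumptions only): choose bias-inducing heads to prune while
# protecting the heads most critical for language modeling performance.

def select_heads_to_prune(bias_impact, perf_impact, prune_ratio=0.1, protect_ratio=0.3):
    """
    bias_impact[h]: how much head h contributes to the bias metric (higher = worse).
    perf_impact[h]: increase in perplexity when head h is removed
                    (higher = more important for language modeling).
    Returns the heads to prune: the most bias-inducing heads, excluding the
    fraction of heads most critical for performance.
    """
    heads = list(bias_impact)
    n_protect = int(protect_ratio * len(heads))
    n_prune = int(prune_ratio * len(heads))

    # Shield the heads whose removal hurts perplexity the most.
    protected = set(sorted(heads, key=lambda h: perf_impact[h], reverse=True)[:n_protect])

    # Among the remaining heads, prune those that contribute most to bias.
    candidates = [h for h in sorted(heads, key=lambda h: bias_impact[h], reverse=True)
                  if h not in protected]
    return candidates[:n_prune]


if __name__ == "__main__":
    import random
    random.seed(0)
    # Toy example: 2 layers x 4 heads, identified by (layer, head) pairs.
    heads = [(layer, head) for layer in range(2) for head in range(4)]
    bias_impact = {h: random.random() for h in heads}
    perf_impact = {h: random.random() for h in heads}
    print(select_heads_to_prune(bias_impact, perf_impact, prune_ratio=0.25, protect_ratio=0.25))
```

The returned (layer, head) pairs could then be removed with a facility such as Hugging Face's `prune_heads` (where the architecture supports it) without any further fine-tuning, in line with the abstract's claim of practicality.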
Related papers
- Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models [4.274270062767065]
In this work, we investigate the bias transfer hypothesis (BTH) under prompt adaptations.
We find that bias remains strongly correlated between pre-trained and prompt-adapted models, even when LLMs are specifically prompted to exhibit fair or biased behavior.
Our findings highlight the importance of ensuring fairness in pre-trained LLMs.
arXiv Detail & Related papers (2024-12-04T18:32:42Z)
- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model [4.6373877301731]
We train a suite of multimodal foundation models (MMFMs) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs).
We test the effect of ablating three design features: pretraining the connector, utilizing a more powerful image backbone, and increasing the size of the language backbone.
The resulting models, which we call LLaVA-Gemma, exhibit moderate performance on an array of evaluations, but fail to improve past the current comparably sized SOTA models.
arXiv Detail & Related papers (2024-03-29T21:32:50Z)
- Teaching Language Models to Self-Improve through Interactive Demonstrations [83.9421355808174]
The self-improvement ability of large language models has been shown to be absent in, and difficult to learn for, smaller models.
We introduce TriPosT, a training algorithm that endows smaller models with such self-improvement ability.
We show that our approach can improve a LLaMA-7B model's performance on math and reasoning tasks by up to 7.13%.
arXiv Detail & Related papers (2023-10-20T14:11:04Z)
- Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning [52.29522018586365]
We study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models.
Our approach employs two key techniques: (1) targeted structured pruning, which prunes a larger model to a specified target shape by removing layers, heads, and intermediate and hidden dimensions in an end-to-end manner, and (2) dynamic batch loading, which dynamically updates the composition of sampled data in each training batch based on varying losses across different domains.
arXiv Detail & Related papers (2023-10-10T15:13:30Z)
- Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions [50.67412723291881]
Societal biases present in pre-trained large language models are a critical issue.
We propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models.
arXiv Detail & Related papers (2023-06-07T16:50:03Z)
- Honey, I Shrunk the Language: Language Model Behavior at Reduced Scale [5.759319006531332]
We show the benefits of pre-training with a masked language modeling (MLM) objective in models as small as 1.25M parameters.
We examine downscaling effects, extending scaling laws to models as small as 1M parameters.
arXiv Detail & Related papers (2023-05-26T21:22:10Z)
- Should We Attend More or Less? Modulating Attention for Fairness [11.91250446389124]
We study the role of attention, a widely-used technique in current state-of-the-art NLP models, in the propagation of social biases.
We propose a novel method for modulating attention weights to improve model fairness after training.
Our results show an increase in fairness and minimal performance loss on different text classification and generation tasks.
arXiv Detail & Related papers (2023-05-22T14:54:21Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and consequently between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance [114.1541203743303]
We propose PLATON, which captures the uncertainty of importance scores via an upper confidence bound (UCB) on the importance estimates.
We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification.
arXiv Detail & Related papers (2022-06-25T05:38:39Z)
- Perturbation Augmentation for Fairer NLP [33.442601687940204]
Language models pre-trained on demographically perturbed corpora are fairer, at least according to our best metrics for measuring model fairness.
Although our findings appear promising, there are still some limitations, as well as outstanding questions about how best to evaluate the (un)fairness of large language models.
arXiv Detail & Related papers (2022-05-25T09:00:29Z)
- FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over a reweighted training set, where the sample weights are computed via influence functions using a validation set with sensitive attributes.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z)
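As an illustration of the two-stage pattern summarized in the FAIRIF entry above, the sketch below stubs out the first stage (the paper derives sample weights with influence functions on a validation set carrying sensitive attributes) and shows only the generic second stage: minimizing a loss over the reweighted training set. The function names and the (x, y, idx) batch format are assumptions for this example, not the paper's code.

```python
# Sketch of the two-stage reweighting pattern (not the FAIRIF implementation).
import torch
import torch.nn.functional as F

def compute_sample_weights(model, train_set, val_set_with_groups):
    """Stage 1 (stubbed): FAIRIF computes these weights with influence functions on a
    validation set with sensitive attributes; here we simply return uniform weights."""
    return torch.ones(len(train_set))

def train_reweighted(model, train_loader, sample_weights, optimizer, epochs=1):
    """Stage 2: minimize the loss over the reweighted training set."""
    model.train()
    for _ in range(epochs):
        for x, y, idx in train_loader:  # idx: dataset index of each sample (assumed)
            per_sample_loss = F.cross_entropy(model(x), y, reduction="none")
            loss = (sample_weights[idx] * per_sample_loss).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```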
This list is automatically generated from the titles and abstracts of the papers on this site.
The quality of this automatically generated list is not guaranteed, and this site is not responsible for any consequences of its use.