A Robust Bias Mitigation Procedure Based on the Stereotype Content Model
- URL: http://arxiv.org/abs/2210.14552v1
- Date: Wed, 26 Oct 2022 08:13:58 GMT
- Title: A Robust Bias Mitigation Procedure Based on the Stereotype Content Model
- Authors: Eddie L. Ungless and Amy Rafferty and Hrichika Nag and Björn Ross
- Abstract summary: We adapt existing work to demonstrate that the Stereotype Content model holds for contextualised word embeddings.
We find the SCM terms are better able to capture bias than demographic-agnostic terms related to pleasantness.
We present this work as a prototype of a debiasing procedure that aims to remove the need for a priori knowledge of the specifics of bias in the model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Stereotype Content model (SCM) states that we tend to perceive minority
groups as cold, incompetent or both. In this paper we adapt existing work to
demonstrate that the Stereotype Content model holds for contextualised word
embeddings, then use these results to evaluate a fine-tuning process designed
to drive a language model away from stereotyped portrayals of minority groups.
We find the SCM terms are better able to capture bias than demographic-agnostic
terms related to pleasantness. Further, we were able to reduce the presence of
stereotypes in the model through a simple fine-tuning procedure that required
minimal human and computer resources, without harming downstream performance.
We present this work as a prototype of a debiasing procedure that aims to
remove the need for a priori knowledge of the specifics of bias in the model.
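
The abstract gives no implementation details, but the core measurement it describes, checking whether SCM warmth/competence associations are detectable in contextualised embeddings, can be sketched as a WEAT-style association score. The following is a minimal illustrative sketch, not the authors' code: the model name, probe template, attribute word lists and group terms are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code): score contextualised embeddings of
# group terms against illustrative SCM warmth/coldness word lists.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"            # assumed encoder; any encoder works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

WARM = ["friendly", "warm", "trustworthy", "sincere"]       # illustrative warmth terms
COLD = ["cold", "hostile", "untrustworthy", "insincere"]    # illustrative coldness terms
TEMPLATE = "The {} person walked into the room."            # assumed probe template


def embed(text: str) -> torch.Tensor:
    """Mean-pooled last-hidden-state embedding of a sentence."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)


def warmth_association(group_term: str) -> float:
    """Mean cosine similarity to warm minus cold attribute sentences."""
    g = embed(TEMPLATE.format(group_term)).unsqueeze(0)
    warm = torch.stack([embed(TEMPLATE.format(w)) for w in WARM])
    cold = torch.stack([embed(TEMPLATE.format(c)) for c in COLD])
    cos = torch.nn.functional.cosine_similarity
    return (cos(warm, g).mean() - cos(cold, g).mean()).item()


if __name__ == "__main__":
    for term in ["young", "elderly", "immigrant", "wealthy"]:  # illustrative group terms
        print(f"{term:>10}: warmth association = {warmth_association(term):+.4f}")
```

A score like this, computed before and after fine-tuning, is one simple way to check whether a debiasing step has moved group terms along the warmth (or, with other word lists, competence) axis without having to enumerate specific stereotypes in advance.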
Related papers
- REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning [18.064064773660174]
We introduce REFINE-LM, a debiasing method that uses reinforcement learning to handle different types of biases without any fine-tuning.
By training a simple model on top of the word probability distribution of a LM, our bias reinforcement learning method enables model debiasing without human annotations.
Experiments conducted on a wide range of models, including several LMs, show that our method significantly reduces stereotypical biases while preserving LM performance.
arXiv Detail & Related papers (2024-08-18T14:08:31Z) - Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z) - Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information [50.29934517930506]
DAFair is a novel approach to address social bias in language models.
We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv Detail & Related papers (2024-03-14T15:58:36Z) - Self-Debiasing Large Language Models: Zero-Shot Recognition and
Reduction of Stereotypes [73.12947922129261]
We leverage the zero-shot capabilities of large language models to reduce stereotyping.
We show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups.
We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
arXiv Detail & Related papers (2024-02-03T01:40:11Z) - RS-Corrector: Correcting the Racial Stereotypes in Latent Diffusion
Models [20.53932777919384]
We propose a framework called "RS-Corrector" to establish an anti-stereotypical preference in the latent space and update the latent code for refined generated results.
Extensive empirical evaluations demonstrate that the introduced framework effectively corrects the racial stereotypes of the well-trained Stable Diffusion model.
arXiv Detail & Related papers (2023-12-08T02:59:29Z) - Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z) - Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z) - MultiModal Bias: Introducing a Framework for Stereotypical Bias
Assessment beyond Gender and Race in Vision Language Models [40.12132844347926]
We provide a visual and textual bias benchmark called MMBias, consisting of around 3,800 images and phrases covering 14 population subgroups.
We utilize this dataset to assess bias in several prominent self-supervised multimodal models, including CLIP, ALBEF, and ViLT.
We introduce a debiasing method designed specifically for such large pre-trained models that can be applied as a post-processing step to mitigate bias.
arXiv Detail & Related papers (2023-03-16T17:36:37Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.