Do Not Harm Protected Groups in Debiasing Language Representation Models
- URL: http://arxiv.org/abs/2310.18458v2
- Date: Sat, 11 Nov 2023 22:29:58 GMT
- Title: Do Not Harm Protected Groups in Debiasing Language Representation Models
- Authors: Chloe Qinyu Zhu, Rickard Stureborg, Brandon Fain
- Abstract summary: Language Representation Models (LRMs) trained with real-world data may capture and exacerbate undesired bias.
We examine four debiasing techniques on a real-world text classification task and show that reducing bias comes at the cost of degraded performance for all demographic groups.
- Score: 2.9057513016551244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language Representation Models (LRMs) trained with real-world data may
capture and exacerbate undesired bias and cause unfair treatment of people in
various demographic groups. Several techniques have been investigated for
applying interventions to LRMs to remove bias in benchmark evaluations on, for
example, word embeddings. However, the negative side effects of debiasing
interventions are usually not revealed in downstream tasks. We propose
xGAP-DEBIAS, a set of evaluations for assessing the fairness of debiasing. In
this work, we examine four debiasing techniques on a real-world text
classification task and show that reducing bias comes at the cost of degraded
performance for all demographic groups, including those the debiasing
techniques aim to protect. We advocate that a debiasing technique should
achieve good downstream performance under the constraint of doing no harm to
the protected group.
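As a concrete illustration of this "no harm" criterion, the following Python sketch checks whether a debiasing intervention narrows the performance gap between groups without lowering accuracy for a protected group. The group labels, predictions, and numbers are placeholders invented for illustration, not results from the paper.

```python
# Hypothetical illustration of the "no harm" criterion advocated in the
# abstract: a debiasing intervention should narrow the gap between groups
# without lowering performance for any protected group. All names and
# numbers below are placeholders, not results from the paper.

def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each demographic group."""
    scores = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        scores[g] = sum(y_true[i] == y_pred[i] for i in idx) / len(idx)
    return scores

def gap(scores):
    """Max-min accuracy gap across groups, a simple bias proxy."""
    return max(scores.values()) - min(scores.values())

def no_harm(baseline, debiased, protected, tol=0.0):
    """True if no protected group loses more than `tol` accuracy."""
    return all(debiased[g] >= baseline[g] - tol for g in protected)

# Toy data: 10 examples, two demographic groups.
groups = ["a"] * 5 + ["b"] * 5
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
pred_baseline = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0]  # strong on "a", weak on "b"
pred_debiased = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]  # smaller gap, both groups drop

base = per_group_accuracy(y_true, pred_baseline, groups)
deb = per_group_accuracy(y_true, pred_debiased, groups)
print("gap before:", gap(base), "gap after:", gap(deb))
print("no harm to protected group 'b':", no_harm(base, deb, ["b"]))
# Here the gap shrinks, but accuracy falls for both groups, including the
# protected group "b", so this intervention fails the "no harm" constraint.
```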
Related papers
- With a Grain of SALT: Are LLMs Fair Across Social Dimensions? [3.5001789247699535]
This paper presents a systematic analysis of biases in open-source Large Language Models (LLMs) across gender, religion, and race.
We use the SALT dataset, which incorporates five distinct bias triggers: General Debate, Positioned Debate, Career Advice, Problem Solving, and CV Generation.
Our findings reveal consistent polarization across models, with certain demographic groups receiving systematically favorable or unfavorable treatment.
arXiv Detail & Related papers (2024-10-16T12:22:47Z) - Unlabeled Debiasing in Downstream Tasks via Class-wise Low Variance Regularization [13.773597081543185]
We introduce a novel debiasing regularization technique based on the class-wise variance of embeddings.
Our method does not require attribute labels and targets any attribute, thus addressing the shortcomings of existing debiasing methods.
arXiv Detail & Related papers (2024-09-29T03:56:50Z) - Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information [50.29934517930506]
DAFair is a novel approach to address social bias in language models.
We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv Detail & Related papers (2024-03-14T15:58:36Z) - Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns [53.62845317039185]
Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group.
arXiv Detail & Related papers (2023-02-11T12:11:03Z) - How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification [12.165921897192902]
We investigate the effects that some of the major intrinsic gender bias mitigation strategies have on downstream text classification tasks.
We show that each mitigation technique is able to hide the bias from some of the intrinsic bias measures but not all.
We recommend that intrinsic bias mitigation techniques should be combined with other fairness interventions for downstream tasks.
arXiv Detail & Related papers (2023-01-30T13:05:48Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-Trained Language Models [4.937002982255573]
Recent work has shown that pre-trained language models capture social biases from the text corpora they are trained on.
We compare five recently proposed debiasing techniques: Counterfactual Data Augmentation, Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias.
We quantify the effectiveness of each technique using three different bias benchmarks while also measuring the impact of these techniques on a model's language modeling ability.
arXiv Detail & Related papers (2021-10-16T09:40:30Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - Unsupervised Learning of Debiased Representations with Pseudo-Attributes [85.5691102676175]
We propose a simple but effective debiasing technique in an unsupervised manner.
We perform clustering on the feature embedding space and identify pseudo-attributes by taking advantage of the clustering results.
We then employ a novel cluster-based reweighting scheme for learning debiased representation.
arXiv Detail & Related papers (2021-08-06T05:20:46Z)
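The last entry above describes debiasing without attribute labels by clustering feature embeddings into pseudo-attributes and reweighting examples per cluster. The sketch below is an illustrative simplification of that idea, not the authors' implementation; the inverse-frequency weighting rule and the random stand-in embeddings are assumptions made for clarity.

```python
# Minimal sketch: unsupervised pseudo-attributes via clustering, followed by
# a simple cluster-based reweighting (illustrative, not the paper's method).
import numpy as np
from sklearn.cluster import KMeans

def pseudo_attribute_weights(embeddings, n_clusters=8, random_state=0):
    """Cluster feature embeddings and return one weight per example,
    upweighting examples from small (presumably bias-conflicting) clusters."""
    km = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10)
    cluster_ids = km.fit_predict(embeddings)       # pseudo-attribute per example
    counts = np.bincount(cluster_ids, minlength=n_clusters)
    inv_freq = 1.0 / counts[cluster_ids]           # rarer cluster -> larger weight
    weights = inv_freq * len(embeddings) / inv_freq.sum()  # normalize to mean 1
    return cluster_ids, weights

# Example with random stand-in embeddings; in practice these would come from
# the model's intermediate features.
emb = np.random.RandomState(0).randn(1000, 32)
clusters, w = pseudo_attribute_weights(emb)
print(w.mean(), w.min(), w.max())  # weights could scale a per-example loss
```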