From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings
- URL: http://arxiv.org/abs/2402.11512v3
- Date: Tue, 16 Apr 2024 16:40:31 GMT
- Title: From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings
- Authors: Aishik Rakshit, Smriti Singh, Shuvam Keshari, Arijit Ghosh Chowdhury, Vinija Jain, Aman Chadha
- Abstract summary: DeepSoftDebias is an algorithm that uses a neural network to perform 'soft debiasing'.
We exhaustively evaluate this algorithm across a variety of SOTA datasets, accuracy metrics, and challenging NLP tasks.
We find that DeepSoftDebias outperforms the current state-of-the-art methods at reducing bias across gender, race, and religion.
- Score: 2.9324535682810886
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Embeddings play a pivotal role in the efficacy of Large Language Models. They are the bedrock on which these models grasp contextual relationships and foster a more nuanced understanding of language and consequently perform remarkably on a plethora of complex tasks that require a fundamental understanding of human language. Given that these embeddings themselves often reflect or exhibit bias, it stands to reason that these models may also inadvertently learn this bias. In this work, we build on the seminal previous work and propose DeepSoftDebias, an algorithm that uses a neural network to perform 'soft debiasing'. We exhaustively evaluate this algorithm across a variety of SOTA datasets, accuracy metrics, and challenging NLP tasks. We find that DeepSoftDebias outperforms the current state-of-the-art methods at reducing bias across gender, race, and religion.
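The abstract does not spell out DeepSoftDebias's architecture, so as a rough illustration of the idea behind 'soft' debiasing (attenuating, rather than fully removing, an embedding's component along a bias direction), here is a minimal NumPy sketch. The toy embedding matrix `W`, the bias direction `g`, and the shrinkage factor `lam` are illustrative assumptions, not details from the paper; the paper replaces this kind of fixed linear shrinkage with a learned neural network.

```python
import numpy as np

# Toy setup: W holds word embeddings (one row per word), and g is a unit
# "bias direction" (in gender debiasing this is often built from pairs
# like he - she). Both are random stand-ins here.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))   # 6 toy words, 4-dimensional embeddings
g = rng.normal(size=4)
g /= np.linalg.norm(g)        # normalize to a unit direction

def soft_debias(W, g, lam=0.9):
    """Shrink (rather than hard-remove) each embedding's component
    along the bias direction g by a factor lam in [0, 1].
    lam = 1 recovers 'hard' debiasing; lam = 0 leaves W unchanged."""
    proj = W @ g                       # scalar projection of each row onto g
    return W - lam * np.outer(proj, g) # subtract lam * (projection onto g)

W_debiased = soft_debias(W, g)
# Each row keeps (1 - lam) of its original component along g,
# preserving some of the geometry that hard removal would destroy.
```

The design point the sketch tries to convey is the trade-off named in the abstract: fully projecting out the bias direction can damage semantic structure, while a soft transformation reduces measured bias with a smaller hit to downstream task performance.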
Related papers
- On Debiasing Text Embeddings Through Context Injection [0.0]
We conduct a review of 19 embedding models by quantifying their biases and how well they respond to context injection.
We show that higher performing models are more prone to capturing biases, but are also better at incorporating context.
In a retrieval task, we show that biases in embeddings can lead to undesirable outcomes.
arXiv Detail & Related papers (2024-10-14T18:11:53Z)
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z)
- Fine-tuning Language Models for Factuality [96.5203774943198]
The capabilities of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z)
- Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias [2.6304695993930594]
We present a survey to comprehend bias in large pre-trained language models, analyze the stages at which they occur, and various ways in which these biases could be quantified and mitigated.
Considering the wide applicability of textual affective computing in real-world downstream systems such as business, healthcare, and education, we place special emphasis on investigating bias in the context of affect (emotion), i.e., Affective Bias.
We present a summary of various bias evaluation corpora that can aid future research, and discuss open challenges in research on bias in pre-trained language models.
arXiv Detail & Related papers (2022-04-21T18:51:19Z)
- A Survey on Bias and Fairness in Natural Language Processing [1.713291434132985]
We analyze the origins of biases, the definitions of fairness, and how bias in different subfields of NLP can be mitigated.
We discuss how future studies can work towards eradicating pernicious biases from NLP algorithms.
arXiv Detail & Related papers (2022-03-06T18:12:30Z)
- Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP [10.936043362876651]
We propose a decoding algorithm that reduces the probability of a model producing problematic text.
While our approach by no means eliminates the issue of language models generating biased text, we believe it is an important step in this direction.
arXiv Detail & Related papers (2021-02-28T11:07:37Z)
- Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z)
- Fairness Through Robustness: Investigating Robustness Disparity in Deep Learning [61.93730166203915]
We argue that traditional notions of fairness are not sufficient when the model is vulnerable to adversarial attacks.
We show that measuring robustness bias is a challenging task for DNNs and propose two methods to measure this form of bias.
arXiv Detail & Related papers (2020-06-17T22:22:24Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.