Conceptor-Aided Debiasing of Large Language Models
- URL: http://arxiv.org/abs/2211.11087v3
- Date: Mon, 30 Oct 2023 22:00:23 GMT
- Title: Conceptor-Aided Debiasing of Large Language Models
- Authors: Li S. Yifei, Lyle Ungar, João Sedoc
- Abstract summary: Pre-trained large language models (LLMs) reflect the inherent social biases of their training corpus.
We use conceptors--a soft projection method--to identify and remove the bias subspace in LLMs such as BERT and GPT.
We propose two methods of applying conceptors: (1) bias subspace projection via post-processing with the conceptor NOT operation; and (2) a new architecture, conceptor-intervened BERT (CI-BERT).
- Score: 1.0435741631709405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained large language models (LLMs) reflect the inherent social biases
of their training corpus. Many methods have been proposed to mitigate this
issue, but they often fail to debias or they sacrifice model accuracy. We use
conceptors--a soft projection method--to identify and remove the bias subspace
in LLMs such as BERT and GPT. We propose two methods of applying conceptors: (1)
bias subspace projection via post-processing with the conceptor NOT operation; and
(2) a new architecture, conceptor-intervened BERT (CI-BERT), which explicitly
incorporates the conceptor projection into all layers during training. We find
that conceptor post-processing achieves state-of-the-art (SoTA) debiasing
results while maintaining LLMs' performance on the GLUE benchmark. Further, it
is robust in various scenarios and can mitigate intersectional bias efficiently
by its AND operation on the existing bias subspaces. Although CI-BERT's
training takes all layers' bias into account and can beat its post-processing
counterpart in bias mitigation, CI-BERT reduces the language model accuracy. We
also show the importance of carefully constructing the bias subspace. The best
results are obtained by removing outliers from the list of biased words,
combining them (via the OR operation), and computing their embeddings using the
sentences from a cleaner corpus.
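A minimal sketch of the post-processing route described in the abstract, assuming the standard conceptor definition C = R(R + alpha^-2 I)^-1 for the correlation matrix R of bias-attribute embeddings. The variable names, the aperture value, and the NumPy setting are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of conceptor NOT post-processing (assumed setup, not the
# paper's released code). X holds n bias-attribute embeddings of dimension d,
# e.g. contextual embeddings of gendered words; `alpha` is the conceptor
# aperture, a hyperparameter chosen here purely for illustration.
import numpy as np

def conceptor(X: np.ndarray, alpha: float = 10.0) -> np.ndarray:
    """C = R (R + alpha^-2 I)^-1 with R the correlation matrix of X."""
    n, d = X.shape
    R = X.T @ X / n
    return R @ np.linalg.inv(R + alpha ** -2 * np.eye(d))

def conceptor_or(Xs, alpha: float = 10.0) -> np.ndarray:
    """OR-combine bias subspaces by summing their correlation matrices
    before forming the conceptor (the conceptor OR identity for a shared
    aperture)."""
    d = Xs[0].shape[1]
    R = sum(X.T @ X / X.shape[0] for X in Xs)
    return R @ np.linalg.inv(R + alpha ** -2 * np.eye(d))

def debias(E: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Apply the NOT operation: the soft projection (I - C) damps the
    directions captured by the bias conceptor. C is symmetric, so applying
    it to row-vector embeddings needs no transpose."""
    return E @ (np.eye(C.shape[0]) - C)

# Example: build a gender-bias conceptor and post-process sentence embeddings.
# X_gender and sentence_embs are placeholders for real embedding matrices.
X_gender = np.random.randn(200, 768)
sentence_embs = np.random.randn(32, 768)
C = conceptor(X_gender)
debiased = debias(sentence_embs, C)
```

Intersectional subspaces can be handled analogously with the conceptor AND operation, as the abstract notes; in practice the aperture and the choice of bias-word list materially affect the result.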
Related papers
- CosFairNet: A Parameter-Space based Approach for Bias Free Learning [1.9116784879310025]
Deep neural networks trained on biased data often inadvertently learn unintended inference rules.
We introduce a novel approach to address bias directly in the model's parameter space, preventing its propagation across layers.
We show enhanced classification accuracy and debiasing effectiveness across various synthetic and real-world datasets.
arXiv Detail & Related papers (2024-10-19T13:06:40Z)
- Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory [29.201402717025335]
Large language models (LLMs) are trained on extensive text corpora, which inevitably include biased information.
We have formally defined the implicit bias problem and developed an innovative framework for bias removal based on Bayesian theory.
arXiv Detail & Related papers (2024-08-20T07:40:12Z)
- Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization [0.0]
Large Language Models (LLMs) have become pivotal in advancing natural language processing, yet their potential to perpetuate biases poses significant concerns.
This paper introduces a new framework employing Direct Preference Optimization (DPO) to mitigate gender, racial, and religious biases in English text.
By developing a loss function that favors less biased over biased completions, our approach cultivates a preference for respectful and non-discriminatory language (a generic DPO-style loss sketch appears after this list).
arXiv Detail & Related papers (2024-07-18T22:32:20Z)
- Projective Methods for Mitigating Gender Bias in Pre-trained Language Models [10.418595661963062]
Projective methods are fast to implement, use a small number of saved parameters, and make no updates to the existing model parameters.
We find that projective methods can be effective at both intrinsic bias and downstream bias mitigation, but that the two outcomes are not necessarily correlated (a minimal hard-projection sketch also appears after this list).
arXiv Detail & Related papers (2024-03-27T17:49:31Z)
- Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes [73.12947922129261]
We leverage the zero-shot capabilities of large language models to reduce stereotyping.
We show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups.
We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
arXiv Detail & Related papers (2024-02-03T01:40:11Z)
- Self-Supervised Position Debiasing for Large Language Models [39.261233221850155]
We propose a self-supervised position debiasing (SOD) framework to mitigate position bias for large language models (LLMs).
Experiments on eight datasets and five tasks show that SOD consistently outperforms existing methods in mitigating three types of position biases.
arXiv Detail & Related papers (2024-01-02T14:12:41Z)
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
- A Simple yet Effective Self-Debiasing Framework for Transformer Models [49.09053367249642]
Current Transformer-based natural language understanding (NLU) models heavily rely on dataset biases.
We propose a simple yet effective self-debiasing framework for Transformer-based NLU models.
arXiv Detail & Related papers (2023-06-02T20:31:58Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance of semantic similarity.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
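For the BiasDPO entry above, the following is a hedged sketch of a generic DPO-style preference loss in which the less biased completion plays the role of the preferred response. It assumes PyTorch and precomputed per-sequence log-probabilities; the function name, the beta value, and the data layout are illustrative and do not reproduce that paper's training setup.

```python
# Generic DPO objective applied to debiasing: prefer the less biased
# completion (y_w) over the biased one (y_l), relative to a frozen reference
# model. All inputs are 1-D tensors of summed token log-probabilities.
import torch
import torch.nn.functional as F

def dpo_debias_loss(policy_logp_w: torch.Tensor, policy_logp_l: torch.Tensor,
                    ref_logp_w: torch.Tensor, ref_logp_l: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    policy_margin = policy_logp_w - policy_logp_l   # how much the policy prefers y_w
    ref_margin = ref_logp_w - ref_logp_l            # reference model's preference
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```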
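For the projective-methods entry, here is a minimal sketch of the simplest member of that family: estimating a bias direction from paired attribute words and hard-projecting it out of embeddings. NumPy is assumed, and the surveyed paper evaluates several more refined variants than this one.

```python
# Hard projective debiasing: remove the component of every embedding along
# an estimated bias direction (e.g. the averaged "he" - "she" difference).
import numpy as np

def bias_direction(pairs):
    """pairs: iterable of (vec_a, vec_b) embedding pairs for attribute words."""
    diffs = np.stack([a - b for a, b in pairs])
    v = diffs.mean(axis=0)
    return v / np.linalg.norm(v)

def project_out(E: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Subtract each row's projection onto the unit bias direction v."""
    return E - np.outer(E @ v, v)
```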
This list is automatically generated from the titles and abstracts of the papers on this site.