Soft-prompt Tuning for Large Language Models to Evaluate Bias
- URL: http://arxiv.org/abs/2306.04735v2
- Date: Tue, 5 Mar 2024 17:29:06 GMT
- Title: Soft-prompt Tuning for Large Language Models to Evaluate Bias
- Authors: Jacob-Junqi Tian, David Emerson, Sevil Zanjani Miyandoab, Deval
Pandya, Laleh Seyyed-Kalantari, Faiza Khan Khattak
- Abstract summary: Using soft-prompts to evaluate bias gives us the added advantage of avoiding human-bias injection.
We check model biases across different sensitive attributes using a group fairness (bias) metric and find interesting bias patterns.
- Score: 0.03141085922386211
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompting large language models has gained immense popularity in recent years
because it can produce good results even without labelled data. However, it
requires prompt tuning to find optimal prompts that lead to better model
performance. In this paper, we explore the use of soft-prompt tuning on a
sentiment classification task to quantify the biases of large language models
(LLMs) such as Open Pre-trained Transformers (OPT) and the Galactica language
model. Since these models are trained on real-world data that may be biased
toward certain population groups, it is important to identify these underlying
issues. Using soft-prompts to evaluate bias gives us the added advantage of
avoiding the human bias that manually designed prompts can inject. We check
model biases across different sensitive attributes using a group fairness
(bias) metric and find interesting bias patterns. Since LLMs are used in
industry across a variety of applications, it is crucial to identify their
biases before deploying these models in practice. We open-source our pipeline
and encourage industry researchers to adapt our work to their use cases.
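The abstract leaves implementation details to the paper, but the core mechanics of soft-prompt tuning and a group-fairness check are easy to illustrate. Below is a minimal sketch assuming a PyTorch/Hugging Face setup; the OPT checkpoint, prompt length, and demographic-parity-style gap are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of soft-prompt tuning with a frozen causal LM backbone.
# Checkpoint, prompt length, and the fairness measure are illustrative
# assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/opt-350m"   # assumption: any OPT checkpoint works
N_PROMPT_TOKENS = 20               # assumption: illustrative prompt length

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
backbone = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
for p in backbone.parameters():    # freeze the LLM; only the prompt is tuned
    p.requires_grad = False

emb_layer = backbone.get_input_embeddings()
soft_prompt = nn.Parameter(        # initialize from real token embeddings
    emb_layer.weight[:N_PROMPT_TOKENS].detach().clone()
)

def forward_with_soft_prompt(input_ids):
    tok_emb = emb_layer(input_ids)                       # (B, T, D)
    prompt = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt, tok_emb], dim=1)  # prepend soft prompt
    # Sentiment is read off the logits of verbalizer tokens
    # (e.g. "positive"/"negative") at the final position.
    return backbone(inputs_embeds=inputs_embeds).logits

def fairness_gap(preds, groups, group_a, group_b):
    """Demographic-parity-style gap: difference in positive-prediction
    rates between two sensitive groups."""
    rate = lambda g: preds[groups == g].float().mean()
    return (rate(group_a) - rate(group_b)).abs().item()
```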
Related papers
- GUS-Net: Social Bias Classification in Text with Generalizations, Unfairness, and Stereotypes [2.2162879952427343]
This paper introduces GUS-Net, an innovative approach to bias detection.
GUS-Net focuses on three key types of biases: (G)eneralizations, (U)nfairness, and (S)tereotypes.
Our methodology enhances traditional bias detection methods by incorporating the contextual encodings of pre-trained models.
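The summary does not specify the architecture; one plausible reading is a multi-label classifier over the contextual encodings of a pre-trained encoder, with one output per bias type. A minimal sketch under that assumption (the encoder choice and pooling are placeholders):

```python
# Illustrative multi-label bias classifier over contextual encodings,
# with one output per bias type: (G)eneralizations, (U)nfairness,
# (S)tereotypes. Encoder and pooling are placeholder assumptions.
import torch.nn as nn
from transformers import AutoModel

class BiasClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", n_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, n_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask=attention_mask)
        pooled = hidden.last_hidden_state[:, 0]   # [CLS] contextual encoding
        return self.head(pooled)                  # one logit per bias type

# Multi-label setup: BCEWithLogitsLoss lets each bias type be predicted
# independently, since a passage can be simultaneously a generalization,
# unfair, and a stereotype.
loss_fn = nn.BCEWithLogitsLoss()
```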
arXiv Detail & Related papers (2024-10-10T21:51:22Z)
- From Lists to Emojis: How Format Bias Affects Model Alignment [67.08430328350327]
We study format biases in reinforcement learning from human feedback.
Many widely-used preference models, including human evaluators, exhibit strong biases towards specific format patterns.
We show that with a small amount of biased data, we can inject significant bias into the reward model.
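The summary reports a finding rather than a method, but the kind of probe it implies is simple to sketch: score content-equivalent responses that differ only in format and compare rewards. The reward model below is a placeholder assumption, not the one used in the paper.

```python
# Probe for format bias: score content-equivalent answers that differ only
# in formatting and compare the rewards. The reward model is a placeholder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

RM_NAME = "OpenAssistant/reward-model-deberta-v3-large-v2"  # assumption
tok = AutoTokenizer.from_pretrained(RM_NAME)
rm = AutoModelForSequenceClassification.from_pretrained(RM_NAME)

def reward(prompt: str, response: str) -> float:
    inputs = tok(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return rm(**inputs).logits[0, 0].item()

prompt = "Name three primary colors."
prose = "The three primary colors are red, yellow, and blue."
bullets = "- red\n- yellow\n- blue"
# A large, consistent gap favoring one format across many such pairs
# suggests the reward model is format-biased.
print(reward(prompt, prose), reward(prompt, bullets))
```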
arXiv Detail & Related papers (2024-09-18T05:13:18Z)
- BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization [0.0]
Large Language Models (LLMs) have become pivotal in advancing natural language processing, yet their potential to perpetuate biases poses significant concerns.
This paper introduces a new framework employing Direct Preference Optimization (DPO) to mitigate gender, racial, and religious biases in English text.
By developing a loss function that favors less biased over biased completions, our approach cultivates a preference for respectful and non-discriminatory language.
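The DPO objective itself is standard; applied here, the less-biased completion plays the role of the chosen response and the biased one the rejected response. A minimal sketch (the beta value is illustrative):

```python
# Minimal DPO loss: prefer the less-biased completion ("chosen") over the
# biased one ("rejected"). Log-probabilities come from the policy being
# tuned and a frozen reference model; beta is an illustrative value.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Maximize the margin between chosen and rejected log-ratios, i.e.
    # favor less-biased completions without drifting far from the reference.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```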
arXiv Detail & Related papers (2024-07-18T22:32:20Z)
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
Existing bias evaluation methods have many constraints, and their results offer limited interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
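The summary does not describe GPTBIAS's prompts; one plausible instantiation of LLM-based bias assessment is an instruction prompt asking an evaluator model to judge a generation. The template and evaluator checkpoint below are assumptions:

```python
# Sketch of LLM-as-evaluator bias assessment: ask a strong instruction-tuned
# model to judge another model's output. The evaluator checkpoint and the
# prompt template are assumptions; GPTBIAS's actual templates may differ.
from transformers import pipeline

evaluator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

def assess_bias(model_output: str) -> str:
    prompt = (
        "Does the following text contain social bias (e.g. gender, race, "
        "religion, age)? Answer 'biased' or 'unbiased', then explain.\n\n"
        f"Text: {model_output}\nAnswer:"
    )
    return evaluator(prompt, max_new_tokens=100)[0]["generated_text"]
```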
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
- Current Topological and Machine Learning Applications for Bias Detection in Text [4.799066966918178]
This study utilizes the RedditBias database to analyze textual biases.
Four transformer models, including BERT and RoBERTa variants, were explored.
Findings suggest BERT, particularly mini BERT, excels in bias classification, while multilingual models lag.
arXiv Detail & Related papers (2023-11-22T16:12:42Z)
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel greedy search strategy to identify a near-optimal prompt that improves in-context learning performance.
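A schematic version of such a greedy search: at each step, add the demonstration example that most reduces the prompt's predictive bias. `predictive_bias` is a hypothetical stand-in for the paper's metric:

```python
# Schematic greedy search for a near-optimal few-shot prompt: at each step,
# add the demonstration example that most reduces predictive bias.
# `predictive_bias` is a hypothetical stand-in for the paper's metric.
def greedy_prompt_search(candidates, k, predictive_bias):
    prompt, pool = [], list(candidates)
    for _ in range(k):
        best = min(pool, key=lambda ex: predictive_bias(prompt + [ex]))
        prompt.append(best)
        pool.remove(best)
    return prompt
```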
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
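The projection step at the heart of this idea is standard linear algebra: stack the biased directions as rows of a matrix V and project embeddings onto their orthogonal complement. The calibration of the projection matrix is the paper's contribution and is not reproduced here; this sketch shows only the plain orthogonal projection, assuming orthonormal bias directions:

```python
# Project biased directions out of text embeddings: with bias directions
# stacked as rows of V (k x d) and assumed orthonormal, P = I - V^T V maps
# each embedding onto the orthogonal complement of the biased subspace.
# The paper's calibrated projection is more involved; this is the plain
# orthogonal projection at the heart of the idea.
import numpy as np

def debias_embeddings(embeddings, bias_directions):
    V = np.asarray(bias_directions, dtype=float)          # (k, d)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)      # unit-norm rows
    P = np.eye(V.shape[1]) - V.T @ V                      # projector (d x d)
    return embeddings @ P                                 # P is symmetric
```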
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
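The summary does not name the mechanism; one common realization of training a model to ignore correlations captured by a weaker, bias-prone model is a product-of-experts combination of their output distributions. A sketch under that assumption:

```python
# One common realization of "ignore what a weak model already captures":
# a product-of-experts loss that combines the main model's logits with a
# frozen weak model's log-probabilities, so the main model is pushed to
# explain only what the bias-prone weak model cannot. This mechanism is an
# assumption; the summary does not name the paper's exact formulation.
import torch.nn.functional as F

def poe_loss(main_logits, weak_logits, labels):
    combined = (F.log_softmax(main_logits, dim=-1)
                + F.log_softmax(weak_logits.detach(), dim=-1))
    return F.cross_entropy(combined, labels)
```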
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.