GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language
Models
- URL: http://arxiv.org/abs/2312.06315v1
- Date: Mon, 11 Dec 2023 12:02:14 GMT
- Title: GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language
Models
- Authors: Jiaxu Zhao, Meng Fang, Shirui Pan, Wenpeng Yin, Mykola Pechenizkiy
- Abstract summary: Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
- Score: 83.30078426829627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Warning: This paper contains content that may be offensive or upsetting.
There has been a significant increase in the usage of large language models
(LLMs) in various applications, both in their original form and through
fine-tuned adaptations. As a result, LLMs have gained popularity and are being
widely adopted by a large user community. However, one of the concerns with
LLMs is the potential generation of socially biased content. The existing
evaluation methods have many constraints, and their results exhibit a limited
degree of interpretability. In this work, we propose a bias evaluation
framework named GPTBIAS that leverages the high performance of LLMs (e.g.,
GPT-4 \cite{openai2023gpt4}) to assess bias in models. We also introduce
prompts called Bias Attack Instructions, which are specifically designed for
evaluating model bias. To enhance the credibility and interpretability of bias
evaluation, our framework not only provides a bias score but also offers
detailed information, including bias types, affected demographics, keywords,
reasons behind the biases, and suggestions for improvement. We conduct
extensive experiments to demonstrate the effectiveness and usability of our
bias evaluation framework.
Related papers
- Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection [5.800102484016876]
Large Language Models (LLMs) have been shown to exhibit various biases and stereotypes in their generated content.
This paper presents a systematic framework grounded in social psychology theories to investigate explicit and implicit biases in LLMs.
arXiv Detail & Related papers (2025-01-04T14:08:52Z) - Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models [1.787433808079955]
Large language models (LLMs) have been observed to perpetuate unwanted biases in training data.
In this paper, we mitigate bias by leveraging small biased and anti-biased expert models to obtain a debiasing signal.
Experiments on mitigating gender, race, and religion biases show a reduction in bias on several local and global bias metrics.
arXiv Detail & Related papers (2024-12-02T16:56:08Z) - Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs [0.0]
Large Language Models (LLMs) are being adopted across a wide range of tasks.
Recent research indicates that LLMs can harbor implicit biases even when they pass explicit bias evaluations.
This study highlights that newer or larger language models do not automatically exhibit reduced bias.
arXiv Detail & Related papers (2024-10-13T03:43:18Z) - Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge [84.34545223897578]
Despite their excellence in many domains, potential issues are under-explored, undermining their reliability and the scope of their utility.
We identify 12 key potential biases and propose a new automated bias quantification framework-CALM- which quantifies and analyzes each type of bias in LLM-as-a-Judge.
Our work highlights the need for stakeholders to address these issues and remind users to exercise caution in LLM-as-a-Judge applications.
arXiv Detail & Related papers (2024-10-03T17:53:30Z) - Unveiling and Mitigating Bias in Large Language Model Recommendations: A Path to Fairness [3.5297361401370044]
This study explores the interplay between bias and LLM-based recommendation systems.
It focuses on music, song, and book recommendations across diverse demographic and cultural groups.
Our findings reveal that bias in these systems is deeply ingrained, yet even simple interventions like prompt engineering can significantly reduce it.
arXiv Detail & Related papers (2024-09-17T01:37:57Z) - Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization [0.0]
Large Language Models (LLMs) have become pivotal in advancing natural language processing, yet their potential to perpetuate biases poses significant concerns.
This paper introduces a new framework employing Direct Preference Optimization (DPO) to mitigate gender, racial, and religious biases in English text.
By developing a loss function that favors less biased over biased completions, our approach cultivates a preference for respectful and non-discriminatory language.
arXiv Detail & Related papers (2024-07-18T22:32:20Z) - CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models [58.57987316300529]
Large Language Models (LLMs) are increasingly deployed to handle various natural language processing (NLP) tasks.
To evaluate the biases exhibited by LLMs, researchers have recently proposed a variety of datasets.
We propose CEB, a Compositional Evaluation Benchmark that covers different types of bias across different social groups and tasks.
arXiv Detail & Related papers (2024-07-02T16:31:37Z) - Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of large language models' implicit bias towards certain demographics.
Inspired by psychometric principles, we propose three attack approaches, i.e., Disguise, Deception, and Teaching.
Our methods can elicit LLMs' inner bias more effectively than competitive baselines.
arXiv Detail & Related papers (2024-06-20T06:42:08Z) - Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation [49.3814117521631]
Standard benchmarks of bias and fairness in large language models (LLMs) measure the association between social attributes implied in user prompts and short responses.
We develop analogous RUTEd evaluations from three contexts of real-world use.
We find that standard bias metrics have no significant correlation with the more realistic bias metrics.
arXiv Detail & Related papers (2024-02-20T01:49:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.