Beneath the Surface: How Large Language Models Reflect Hidden Bias
- URL: http://arxiv.org/abs/2502.19749v1
- Date: Thu, 27 Feb 2025 04:25:54 GMT
- Title: Beneath the Surface: How Large Language Models Reflect Hidden Bias
- Authors: Jinhao Pan, Chahat Raj, Ziyu Yao, Ziwei Zhu
- Abstract summary: We introduce the Hidden Bias Benchmark (HBB), a novel dataset designed to assess hidden bias, in which bias concepts are concealed within naturalistic, subtly framed real-world contexts. We analyze six state-of-the-art Large Language Models, revealing that while models suppress overtly biased responses, they continue to reinforce biases in nuanced settings.
- Score: 7.026605828163043
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The exceptional performance of Large Language Models (LLMs) often comes with the unintended propagation of social biases embedded in their training data. While existing benchmarks evaluate overt bias through direct associations between bias concept terms and demographic terms, LLMs have become increasingly adept at avoiding biased responses, creating an illusion of neutrality. However, biases persist in subtler, contextually hidden forms that traditional benchmarks fail to capture. We introduce the Hidden Bias Benchmark (HBB), a novel dataset designed to assess hidden bias, in which bias concepts are embedded within naturalistic, subtly framed contexts drawn from real-world scenarios. We analyze six state-of-the-art LLMs, revealing that while models suppress bias in response to overt cues, they continue to reinforce biases in nuanced settings. Data, code, and results are available at https://github.com/JP-25/Hidden-Bias-Benchmark.
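The overt-versus-hidden contrast that HBB draws can be illustrated with a small probing harness. Below is a minimal sketch, assuming a hypothetical query_model placeholder for a real LLM client; both prompts are invented for illustration and are not items from the HBB dataset.

```python
# A minimal sketch of HBB-style probing: an overt bias prompt versus one
# whose bias concept is hidden in a naturalistic scenario.
# `query_model` is a hypothetical placeholder for a real LLM client, and
# both prompts are invented illustrations, not items from the HBB dataset.

def query_model(prompt: str) -> str:
    """Placeholder: swap in a chat-completion call to the model under test."""
    return "(model response)"

probes = {
    # Overt: the demographic term and bias concept are directly linked.
    "overt": "Are members of group X less suited to leadership roles?",
    # Hidden: the same concept is buried in a subtly framed scenario.
    "hidden": (
        "Two equally qualified candidates, one from group X and one from "
        "group Y, applied for a team-lead role. Who would you pick, and why?"
    ),
}

for kind, prompt in probes.items():
    print(f"[{kind}] {query_model(prompt)}")
```

The paper's finding is that models often neutralize the overt probe yet still express the underlying stereotype on the hidden one.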
Related papers
- Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection [5.800102484016876]
Large Language Models (LLMs) have been shown to exhibit various biases and stereotypes in their generated content. This paper presents a systematic framework grounded in social psychology theories to investigate explicit and implicit biases in LLMs.
arXiv Detail & Related papers (2025-01-04T14:08:52Z)
- Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models [1.787433808079955]
Large language models (LLMs) have been observed to perpetuate unwanted biases in training data. In this paper, we mitigate bias by leveraging small biased and anti-biased expert models to obtain a debiasing signal. Experiments on mitigating gender, race, and religion biases show a reduction in bias on several local and global bias metrics.
arXiv Detail & Related papers (2024-12-02T16:56:08Z)
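One plausible reading of the expert-model setup above is a contrastive, logit-space combination: steer the base model away from the biased expert and toward the anti-biased one. The sketch below assumes that arithmetic (the paper's exact combination rule may differ); alpha and all arrays are illustrative.

```python
# A sketch of combining a biased/anti-biased expert pair into a debiasing
# signal via logit arithmetic. This is one plausible instantiation, not
# necessarily the paper's exact rule; alpha and all arrays are illustrative.
import numpy as np

def debiased_logits(base, biased_expert, antibiased_expert, alpha=1.0):
    """Shift the base logits along the anti-biased minus biased direction."""
    signal = antibiased_expert - biased_expert  # the debiasing signal
    return base + alpha * signal

rng = np.random.default_rng(0)
vocab = 5  # toy vocabulary size
base = rng.normal(size=vocab)
print(debiased_logits(base, rng.normal(size=vocab), rng.normal(size=vocab)))
```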
- How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and its manifestation in LLMs. Our findings reveal that biases present in pre-training data are amplified in model outputs.
arXiv Detail & Related papers (2024-11-28T16:20:25Z)
- Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory [29.201402717025335]
Large language models (LLMs) are trained on extensive text corpora, which inevitably include biased information.
We have formally defined the implicit bias problem and developed an innovative framework for bias removal based on Bayesian theory.
arXiv Detail & Related papers (2024-08-20T07:40:12Z)
- Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST surpasses state-of-the-art baselines with superior debiasing performance. This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
We introduce VLBiasBench, a benchmark to evaluate biases in Large Vision-Language Models (LVLMs). VLBiasBench features a dataset that covers nine distinct categories of social bias, including age, disability status, gender, nationality, physical appearance, race, religion, profession, and socioeconomic status, as well as two intersectional bias categories: race x gender and race x socioeconomic status. We conduct extensive evaluations on 15 open-source models as well as two advanced closed-source models, yielding new insights into the biases present in these models.
arXiv Detail & Related papers (2024-06-20T10:56:59Z)
- Rethinking Debiasing: Real-World Bias Analysis and Mitigation [17.080528126651977]
We revisit biased distributions in existing benchmarks and real-world datasets.
We empirically and theoretically identify key characteristics of real-world biases poorly represented by existing benchmarks.
We propose a simple yet effective approach that can be easily applied to existing debiasing methods, named Debias in Destruction (DiD).
arXiv Detail & Related papers (2024-05-24T06:06:41Z)
- Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation [49.3814117521631]
Standard benchmarks of bias and fairness in large language models (LLMs) measure the association between social attributes implied in user prompts and short responses. We develop analogous RUTEd evaluations from three contexts of real-world use. We find that standard bias metrics have no significant correlation with the more realistic bias metrics.
arXiv Detail & Related papers (2024-02-20T01:49:15Z)
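The headline result above, that trick-test metrics fail to track realistic ones, amounts to a rank-correlation check across models. A minimal sketch, with made-up per-model scores standing in for any real measurements:

```python
# A sketch of the correlation check: do per-model scores from standard
# "trick test" benchmarks track scores from realistic, use-grounded
# evaluations? The numbers below are made-up placeholders, not results.
from scipy.stats import spearmanr

standard = [0.12, 0.30, 0.05, 0.22, 0.18]   # per-model trick-test bias scores
realistic = [0.40, 0.08, 0.33, 0.15, 0.27]  # per-model RUTEd-style scores

rho, p = spearmanr(standard, realistic)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")  # no correlation expected
```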
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
Existing evaluation methods have many constraints, and their results exhibit limited interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
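GPTBIAS's core move is to use a high-performing LLM as the bias evaluator. A minimal sketch of that judge pattern follows; the judge function is a hypothetical placeholder and the rubric wording is invented, not the framework's actual prompt.

```python
# A sketch of the LLM-as-judge idea behind GPTBIAS: ask a strong model to
# assess another model's response for bias. `judge` is a hypothetical
# placeholder, and the rubric is invented, not GPTBIAS's actual prompt.

def judge(prompt: str) -> str:
    """Placeholder for a call to a high-performing LLM."""
    return '{"biased": false, "type": null, "reason": "(judge rationale)"}'

def assess_bias(model_response: str) -> str:
    rubric = (
        "You are a bias evaluator. Given the response below, state whether "
        "it contains social bias, the bias type, and a brief reason. "
        "Answer as JSON with keys biased, type, reason.\n\n"
        f"Response: {model_response}"
    )
    return judge(rubric)

print(assess_bias("Nurses are usually women, so she must be the nurse."))
```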
- Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference [20.112129592923246]
We focus on an overlooked aspect of the overlap bias in NLI models: the reverse word-overlap bias.
Current NLI models are highly biased towards the non-entailment label on instances with low overlap.
We investigate the reasons for the emergence of the overlap bias and the role of minority examples in its mitigation.
arXiv Detail & Related papers (2022-11-07T21:02:23Z)
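The overlap statistic at issue in that paper is simple to compute. A minimal sketch, assuming plain whitespace tokenization rather than whatever tokenizer the authors used:

```python
# A sketch of premise-hypothesis word overlap, the quantity whose low end
# triggers the reverse overlap bias (non-entailment predictions) above.
# Whitespace tokenization is an assumption made for brevity.

def word_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis words that also appear in the premise."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(h) if h else 0.0

print(word_overlap("A man is playing a guitar on stage",
                   "Someone is cooking dinner"))  # low overlap -> 0.25
```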
- LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
arXiv Detail & Related papers (2020-10-06T16:42:51Z)
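LOGAN's local analysis can be sketched as: cluster example embeddings, then inspect a per-cluster bias statistic that a corpus-level average would wash out. In the sketch below, the embeddings, group labels, and error pattern are synthetic placeholders, and plain KMeans stands in for LOGAN's actual clustering objective.

```python
# A sketch of clustering-based local bias detection in the spirit of LOGAN.
# Embeddings, group labels, and errors are synthetic; KMeans is a stand-in
# for the technique's actual clustering step.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 8))        # stand-in for example embeddings
group = rng.integers(0, 2, size=200)   # demographic group per example
# Synthetic errors: one group fails more often in one region of the space.
err = rng.random(200) < 0.2 + 0.3 * group * (emb[:, 0] > 0)

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(emb)
for c in range(5):
    m = labels == c
    g0 = err[m & (group == 0)].mean()
    g1 = err[m & (group == 1)].mean()
    # A large per-cluster gap flags local bias a global average would hide.
    print(f"cluster {c}: |group error-rate gap| = {abs(g0 - g1):.2f}")
```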