Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection
- URL: http://arxiv.org/abs/2501.02295v1
- Date: Sat, 04 Jan 2025 14:08:52 GMT
- Title: Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection
- Authors: Yachao Zhao, Bo Wang, Yan Wang,
- Abstract summary: Large Language Models (LLMs) have been shown to exhibit various biases and stereotypes in their generated content.
This paper presents a systematic framework grounded in social psychology theories to investigate explicit and implicit biases in LLMs.
- Score: 5.800102484016876
- Abstract: Large Language Models (LLMs) have been shown to exhibit various biases and stereotypes in their generated content. While extensive research has investigated bias in LLMs, prior work has predominantly focused on explicit bias, leaving the more nuanced implicit biases largely unexplored. This paper presents a systematic framework grounded in social psychology theories to investigate and compare explicit and implicit biases in LLMs. We propose a novel "self-reflection" based evaluation framework that operates in two phases: first measuring implicit bias through simulated psychological assessment methods, then evaluating explicit bias by prompting LLMs to analyze their own generated content. Through extensive experiments on state-of-the-art LLMs across multiple social dimensions, we demonstrate that LLMs exhibit a substantial inconsistency between explicit and implicit biases, where explicit biases manifest as mild stereotypes while implicit biases show strong stereotypes. Furthermore, we investigate the underlying factors contributing to this explicit-implicit bias inconsistency. Our experiments examine the effects of training data scale, model parameters, and alignment techniques. Results indicate that while explicit bias diminishes with increased training data and model size, implicit bias exhibits a contrasting upward trend. Notably, contemporary alignment methods (e.g., RLHF, DPO) effectively suppress explicit bias but show limited efficacy in mitigating implicit bias. These findings suggest that while scaling up models and alignment training can address explicit bias, the challenge of implicit bias requires novel approaches beyond current methodologies.
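The abstract describes the two-phase self-reflection framework only at a high level. As a rough, hypothetical sketch of how such a two-phase probe could be wired up (not the authors' actual protocol; `query_llm`, the prompts, the group labels, and the attribute words are all illustrative placeholders):

```python
# Hypothetical sketch of a two-phase "self-reflection" bias probe.
# `query_llm` stands in for any chat-completion call; the prompts,
# group labels, and attribute words are illustrative, not the paper's.
from typing import Callable, Dict, List


def implicit_bias_probe(query_llm: Callable[[str], str],
                        group_a: str, group_b: str,
                        attributes: List[str]) -> Dict[str, str]:
    """Phase 1: IAT-style word association. The model assigns each
    attribute word to one of two social groups; skewed pairings are
    read as implicit association."""
    associations = {}
    for attr in attributes:
        prompt = (
            f"Complete the association task. Assign the word '{attr}' to "
            f"either '{group_a}' or '{group_b}'. Answer with one group only."
        )
        associations[attr] = query_llm(prompt).strip()
    return associations


def explicit_bias_self_reflection(query_llm: Callable[[str], str],
                                  generated_text: str) -> str:
    """Phase 2: self-reflection. The model is shown its own earlier
    output and asked to judge whether it contains stereotypes."""
    prompt = (
        "You previously wrote the following text:\n\n"
        f"{generated_text}\n\n"
        "Does this text rely on social stereotypes? Answer 'yes' or 'no' "
        "and briefly explain."
    )
    return query_llm(prompt)


if __name__ == "__main__":
    # Stub model for demonstration; replace with a real LLM call.
    def fake_llm(prompt: str) -> str:
        return "no, the text is neutral."

    story = fake_llm("Write a short story about a nurse and an engineer.")
    print(implicit_bias_probe(fake_llm, "men", "women", ["caring", "logical"]))
    print(explicit_bias_self_reflection(fake_llm, story))
```

In phase 1 the bias signal comes from skewed attribute-to-group pairings; in phase 2 it comes from the model's own verdict on its earlier output. The paper's central finding is that these two readings frequently disagree, with implicit probes surfacing much stronger stereotypes than the model's explicit self-assessment.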
Related papers
- Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models [10.565316815513235]
Large language models (LLMs) may still exhibit implicit biases when simulating human behavior.
We show that state-of-the-art LLMs exhibit significant sociodemographic disparities in nearly all simulations.
When comparing our findings to real-world disparities reported in empirical studies, we find that the biases we uncovered are directionally aligned but markedly amplified.
arXiv Detail & Related papers (2025-01-29T05:21:31Z)
- How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and its manifestation in LLMs.
Our findings reveal that biases present in pre-training data are amplified in model outputs.
arXiv Detail & Related papers (2024-11-28T16:20:25Z)
- Covert Bias: The Severity of Social Views' Unalignment in Language Models Towards Implicit and Explicit Opinion [0.40964539027092917]
We evaluate the severity of bias toward a view by using a biased model in edge cases of excessive bias scenarios.
Our findings reveal a discrepancy in LLM performance in identifying implicit and explicit opinions, with a general tendency of bias toward explicit opinions of opposing stances.
The direct, incautious responses of the unaligned models suggest a need for further refinement of decisiveness.
arXiv Detail & Related papers (2024-08-15T15:23:00Z)
- The African Woman is Rhythmic and Soulful: An Investigation of Implicit Biases in LLM Open-ended Text Generation [3.9945212716333063]
Implicit biases are significant because they influence the decisions made by Large Language Models (LLMs).
Traditionally, explicit bias tests or embedding-based methods are employed to detect bias, but these approaches can overlook more nuanced, implicit forms of bias.
We introduce two novel psychology-inspired methodologies to reveal and measure implicit biases through prompt-based and decision-making tasks.
arXiv Detail & Related papers (2024-07-01T13:21:33Z)
- Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of large language models' implicit bias towards certain demographics.
Inspired by psychometric principles, we propose three attack approaches, i.e., Disguise, Deception, and Teaching.
Our methods can elicit LLMs' inner bias more effectively than competitive baselines.
arXiv Detail & Related papers (2024-06-20T06:42:08Z)
- Investigating Bias in LLM-Based Bias Detection: Disparities between LLMs and Human Perception [13.592532358127293]
We investigate the presence and nature of bias within Large Language Models (LLMs).
We probe whether LLMs exhibit biases, particularly in political bias prediction and text continuation tasks.
We propose debiasing strategies, including prompt engineering and model fine-tuning.
arXiv Detail & Related papers (2024-03-22T00:59:48Z)
- Measuring Implicit Bias in Explicitly Unbiased Large Language Models [14.279977138893846]
Large language models (LLMs) can pass explicit social bias tests but still harbor implicit biases.
We introduce two new measures of bias: LLM Implicit Bias, a prompt-based method for revealing implicit bias; and LLM Decision Bias, a strategy to detect subtle discrimination in decision-making tasks.
Using these measures, we found pervasive stereotype biases mirroring those in society in 8 value-aligned models across 4 social categories.
arXiv Detail & Related papers (2024-02-06T15:59:23Z)
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks (a generic cosine-based association score of this kind is sketched after this list).
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
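The SAME entry above concerns cosine-based bias scores for word embeddings. As a generic illustration of that family only (a simplified WEAT-style differential association, not the SAME metric itself; the word lists and toy embeddings are placeholders):

```python
# Generic cosine-based embedding bias score (simplified WEAT-style
# differential association). This is NOT the SAME metric; word lists
# and the toy embeddings below are illustrative only.
from typing import Dict, List

import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(word: str, attr_a: List[str], attr_b: List[str],
                emb: Dict[str, np.ndarray]) -> float:
    """Mean similarity to attribute set A minus mean similarity to set B."""
    sim_a = np.mean([cosine(emb[word], emb[a]) for a in attr_a])
    sim_b = np.mean([cosine(emb[word], emb[b]) for b in attr_b])
    return float(sim_a - sim_b)


def bias_score(targets: List[str], attr_a: List[str], attr_b: List[str],
               emb: Dict[str, np.ndarray]) -> float:
    """Average differential association over the target words; values far
    from zero mean the targets sit closer to one attribute pole."""
    return float(np.mean([association(t, attr_a, attr_b, emb) for t in targets]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy random embeddings stand in for real pretrained vectors.
    vocab = ["nurse", "engineer", "career", "family", "home", "office"]
    emb = {w: rng.normal(size=50) for w in vocab}
    print(bias_score(["nurse", "engineer"],
                     ["career", "office"], ["family", "home"], emb))
```

With real pretrained vectors, a score far from zero indicates that the target words lie systematically closer to one attribute pole than the other; SAME refines this basic cosine-based idea.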