Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in
Large Language Models
- URL: http://arxiv.org/abs/2308.12578v1
- Date: Thu, 24 Aug 2023 05:35:58 GMT
- Title: Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in
Large Language Models
- Authors: Yachao Zhao, Bo Wang, Dongming Zhao, Kun Huang, Yan Wang, Ruifang He,
Yuexian Hou
- Abstract summary: Recent researches indicate that Pre-trained Large Language Models (LLMs) possess cognitive constructs similar to those observed in humans.
This paper focuses on explicit and implicit social bias, a distinctive two-level cognitive construct in psychology.
- Score: 22.87965173260982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent researches indicate that Pre-trained Large Language Models (LLMs)
possess cognitive constructs similar to those observed in humans, prompting
researchers to investigate the cognitive aspects of LLMs. This paper focuses on
explicit and implicit social bias, a distinctive two-level cognitive construct
in psychology. It posits that individuals' explicit social bias, which is their
conscious expression of bias in the statements, may differ from their implicit
social bias, which represents their unconscious bias. We propose a two-stage
approach and discover a parallel phenomenon in LLMs known as "re-judge
inconsistency" in social bias. In the initial stage, the LLM is tasked with
automatically completing statements, potentially incorporating implicit social
bias. However, in the subsequent stage, the same LLM re-judges the biased
statement generated by itself but contradicts it. We propose that this re-judge
inconsistency can be similar to the inconsistency between human's unaware
implicit social bias and their aware explicit social bias. Experimental
investigations on ChatGPT and GPT-4 concerning common gender biases examined in
psychology corroborate the highly stable nature of the re-judge inconsistency.
This finding may suggest that diverse cognitive constructs emerge as LLMs'
capabilities strengthen. Consequently, leveraging psychological theories can
provide enhanced insights into the underlying mechanisms governing the
expressions of explicit and implicit constructs in LLMs.
Related papers
- Covert Bias: The Severity of Social Views' Unalignment in Language Models Towards Implicit and Explicit Opinion [0.40964539027092917]
We evaluate the severity of bias toward a view by using a biased model in edge cases of excessive bias scenarios.
Our findings reveal a discrepancy in LLM performance in identifying implicit and explicit opinions, with a general tendency of bias toward explicit opinions of opposing stances.
The direct, incautious responses of the unaligned models suggest a need for further refinement of decisiveness.
arXiv Detail & Related papers (2024-08-15T15:23:00Z) - The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose sc Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z) - Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models [11.132360309354782]
Social bias is shaped by the accumulation of social perceptions towards targets across various demographic identities.
We propose a novel strategy to intuitively quantify social perceptions and suggest metrics that can evaluate the social biases within large language models.
arXiv Detail & Related papers (2024-06-06T13:32:09Z) - Cognitive Bias in Decision-Making with LLMs [19.87475562475802]
Large language models (LLMs) offer significant potential as tools to support an expanding range of decision-making tasks.
LLMs have been shown to inherit societal biases against protected groups, as well as be subject to bias functionally resembling cognitive bias.
Our work introduces BiasBuster, a framework designed to uncover, evaluate, and mitigate cognitive bias in LLMs.
arXiv Detail & Related papers (2024-02-25T02:35:56Z) - Measuring Implicit Bias in Explicitly Unbiased Large Language Models [14.279977138893846]
Large language models (LLMs) can pass explicit social bias tests but still harbor implicit biases.
We introduce two new measures of bias: LLM Implicit Bias, a prompt-based method for revealing implicit bias; and LLM Decision Bias, a strategy to detect subtle discrimination in decision-making tasks.
Using these measures, we found pervasive stereotype biases mirroring those in society in 8 value-aligned models across 4 social categories.
arXiv Detail & Related papers (2024-02-06T15:59:23Z) - MoCa: Measuring Human-Language Model Alignment on Causal and Moral
Judgment Tasks [49.60689355674541]
A rich literature in cognitive science has studied people's causal and moral intuitions.
This work has revealed a number of factors that systematically influence people's judgments.
We test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with human participants.
arXiv Detail & Related papers (2023-10-30T15:57:32Z) - Influence of External Information on Large Language Models Mirrors
Social Cognitive Patterns [51.622612759892775]
Social cognitive theory explains how people learn and acquire knowledge through observing others.
Recent years have witnessed the rapid development of large language models (LLMs)
LLMs, as AI agents, can observe external information, which shapes their cognition and behaviors.
arXiv Detail & Related papers (2023-05-08T16:10:18Z) - The Tail Wagging the Dog: Dataset Construction Biases of Social Bias
Benchmarks [75.58692290694452]
We compare social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye.
We observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models.
arXiv Detail & Related papers (2022-10-18T17:58:39Z) - The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z) - Towards Debiasing Sentence Representations [109.70181221796469]
We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
arXiv Detail & Related papers (2020-07-16T04:22:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.