Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in
Large Language Models
- URL: http://arxiv.org/abs/2308.12578v1
- Date: Thu, 24 Aug 2023 05:35:58 GMT
- Title: Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in
Large Language Models
- Authors: Yachao Zhao, Bo Wang, Dongming Zhao, Kun Huang, Yan Wang, Ruifang He,
Yuexian Hou
- Abstract summary: Recent research indicates that Pre-trained Large Language Models (LLMs) possess cognitive constructs similar to those observed in humans.
This paper focuses on explicit and implicit social bias, a distinctive two-level cognitive construct in psychology.
- Score: 22.87965173260982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research indicates that Pre-trained Large Language Models (LLMs)
possess cognitive constructs similar to those observed in humans, prompting
researchers to investigate the cognitive aspects of LLMs. This paper focuses on
explicit and implicit social bias, a distinctive two-level cognitive construct
in psychology. It posits that individuals' explicit social bias, which is their
conscious expression of bias in the statements, may differ from their implicit
social bias, which represents their unconscious bias. We propose a two-stage
approach and discover a parallel phenomenon in LLMs known as "re-judge
inconsistency" in social bias. In the initial stage, the LLM is tasked with
automatically completing statements, potentially incorporating implicit social
bias. However, in the subsequent stage, the same LLM re-judges the biased
statement generated by itself but contradicts it. We propose that this re-judge
inconsistency can be similar to the inconsistency between humans' unaware
implicit social bias and their aware explicit social bias. Experimental
investigations on ChatGPT and GPT-4 concerning common gender biases examined in
psychology corroborate the highly stable nature of the re-judge inconsistency.
This finding may suggest that diverse cognitive constructs emerge as LLMs'
capabilities strengthen. Consequently, leveraging psychological theories can
provide enhanced insights into the underlying mechanisms governing the
expressions of explicit and implicit constructs in LLMs.
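The two-stage approach described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `rejudge_inconsistency_rate`, the `complete` and `judge` callables (hypothetical wrappers around the same LLM), and the toy stand-ins are all assumptions made for illustration. The abstract does not give the prompts or the scoring rule, so the sketch simply counts every stage-one statement that the model later rejects.

```python
from typing import Callable, Iterable


def rejudge_inconsistency_rate(
    stems: Iterable[str],
    complete: Callable[[str], str],
    judge: Callable[[str], bool],
) -> float:
    """Return the fraction of self-completed statements the model later rejects.

    `complete` and `judge` are hypothetical wrappers around the same LLM.
    """
    # Stage 1: the model completes each statement stem.
    statements = [complete(stem) for stem in stems]
    if not statements:
        return 0.0
    # Stage 2: the same model is asked whether it agrees with each statement.
    contradicted = sum(1 for statement in statements if not judge(statement))
    return contradicted / len(statements)


# Toy stand-ins so the sketch runs without any API access (assumptions only).
def toy_complete(stem: str) -> str:
    return stem + " [group]"


def toy_judge(statement: str) -> bool:
    # Pretend the model disagrees with every statement it is shown.
    return False


if __name__ == "__main__":
    stems = ["Stem A is usually associated with", "Stem B is usually associated with"]
    print(rejudge_inconsistency_rate(stems, toy_complete, toy_judge))
```

In practice the two callables would send prompts to the model under study; keeping them as parameters leaves the prompt wording and the agreement criterion open, since neither is specified here.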
Related papers
- The African Woman is Rhythmic and Soulful: Evaluation of Open-ended Generation for Implicit Biases [0.0]
This study investigates the subtle and often concealed biases present in Large Language Models (LLMs).
The challenge of measuring such biases is exacerbated as LLMs become increasingly proprietary.
This study introduces innovative measures of bias inspired by psychological methodologies.
arXiv Detail & Related papers (2024-07-01T13:21:33Z)
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
- Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models [11.132360309354782]
Social bias is shaped by the accumulation of social perceptions towards targets across various demographic identities.
We propose a novel strategy to intuitively quantify social perceptions and suggest metrics that can evaluate the social biases within large language models.
arXiv Detail & Related papers (2024-06-06T13:32:09Z)
- Cognitive Bias in High-Stakes Decision-Making with LLMs [19.87475562475802]
We develop a framework designed to uncover, evaluate, and mitigate cognitive bias in large language models (LLMs).
Inspired by prior research in psychology and cognitive science, we develop a dataset containing 16,800 prompts to evaluate different cognitive biases.
We test various bias mitigation strategies and propose a novel method utilising LLMs to debias their own prompts.
arXiv Detail & Related papers (2024-02-25T02:35:56Z)
- Measuring Implicit Bias in Explicitly Unbiased Large Language Models [14.279977138893846]
Large language models (LLMs) can pass explicit social bias tests but still harbor implicit biases.
We introduce two new measures of bias: LLM Implicit Bias, a prompt-based method for revealing implicit bias; and LLM Decision Bias, a strategy to detect subtle discrimination in decision-making tasks.
Using these measures, we found pervasive stereotype biases mirroring those in society in 8 value-aligned models across 4 social categories.
arXiv Detail & Related papers (2024-02-06T15:59:23Z)
- MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks [49.60689355674541]
A rich literature in cognitive science has studied people's causal and moral intuitions.
This work has revealed a number of factors that systematically influence people's judgments.
We test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with human participants.
arXiv Detail & Related papers (2023-10-30T15:57:32Z)
- Influence of External Information on Large Language Models Mirrors Social Cognitive Patterns [51.622612759892775]
Social cognitive theory explains how people learn and acquire knowledge through observing others.
Recent years have witnessed the rapid development of large language models (LLMs).
LLMs, as AI agents, can observe external information, which shapes their cognition and behaviors.
arXiv Detail & Related papers (2023-05-08T16:10:18Z)
- The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks [75.58692290694452]
We compare social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye.
We observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models.
arXiv Detail & Related papers (2022-10-18T17:58:39Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- Towards Debiasing Sentence Representations [109.70181221796469]
We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
arXiv Detail & Related papers (2020-07-16T04:22:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.