Cognitive bias in large language models: Cautious optimism meets
anti-Panglossian meliorism
- URL: http://arxiv.org/abs/2311.10932v1
- Date: Sat, 18 Nov 2023 01:58:23 GMT
- Title: Cognitive bias in large language models: Cautious optimism meets
anti-Panglossian meliorism
- Authors: David Thorstad
- Abstract summary: Traditional discussions of bias in large language models focus on a conception of bias closely tied to unfairness.
Recent work raises the novel possibility of assessing the outputs of large language models for a range of cognitive biases.
I draw out philosophical implications of this discussion for the rationality of human cognitive biases as well as the role of unrepresentative data in driving model biases.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traditional discussions of bias in large language models focus on a
conception of bias closely tied to unfairness, especially as affecting
marginalized groups. Recent work raises the novel possibility of assessing the
outputs of large language models for a range of cognitive biases familiar from
research in judgment and decision-making. My aim in this paper is to draw two
lessons from recent discussions of cognitive bias in large language models:
cautious optimism about the prevalence of bias in current models coupled with
an anti-Panglossian willingness to concede the existence of some genuine biases
and work to reduce them. I draw out philosophical implications of this
discussion for the rationality of human cognitive biases as well as the role of
unrepresentative data in driving model biases.
Related papers
- Covert Bias: The Severity of Social Views' Unalignment in Language Models Towards Implicit and Explicit Opinion [0.40964539027092917]
We evaluate the severity of bias toward a given view by using a biased model on edge cases drawn from scenarios of excessive bias.
Our findings reveal a discrepancy in LLM performance in identifying implicit and explicit opinions, with a general tendency of bias toward explicit opinions of opposing stances.
The direct, incautious responses of the unaligned models suggest that their decisiveness needs further refinement.
arXiv Detail & Related papers (2024-08-15T15:23:00Z)
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained language models (PLMs) are acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint the units (i.e., neurons) in a language model that can be attributed to undesirable behaviors such as social bias; a generic sketch of neuron-level attribution follows below.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability at low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
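This summary does not describe how Social Bias Neurons are actually located. Purely as a deliberately crude stand-in for whatever attribution method the paper uses, the sketch below scores each neuron by its mean activation gap between bias-eliciting and neutral prompts; the function name, array shapes, and random toy activations are all hypothetical.

```python
import numpy as np

def neuron_bias_scores(acts_biased: np.ndarray,
                       acts_neutral: np.ndarray) -> np.ndarray:
    """Score each neuron by the mean activation gap between prompts that
    elicit biased behavior and matched neutral prompts. A crude stand-in
    for the paper's attribution method, which this summary doesn't detail."""
    return acts_biased.mean(axis=0) - acts_neutral.mean(axis=0)

# Toy usage: activations of 16 neurons over 100 prompts per condition.
rng = np.random.default_rng(1)
biased = rng.normal(0.2, 1.0, size=(100, 16))
neutral = rng.normal(0.0, 1.0, size=(100, 16))
scores = neuron_bias_scores(biased, neutral)
top = np.argsort(scores)[::-1][:3]  # indices of the most implicated neurons
print(top, scores[top])
```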
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
Existing evaluation methods have many constraints, and their results exhibit limited interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models; a generic sketch of this LLM-as-judge pattern follows below.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
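The summary above does not reproduce GPTBIAS's actual prompts, bias taxonomy, or scoring procedure. As a hedged illustration of the general pattern it relies on, prompting one LLM to grade another model's outputs, here is a minimal framework-agnostic Python sketch; the `judge` callable, the rubric text, and the BIASED/UNBIASED labels are hypothetical stand-ins, not the paper's method.

```python
from typing import Callable

# Hypothetical rubric; GPTBIAS defines its own instructions and bias
# taxonomy, which this sketch does not attempt to reproduce.
JUDGE_TEMPLATE = """You are auditing a language model for social bias.
Prompt given to the model: {prompt}
Model's response: {response}
Answer with exactly one word, BIASED or UNBIASED, then a brief reason."""

def judge_response(prompt: str, response: str,
                   judge: Callable[[str], str]) -> bool:
    """Return True if the judge model labels the response as biased.

    `judge` is any function mapping a prompt string to a completion
    string (an API client, a local model, etc.).
    """
    verdict = judge(JUDGE_TEMPLATE.format(prompt=prompt, response=response))
    return verdict.strip().upper().startswith("BIASED")

def bias_rate(pairs: list[tuple[str, str]],
              judge: Callable[[str], str]) -> float:
    """Fraction of (prompt, response) pairs the judge flags as biased."""
    flagged = sum(judge_response(p, r, judge) for p, r in pairs)
    return flagged / len(pairs) if pairs else 0.0
```

Keeping `judge` as a plain callable keeps the sketch independent of any particular model API.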
- Evaluating Biased Attitude Associations of Language Models in an Intersectional Context [2.891314299138311]
Language models are trained on large-scale corpora that embed implicit biases documented in psychology.
We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight.
We find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language.
arXiv Detail & Related papers (2023-07-07T03:01:56Z)
- Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models [11.323961700172175]
This article investigates the challenges and risks associated with biases in large-scale language models like ChatGPT.
We discuss the origins of these biases, stemming from, among other factors, the nature of the training data, model specifications, algorithmic constraints, product design, and policy decisions.
We review the current approaches to identify, quantify, and mitigate biases in language models, emphasizing the need for a multi-disciplinary, collaborative effort to develop more equitable, transparent, and responsible AI systems.
arXiv Detail & Related papers (2023-04-07T17:14:00Z)
- Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias [2.6304695993930594]
We present a survey to comprehend bias in large pre-trained language models, analyze the stages at which it arises, and review the various ways in which these biases can be quantified and mitigated.
Considering the wide applicability of textual affective computing to downstream tasks in real-world systems such as business, healthcare, and education, we place special emphasis on investigating bias in the context of affect (emotion), i.e., Affective Bias.
We present a summary of various bias evaluation corpora that can aid future research, and we discuss open challenges in research on bias in pre-trained language models.
arXiv Detail & Related papers (2022-04-21T18:51:19Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identifying potential causes of social bias in downstream tasks; a generic cosine-based sketch follows below.
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
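The SAME formula itself is not given in this summary, so the sketch below shows only the simpler WEAT-style cosine association difference that scores like SAME aim to improve on; the function names and the random vectors standing in for trained embeddings are toy assumptions.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association_bias(word_vec: np.ndarray,
                     attr_a: list[np.ndarray],
                     attr_b: list[np.ndarray]) -> float:
    """WEAT-style bias: mean cosine similarity to attribute set A minus
    mean similarity to attribute set B. SAME replaces this simple
    difference with its own improved formulation, not reproduced here."""
    sim_a = np.mean([cosine(word_vec, a) for a in attr_a])
    sim_b = np.mean([cosine(word_vec, b) for b in attr_b])
    return float(sim_a - sim_b)

# Toy usage with random stand-ins for trained embedding vectors.
rng = np.random.default_rng(0)
target = rng.normal(size=50)
attrs_a = [rng.normal(size=50) for _ in range(5)]
attrs_b = [rng.normal(size=50) for _ in range(5)]
print(association_bias(target, attrs_a, attrs_b))
```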
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a simple but highly effective method for countering bias using instance reweighting; a minimal sketch of the general technique follows below.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
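The paper's specific reweighting scheme is not spelled out in this summary; a common instantiation of the general idea is inverse-frequency weighting over (label, demographic) cells, sketched below with hypothetical field names and toy data.

```python
from collections import Counter

def balancing_weights(labels: list[str],
                      demographics: list[str]) -> list[float]:
    """Assign each training instance a weight inversely proportional to
    the frequency of its (label, demographic) cell, so over-represented
    author groups contribute less to the loss. One common reweighting
    scheme; the paper defines its own variant."""
    cells = list(zip(labels, demographics))
    counts = Counter(cells)
    n, k = len(cells), len(counts)
    # Weight so that each cell contributes equally in expectation.
    return [n / (k * counts[c]) for c in cells]

# Toy usage: sentiment labels paired with a hypothetical author attribute.
labels = ["pos", "pos", "pos", "neg", "neg", "pos"]
genders = ["f", "f", "m", "f", "m", "m"]
print(balancing_weights(labels, genders))
```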
- RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models [37.98671828283487]
Text representation models are prone to exhibit a range of societal biases.
Recent work has predominantly focused on measuring and mitigating bias in pretrained language models.
We present RedditBias, the first conversational data set grounded in actual human conversations from Reddit.
arXiv Detail & Related papers (2021-06-07T11:22:39Z)
- Towards Controllable Biases in Language Generation [87.89632038677912]
We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
arXiv Detail & Related papers (2020-05-01T08:25:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.