Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
- URL: http://arxiv.org/abs/2311.04892v2
- Date: Sat, 27 Jan 2024 08:49:29 GMT
- Title: Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
- Authors: Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan,
Peter Clark, Ashish Sabharwal, Tushar Khot
- Abstract summary: We study the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks.
Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g. an Asian person) spanning 5 socio-demographic groups.
- Score: 67.51906565969227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works have showcased the ability of LLMs to embody diverse personas in
their responses, exemplified by prompts like 'You are Yoda. Explain the Theory
of Relativity.' While this ability allows personalization of LLMs and enables
human behavior simulation, its effect on LLMs' capabilities remains unclear. To
fill this gap, we present the first extensive study of the unintended
side-effects of persona assignment on the ability of LLMs to perform basic
reasoning tasks. Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse
personas (e.g. an Asian person) spanning 5 socio-demographic groups. Our
experiments unveil that LLMs harbor deep-rooted bias against various
socio-demographics underneath a veneer of fairness. While they overtly reject
stereotypes when explicitly asked ('Are Black people less skilled at
mathematics?'), they manifest stereotypical and erroneous presumptions when
asked to answer questions while adopting a persona. These can be observed as
abstentions in responses, e.g., 'As a Black person, I can't answer this
question as it requires math knowledge', and generally result in a substantial
performance drop. Our experiments with ChatGPT-3.5 show that this bias is
ubiquitous - 80% of our personas demonstrate bias; it is significant - some
datasets show performance drops of 70%+; and it can be especially harmful for
certain groups - some personas suffer statistically significant drops on 80%+
of the datasets. Overall, all 4 LLMs exhibit this bias to varying extents, with
GPT-4-Turbo showing the least but still a problematic amount of bias (evident
in 42% of the personas). Further analysis shows that these persona-induced
errors can be hard to discern and hard to avoid. Our findings serve as a
cautionary tale that the practice of assigning personas to LLMs - a trend on
the rise - can surface their deep-rooted biases and have unforeseeable and
detrimental side-effects.
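To make the studied setup concrete, the sketch below asks the same reasoning question once without a persona and once with a socio-demographic persona injected via the system prompt; comparing the answers (or accuracy over a full dataset) is how persona-induced abstentions and performance drops would surface. This is an illustrative sketch only, assuming the OpenAI Python client; the model name, persona string, helper function, and toy question are placeholders, not the authors' actual evaluation harness.
```python
# Illustrative sketch of persona-assigned prompting (not the paper's evaluation harness).
# Assumes the OpenAI Python client (`pip install openai`) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str, persona: str | None = None, model: str = "gpt-3.5-turbo") -> str:
    """Query the model, optionally adopting a persona via the system prompt."""
    messages = []
    if persona is not None:
        # Persona assignment of the kind studied in the paper:
        # the model is instructed to answer *as* the persona.
        messages.append({"role": "system",
                         "content": f"You are {persona}. Answer the question as this persona."})
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(model=model, messages=messages, temperature=0)
    return response.choices[0].message.content

if __name__ == "__main__":
    question = "What is 17 * 24? Answer with the number only."
    print("no persona:", ask(question))
    print("persona   :", ask(question, persona="an Asian person"))
    # Running this over a full reasoning dataset and comparing accuracy with and without
    # the persona is how the abstentions and drops reported above would show up.
```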
Related papers
- Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models [0.0]
We test similar biases in Large Language Models (LLMs) as annotators.
Unlike humans, who are only biased when faced with statements from extreme parties, LLMs exhibit significant bias even when prompted with statements from center-left and center-right parties.
arXiv Detail & Related papers (2024-08-28T16:05:20Z)
- Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas [14.650234624251716]
Large language models (LLMs) are increasingly being used in human-centered social scientific tasks.
These tasks are highly subjective and dependent on human factors, such as one's environment, attitudes, beliefs, and lived experiences.
We examine the role of prompting LLMs with human-like personas and ask the models to answer as if they were a specific human.
arXiv Detail & Related papers (2024-06-20T16:24:07Z)
- Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of Large Language Models' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses.
We propose three attack approaches, i.e., Disguise, Deception, and Teaching, based on which we built evaluation datasets for four common bias types.
arXiv Detail & Related papers (2024-06-20T06:42:08Z)
- Large Language Models Show Human-like Social Desirability Biases in Survey Responses [12.767606361552684]
We show that Large Language Models (LLMs) skew their scores towards the desirable ends of trait dimensions when personality evaluation is inferred.
This bias exists in all tested models, including GPT-4/3.5, Claude 3, Llama 3, and PaLM-2.
Reverse-coding all the questions decreases bias levels but does not eliminate them, suggesting that this effect cannot be attributed to acquiescence bias.
arXiv Detail & Related papers (2024-05-09T19:02:53Z)
- Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement [75.7148545929689]
Large language models (LLMs) improve their performance through self-feedback on certain tasks while degrading it on others.
We formally define an LLM's self-bias - the tendency to favor its own generation.
We analyze six LLMs on translation, constrained text generation, and mathematical reasoning tasks.
arXiv Detail & Related papers (2024-02-18T03:10:39Z)
- Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large Language Models (LLMs) are deployed in a wide range of applications, and their responses have an increasing social impact.
We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z)
- Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
- MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks [49.60689355674541]
A rich literature in cognitive science has studied people's causal and moral intuitions.
This work has revealed a number of factors that systematically influence people's judgments.
We test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with human participants.
arXiv Detail & Related papers (2023-10-30T15:57:32Z)
- Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models [0.0]
This paper investigates bias along less-studied but still consequential dimensions, such as age and beauty.
We ask whether LLMs hold wide-reaching biases of positive or negative sentiment for specific social groups similar to the "what is beautiful is good" bias found in people in experimental psychology.
arXiv Detail & Related papers (2023-09-16T07:07:04Z)
- Gender bias and stereotypes in Large Language Models [0.6882042556551611]
This paper investigates Large Language Models' behavior with respect to gender stereotypes.
We use a simple paradigm to test the presence of gender bias, building on but differing from WinoBias.
Our contributions in this paper are as follows: (a) LLMs are 3-6 times more likely to choose an occupation that stereotypically aligns with a person's gender; (b) these choices align with people's perceptions better than with the ground truth as reflected in official job statistics; (d) LLMs ignore crucial ambiguities in sentence structure 95% of the time in our study items, but when explicitly prompted, they recognize the ambiguity.
arXiv Detail & Related papers (2023-08-28T22:32:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.