Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMs
- URL: http://arxiv.org/abs/2404.01461v4
- Date: Tue, 23 Jul 2024 02:41:57 GMT
- Title: Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMs
- Authors: Pengda Wang, Zilin Xiao, Hanjie Chen, Frederick L. Oswald
- Abstract summary: Large language models (LLMs) have demonstrated remarkable proficiency in modeling text and generating human-like text.
LLMs may be susceptible to a common cognitive trap in human decision-making called the representativeness heuristic.
This research investigates the impact of the representativeness heuristic on LLM reasoning.
- Score: 7.100094213474042
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although large language models (LLMs) have demonstrated remarkable proficiency in modeling text and generating human-like text, they may exhibit biases acquired from training data in doing so. Specifically, LLMs may be susceptible to a common cognitive trap in human decision-making called the representativeness heuristic. This is a concept in psychology that refers to judging the likelihood of an event based on how closely it resembles a well-known prototype or typical example, versus considering broader facts or statistical evidence. This research investigates the impact of the representativeness heuristic on LLM reasoning. We created ReHeAT (Representativeness Heuristic AI Testing), a dataset containing a series of problems spanning six common types of representativeness heuristics. Experiments reveal that four LLMs applied to ReHeAT all exhibited representativeness heuristic biases. We further identify that the model's reasoning steps are often incorrectly based on a stereotype rather than on the problem's description. Interestingly, the performance improves when adding a hint in the prompt to remind the model to use its knowledge. This suggests the uniqueness of the representativeness heuristic compared to traditional biases. It can occur even when LLMs possess the correct knowledge while falling into a cognitive trap. This highlights the importance of future research focusing on the representativeness heuristic in model reasoning and decision-making and on developing solutions to address it.
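The "Linda" of the title refers to the classic conjunction-fallacy problem from Tversky and Kahneman, in which a vivid description leads people to rate "bank teller and feminist" as more likely than "bank teller" alone. The sketch below is only an illustration of the probability rule that such a judgment violates; it is not taken from the paper or from ReHeAT, and the function names and all numbers are hypothetical.

```python
from fractions import Fraction


def conjunction_probability(p_a: Fraction, p_b_given_a: Fraction) -> Fraction:
    """Return P(A and B) = P(A) * P(B | A)."""
    if not (0 <= p_a <= 1 and 0 <= p_b_given_a <= 1):
        raise ValueError("probabilities must lie in [0, 1]")
    return p_a * p_b_given_a


def violates_conjunction_rule(judged_p_a: Fraction, judged_p_a_and_b: Fraction) -> bool:
    """True if a judged P(A and B) exceeds the judged P(A), which is impossible."""
    return judged_p_a_and_b > judged_p_a


# Hypothetical numbers, chosen only for illustration.
p_teller = Fraction(1, 20)                 # assumed P(bank teller)
p_feminist_given_teller = Fraction(9, 10)  # assumed P(feminist | bank teller)

p_both = conjunction_probability(p_teller, p_feminist_given_teller)

# Even with a very "representative" description, the conjunction
# can never be more probable than either of its parts.
print(p_both <= p_teller)
```

A response that ranks the conjunction above the single event, for example a judged P(teller and feminist) of 1/5 against a judged P(teller) of 1/20, would be flagged by `violates_conjunction_rule`. Exact fractions are used so the comparison is not affected by floating-point rounding.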
Related papers
- A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners [58.15511660018742]
This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities.
We develop carefully controlled synthetic datasets featuring conjunction fallacy and syllogistic problems.
arXiv Detail & Related papers (2024-06-16T19:22:53Z)
- Evaluating Consistency and Reasoning Capabilities of Large Language Models [0.0]
Large Language Models (LLMs) are extensively used today across various sectors, including academia, research, business, and finance.
Despite their widespread adoption, these models often produce incorrect and misleading information, exhibiting a tendency to hallucinate.
This paper aims to evaluate and compare the consistency and reasoning capabilities of both public and proprietary LLMs.
arXiv Detail & Related papers (2024-04-25T10:03:14Z)
- Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall [31.45796499298925]
Large language models (LLMs) have shown remarkable performance on a variety of NLP tasks.
We focus on assessing LLMs' ability to recall factual knowledge learned from pretraining.
We benchmark 31 models from 10 model families and provide a holistic assessment of their strengths and weaknesses.
arXiv Detail & Related papers (2024-04-24T19:40:01Z)
- Cognitive Bias in High-Stakes Decision-Making with LLMs [19.87475562475802]
We develop a framework designed to uncover, evaluate, and mitigate cognitive bias in large language models (LLMs).
Inspired by prior research in psychology and cognitive science, we develop a dataset containing 16,800 prompts to evaluate different cognitive biases.
We test various bias mitigation strategies and propose a novel method that uses LLMs to debias their own prompts.
arXiv Detail & Related papers (2024-02-25T02:35:56Z)
- Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes [73.12947922129261]
We leverage the zero-shot capabilities of large language models to reduce stereotyping.
We show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups.
We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
arXiv Detail & Related papers (2024-02-03T01:40:11Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
- Revisiting the Reliability of Psychological Scales on Large Language Models [66.31055885857062]
This study aims to determine the reliability of applying personality assessments to Large Language Models (LLMs).
By shedding light on the personalization of LLMs, our study endeavors to pave the way for future explorations in this field.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
- Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models [4.412336603162406]
Large Language Models (LLMs) do not differentially represent numbers, which are pervasive in text.
In this work, we investigate how well popular LLMs capture the magnitudes of numbers from a behavioral lens.
arXiv Detail & Related papers (2023-05-18T07:50:44Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.