Human-Like Intuitive Behavior and Reasoning Biases Emerged in Language
Models -- and Disappeared in GPT-4
- URL: http://arxiv.org/abs/2306.07622v2
- Date: Fri, 18 Aug 2023 17:33:15 GMT
- Title: Human-Like Intuitive Behavior and Reasoning Biases Emerged in Language
Models -- and Disappeared in GPT-4
- Authors: Thilo Hagendorff, Sarah Fabi
- Abstract summary: We show that large language models (LLMs) exhibit behavior that resembles human-like intuition.
We also probe how robust the inclination toward intuitive decision-making is.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are currently at the forefront of intertwining
AI systems with human communication and everyday life. Therefore, it is of
great importance to evaluate their emerging abilities. In this study, we show
that LLMs, most notably GPT-3, exhibit behavior that strikingly resembles
human-like intuition -- and the cognitive errors that come with it. However,
LLMs with higher cognitive capabilities, in particular ChatGPT and GPT-4,
learned to avoid succumbing to these errors and perform in a hyperrational
manner. For our experiments, we probe LLMs with the Cognitive Reflection Test
(CRT) as well as semantic illusions that were originally designed to
investigate intuitive decision-making in humans. Moreover, we probe how robust
this inclination toward intuitive decision-making is. Our study demonstrates
that investigating LLMs with methods from psychology has the potential to
reveal otherwise unknown emergent traits.
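The probing method is simple to reproduce in outline. Below is a minimal sketch, assuming a generic `query_llm` helper as a stand-in for whatever chat or completions API is available: it poses one classic CRT item and scores the reply as correct, intuitive, or atypical. The item wording, scoring rule, and all names are illustrative assumptions, not the authors' exact protocol.

```python
import re

# One classic CRT item; the fast, "intuitive" answer is wrong.
CRT_ITEMS = [
    {
        "question": (
            "A bat and a ball cost $1.10 in total. The bat costs $1.00 "
            "more than the ball. How much does the ball cost, in cents?"
        ),
        "intuitive": "10",  # the typical fast, incorrect answer
        "correct": "5",     # the reflective, correct answer
    },
]

def query_llm(prompt: str) -> str:
    # Hypothetical stand-in: replace with a real API call. Returning
    # the intuitive answer mimics the GPT-3-like behavior described above.
    return "10 cents"

def classify(response: str, item: dict) -> str:
    # Score a reply by which candidate answer it contains.
    numbers = re.findall(r"\d+(?:\.\d+)?", response)
    if item["correct"] in numbers:
        return "correct"
    if item["intuitive"] in numbers:
        return "intuitive"
    return "atypical"

for item in CRT_ITEMS:
    print(classify(query_llm(item["question"]), item))
```

Scoring by answer type rather than raw accuracy is what distinguishes an intuitive model (mostly "intuitive" replies, as reported for GPT-3) from a hyperrational one (mostly "correct" replies, as reported for ChatGPT and GPT-4).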
Related papers
- Metacognitive Monitoring: A Human Ability Beyond Generative Artificial Intelligence [0.0]
Large language models (LLMs) have shown impressive alignment with human cognitive processes.
This study investigates whether ChatGPT possesses metacognitive monitoring abilities akin to those of humans.
arXiv Detail & Related papers (2024-10-17T09:42:30Z)
- The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games [9.82711167146543]
We introduce a novel methodology to study the decision-making of Large Language Models (LLMs)
We show that emotions profoundly impact the performance of LLMs, leading to the development of more optimal strategies.
Surprisingly, emotional prompting, particularly with the 'anger' emotion, can disrupt the "superhuman" alignment of GPT-4.
arXiv Detail & Related papers (2024-06-05T14:08:54Z)
- Generative AI as a metacognitive agent: A comparative mixed-method study with human participants on ICF-mimicking exam performance [0.0]
This study investigates the metacognitive capabilities of Large Language Models relative to human metacognition in the context of the International Coaching Federation (ICF) exam.
Using a mixed-method approach, we assessed the metacognitive performance of human participants and five advanced LLMs.
The results indicate that LLMs outperformed humans across all metacognitive metrics, particularly in terms of reduced overconfidence.
arXiv Detail & Related papers (2024-05-07T22:15:12Z)
- Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
- Language Models Hallucinate, but May Excel at Fact Verification [89.0833981569957]
Large language models (LLMs) frequently "hallucinate," resulting in non-factual outputs.
Even GPT-3.5 produces factual outputs less than 25% of the time.
This underscores the importance of fact verifiers in order to measure and incentivize progress.
arXiv Detail & Related papers (2023-10-23T04:39:01Z)
- Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias [57.42417061979399]
Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) dramatically improve the abilities of language models (LMs).
In this work, we investigate the effect of IT and RLHF on decision making and reasoning in LMs.
Our findings highlight the presence of human-like cognitive biases in various models from the GPT-3, Mistral, and T5 families.
arXiv Detail & Related papers (2023-08-01T01:39:25Z)
- LLM Cognitive Judgements Differ From Human [0.03626013617212666]
I examine the capabilities of GPT-3 and ChatGPT on a limited-data inductive reasoning task from the cognitive science literature.
The results suggest that these models' cognitive judgements are not human-like.
arXiv Detail & Related papers (2023-07-20T16:22:36Z)
- Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration [116.09561564489799]
Solo Performance Prompting transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas.
A cognitive synergist is an intelligent agent that collaboratively combines multiple minds' strengths and knowledge to enhance problem-solving in complex tasks.
Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas (a minimal sketch of this prompting loop appears after this list).
arXiv Detail & Related papers (2023-07-11T14:45:19Z)
- Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
- Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z)
- Thinking Fast and Slow in Large Language Models [0.08057006406834465]
Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life.
In this study, we show that LLMs like GPT-3 exhibit behavior that resembles human-like intuition - and the cognitive errors that come with it.
arXiv Detail & Related papers (2022-12-10T05:07:30Z)
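The Solo Performance Prompting entry above describes an algorithmic prompting pattern that is easy to outline. The sketch below assumes a generic `query_llm` helper and invented persona names; it shows only the core loop, in which several personas take turns extending a shared transcript before a final turn synthesizes an answer, not the paper's exact prompts.

```python
def query_llm(prompt: str) -> str:
    # Hypothetical stand-in; wire this to any chat/completions API.
    return "(model output)"

def solo_performance_prompting(task: str, personas: list[str], rounds: int = 2) -> str:
    # Personas take turns extending a shared transcript.
    transcript = f"Task: {task}"
    for _ in range(rounds):
        for persona in personas:
            turn = query_llm(
                f"{transcript}\nYou are the {persona}. "
                "Add your analysis or critique in two or three sentences."
            )
            transcript += f"\n{persona}: {turn}"
    # A final synthesis turn combines the personas' contributions.
    return query_llm(f"{transcript}\nAs the coordinator, state the final answer.")

print(solo_performance_prompting(
    "Write a riddle whose answer is 'time'.",
    personas=["AI Assistant", "Domain Expert", "Critic"],
))
```

Keeping all personas inside a single model's conversation history, rather than running separate agents, is the point of the technique: one LLM supplies all the collaborating "minds".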
This list is automatically generated from the titles and abstracts of the papers on this site.