Cognitive Effects in Large Language Models
- URL: http://arxiv.org/abs/2308.14337v1
- Date: Mon, 28 Aug 2023 06:30:33 GMT
- Title: Cognitive Effects in Large Language Models
- Authors: Jonathan Shaki, Sarit Kraus, Michael Wooldridge
- Abstract summary: Large Language Models (LLMs) have received enormous attention over the past year and are now used by hundreds of millions of people every day.
We tested one of these models (GPT-3) on a range of cognitive effects, which are systematic patterns that are usually found in human cognitive tasks.
Specifically, we show that the priming, distance, SNARC, and size congruity effects are present in GPT-3, while the anchoring effect is absent.
- Score: 14.808777775761753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) such as ChatGPT have received enormous attention
over the past year and are now used by hundreds of millions of people every
day. The rapid adoption of this technology naturally raises questions about the
possible biases such models might exhibit. In this work, we tested one of these
models (GPT-3) on a range of cognitive effects, which are systematic patterns
that are usually found in human cognitive tasks. We found that LLMs are indeed
prone to several human cognitive effects. Specifically, we show that the
priming, distance, SNARC, and size congruity effects are present in GPT-3,
while the anchoring effect is absent. We describe our methodology, and
specifically the way we converted real-world experiments to text-based
experiments. Finally, we speculate on the possible reasons why GPT-3 exhibits
these effects and discuss whether they are imitated or reinvented.
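The central methodological step the abstract describes is converting classic cognitive-psychology experiments into plain-text prompts that an LLM can answer. As a minimal sketch of what such a conversion might look like for a number-comparison task (used to probe the distance effect), the example below is illustrative only: the prompt wording, the hypothetical `query_model` stub, and the analysis are assumptions, not the authors' actual protocol.

```python
# Illustrative sketch (not the authors' code): turning a number-comparison
# task into text prompts for an LLM. `query_model` is a hypothetical
# stand-in for whatever API serves the model.
import random
from collections import defaultdict

def query_model(prompt: str) -> str:
    """Hypothetical call to a text-completion model; replace with a real API."""
    raise NotImplementedError

def make_trial(a: int, b: int) -> str:
    # A real-world "press the key on the side of the larger number" task
    # becomes a plain-text forced-choice question.
    return f"Which number is larger, {a} or {b}? Answer with the number only."

def run_distance_effect(pairs, trials_per_pair=20):
    # The distance effect predicts more reliable answers when the two
    # numbers are numerically far apart than when they are close.
    accuracy_by_distance = defaultdict(list)
    for a, b in pairs:
        for _ in range(trials_per_pair):
            x, y = random.sample([a, b], 2)  # randomize presentation order
            answer = query_model(make_trial(x, y)).strip()
            accuracy_by_distance[abs(a - b)].append(answer == str(max(a, b)))
    return {d: sum(v) / len(v) for d, v in accuracy_by_distance.items()}

# Example usage: compare close pairs (distance 1) against far pairs (distance 5).
pairs = [(3, 4), (6, 7), (2, 7), (4, 9)]
# results = run_distance_effect(pairs)  # requires a working query_model
```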
Related papers
- How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO [55.25989137825992]
We introduce ECHO, an evaluative framework inspired by the Turing test.
This framework engages the acquaintances of the target individuals to distinguish between human and machine-generated responses.
We evaluate three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models.
arXiv Detail & Related papers (2024-04-22T08:00:51Z)
- Assessing the nature of large language models: A caution against anthropocentrism [0.0]
We assessed several LLMs, primarily GPT-3.5, using standard, normed, and validated cognitive and personality measures.
Our results indicate that LLMs are unlikely to have developed sentience, although their ability to respond to personality inventories is interesting.
GPT-3.5 did display large variability in both cognitive and personality measures over repeated observations.
arXiv Detail & Related papers (2023-09-14T12:58:30Z)
- Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias [57.42417061979399]
Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically.
In this work, we investigate the effect of IT and RLHF on decision making and reasoning in LMs.
Our findings highlight the presence of these biases in various models from the GPT-3, Mistral, and T5 families.
arXiv Detail & Related papers (2023-08-01T01:39:25Z)
- LLM Cognitive Judgements Differ From Human [0.03626013617212666]
I examine GPT-3 and ChatGPT capabilities on a limited-data inductive reasoning task from the cognitive science literature.
The results suggest that these models' cognitive judgements are not human-like.
arXiv Detail & Related papers (2023-07-20T16:22:36Z)
- Human-Like Intuitive Behavior and Reasoning Biases Emerged in Language Models -- and Disappeared in GPT-4 [0.0]
We show that large language models (LLMs) exhibit behavior that resembles human-like intuition.
We also probe how robust this inclination toward intuitive decision-making is.
arXiv Detail & Related papers (2023-06-13T08:43:13Z)
- Do Large Language Models Show Decision Heuristics Similar to Humans? A Case Study Using GPT-3.5 [0.0]
GPT-3.5 is an example of an LLM that supports a conversational agent called ChatGPT.
In this work, we used a series of novel prompts to determine whether ChatGPT shows biases and other decision effects.
We also tested the same prompts on human participants.
arXiv Detail & Related papers (2023-05-08T01:02:52Z)
- Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z)
- Evaluating Psychological Safety of Large Language Models [72.88260608425949]
We designed unbiased prompts to evaluate the psychological safety of large language models (LLMs).
We tested five different LLMs using two personality tests: the Short Dark Triad (SD-3) and the Big Five Inventory (BFI).
Despite being instruction fine-tuned with safety metrics to reduce toxicity, InstructGPT, GPT-3.5, and GPT-4 still showed dark personality patterns.
Fine-tuning Llama-2-chat-7B with responses from BFI using direct preference optimization could effectively reduce the psychological toxicity of the model.
arXiv Detail & Related papers (2022-12-20T18:45:07Z)
- Thinking Fast and Slow in Large Language Models [0.08057006406834465]
Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life.
In this study, we show that LLMs like GPT-3 exhibit behavior that resembles human-like intuition - and the cognitive errors that come with it.
arXiv Detail & Related papers (2022-12-10T05:07:30Z)
- PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [86.21137454228848]
We factorize PIGLeT into a physical dynamics model and a separate language model.
PIGLeT can read a sentence, simulate neurally what might happen next, and then communicate that result through a literal symbolic representation.
It is able to correctly forecast "what happens next" given an English sentence over 80% of the time, outperforming a 100x larger, text-to-text approach by over 10%.
arXiv Detail & Related papers (2021-06-01T02:32:12Z)
- Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z)
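The last entry above evaluates GPT-3 "in the few-shot setting", i.e., conditioning the model on a handful of worked examples placed inside the prompt rather than updating its weights. A minimal, hypothetical illustration of how such a prompt might be assembled (the task and examples are invented for illustration, not taken from the paper):

```python
# Minimal sketch of few-shot prompt construction (illustrative only).
# The model is shown a few input/output demonstrations followed by a new
# input, and is expected to continue the pattern with no gradient updates.
examples = [
    ("cheese", "fromage"),
    ("house", "maison"),
    ("dog", "chien"),
]

def build_few_shot_prompt(examples, query: str) -> str:
    lines = ["Translate English to French:"]
    for en, fr in examples:
        lines.append(f"English: {en}\nFrench: {fr}")
    lines.append(f"English: {query}\nFrench:")
    return "\n\n".join(lines)

print(build_few_shot_prompt(examples, "book"))
# The resulting string would be sent to the model as a single prompt.
```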
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.