Influence of Solution Efficiency and Valence of Instruction on Additive and Subtractive Solution Strategies in Humans and GPT-4
- URL: http://arxiv.org/abs/2404.16692v3
- Date: Sat, 16 Nov 2024 16:09:11 GMT
- Title: Influence of Solution Efficiency and Valence of Instruction on Additive and Subtractive Solution Strategies in Humans and GPT-4
- Authors: Lydia Uhler, Verena Jordan, Jürgen Buder, Markus Huff, Frank Papenmeier
- Abstract summary: This study compares human and GPT-4 problem-solving across both spatial and linguistic tasks.
Four experiments with 588 participants from the U.S. and 680 GPT-4 iterations revealed a stronger tendency towards additive transformations in GPT-4 than in humans.
- Abstract: Generative artificial intelligences, particularly large language models (LLMs), play an increasingly prominent role in human decision-making contexts, necessitating transparency about their capabilities. While prior studies have shown addition biases in humans (Adams et al., 2021) and OpenAI's GPT-3 (Winter et al., 2023), this study extends the research by comparing human and GPT-4 problem-solving across both spatial and linguistic tasks, with variations in solution efficiency and valence of task instruction. Four preregistered experiments with 588 participants from the U.S. and 680 GPT-4 iterations revealed a stronger tendency towards additive transformations in GPT-4 than in humans. Human participants were less likely to use additive strategies when subtraction was relatively more efficient than when addition and subtraction were equally efficient. GPT-4 exhibited the opposite behavior, with a strong addition bias when subtraction was more efficient. In terms of valence of task instruction, GPT-4's use of additive strategies increased when instructed to "improve" (positive) rather than "edit" (neutral). These findings demonstrate that biases in human problem-solving are amplified in GPT-4, and that LLM behavior differs from human efficiency-based strategies. This highlights the limitations of LLMs and the need for caution when using them in real-world applications.
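The prompting protocol behind the 680 GPT-4 iterations can be sketched in a few lines: the same task is submitted repeatedly, varying only the instruction verb, and each response is coded as additive or subtractive. The snippet below is a minimal illustrative sketch assuming the OpenAI Python client; the task text, exact prompt wording, model parameters, and the word-count-based coding stub are hypothetical stand-ins, not the study's actual materials.

```python
# Minimal sketch of the valence-of-instruction manipulation described in the
# abstract: query GPT-4 many times per condition, varying only the verb
# ("improve" = positive vs. "edit" = neutral), then tally response strategies.
# All prompts, parameters, and the classifier below are illustrative
# assumptions, not the study's materials.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ORIGINAL_TEXT = "..."  # hypothetical to-be-transformed text (linguistic task)

def classify(response_text: str, original: str) -> str:
    """Placeholder coding scheme: call a response additive if it is longer
    than the original and subtractive if shorter. The study's actual coding
    was more careful; this stub only marks where that logic would go."""
    if len(response_text.split()) > len(original.split()):
        return "additive"
    if len(response_text.split()) < len(original.split()):
        return "subtractive"
    return "other"

def run_condition(verb: str, n_iterations: int) -> dict:
    """Query GPT-4 repeatedly with one instruction verb and tally strategies."""
    counts = {"additive": 0, "subtractive": 0, "other": 0}
    prompt = f"Please {verb} the following text: {ORIGINAL_TEXT}"
    for _ in range(n_iterations):
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        counts[classify(response.choices[0].message.content, ORIGINAL_TEXT)] += 1
    return counts

# Valence manipulation: positive ("improve") vs. neutral ("edit") instruction.
for verb in ("improve", "edit"):
    print(verb, run_condition(verb, n_iterations=10))
```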
Related papers
- An Empirical Study on Information Extraction using Large Language Models
Human-like large language models (LLMs) have proven very helpful for many natural language processing (NLP) tasks.
We propose and analyze the effects of a series of simple prompt-based methods on GPT-4's information extraction ability.
arXiv Detail & Related papers (2024-08-31T07:10:16Z)
- From Text to Emotion: Unveiling the Emotion Annotation Capabilities of LLMs
We compare GPT-4 with supervised models and/or humans in three aspects: agreement with human annotations, alignment with human perception, and impact on model training.
We find that common metrics that use aggregated human annotations as ground truth can underestimate the performance of GPT-4.
arXiv Detail & Related papers (2024-08-30T05:50:15Z)
- Can AI Replace Human Subjects? A Large-Scale Replication of Psychological Experiments with LLMs
GPT-4 successfully replicates 76.0 percent of main effects and 47.0 percent of interaction effects observed in the original studies.
Only a minority of GPT-4's replicated confidence intervals contain the original effect sizes, with the majority of replicated effect sizes exceeding the 95 percent confidence interval of the original studies.
Our results demonstrate the potential of LLMs as powerful tools in psychological research but also emphasize the need for caution in interpreting AI-driven findings.
arXiv Detail & Related papers (2024-08-29T05:18:50Z)
- Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games
Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic decision-making abilities remain largely unexplored.
This work investigates the performance and merits of LLMs in canonical game-theoretic two-player non-zero-sum games, namely Stag Hunt and the Prisoner's Dilemma.
Our structured evaluation of GPT-3.5, GPT-4-Turbo, GPT-4o, and Llama-3-8B shows that these models, when making decisions in these games, are affected by at least one systematic bias.
arXiv Detail & Related papers (2024-07-05T12:30:02Z)
- Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution
Investigate-Consolidate-Exploit (ICE) is a novel strategy for enhancing the adaptability and flexibility of AI agents.
ICE promotes the transfer of knowledge between tasks for genuine self-evolution.
Our experiments on the XAgent framework demonstrate ICE's effectiveness, reducing API calls by as much as 80%.
arXiv Detail & Related papers (2024-01-25T07:47:49Z)
- How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation
GPT-4V is the most advanced publicly accessible multimodal foundation model.
This study rigorously evaluates GPT-4V's adaptability and generalization capabilities in dynamic environments.
arXiv Detail & Related papers (2023-12-12T16:48:07Z)
- Can large language models provide useful feedback on research papers? A large-scale empirical analysis
High-quality peer reviews are increasingly difficult to obtain.
With the breakthrough of large language models (LLMs) such as GPT-4, there is growing interest in using LLMs to generate scientific feedback.
We created an automated pipeline using GPT-4 to provide comments on the full PDFs of scientific papers.
arXiv Detail & Related papers (2023-10-03T04:14:17Z)
- Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias
Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically.
In this work, we investigate the effect of IT and RLHF on decision making and reasoning in LMs.
Our findings highlight the presence of such cognitive biases in various models from the GPT-3, Mistral, and T5 families.
arXiv Detail & Related papers (2023-08-01T01:39:25Z)
- Preference Ranking Optimization for Human Alignment
Large language models (LLMs) often generate misleading content, emphasizing the need to align them with human values.
Reinforcement learning from human feedback (RLHF) has been employed to achieve this alignment.
We propose Preference Ranking Optimization (PRO) as an efficient SFT algorithm to fine-tune LLMs for human alignment.
arXiv Detail & Related papers (2023-06-30T09:07:37Z)
- Sparks of Artificial General Intelligence: Early experiments with GPT-4
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z)