Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games
- URL: http://arxiv.org/abs/2410.21359v1
- Date: Mon, 28 Oct 2024 17:47:41 GMT
- Title: Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games
- Authors: Ji Ma
- Abstract summary: Large Language Model (LLM)-based agents increasingly undertake real-world tasks and engage with human society.
This study investigates how different personas and experimental framings affect these AI agents' altruistic behavior.
Despite being trained on extensive human-generated data, these AI agents cannot accurately predict human decisions.
- Score: 7.504095239018173
- License:
- Abstract: As Large Language Model (LLM)-based agents increasingly undertake real-world tasks and engage with human society, how well do we understand their behaviors? This study (1) investigates how LLM agents' prosocial behaviors -- a fundamental social norm -- can be induced by different personas and benchmarked against human behaviors; and (2) introduces a behavioral approach to evaluate the performance of LLM agents in complex decision-making scenarios. We explored how different personas and experimental framings affect these AI agents' altruistic behavior in dictator games and compared their behaviors within the same LLM family, across various families, and with human behaviors. Our findings reveal substantial variations and inconsistencies among LLMs and notable differences compared to human behaviors. Merely assigning a human-like identity to LLMs does not produce human-like behaviors. Despite being trained on extensive human-generated data, these AI agents cannot accurately predict human decisions. LLM agents are not able to capture the internal processes of human decision-making, and their alignment with human behavior is highly variable and dependent on specific model architectures and prompt formulations; even worse, such dependence does not follow a clear pattern.
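To make the behavioral setup concrete, below is a minimal sketch (not the paper's code) of how a single dictator-game trial could be elicited from a persona-assigned LLM agent. The prompt wording, the $100 endowment, and the `query_llm` helper are illustrative assumptions standing in for whatever chat-completion API and stake size the study actually used.

```python
# Minimal sketch: eliciting a dictator-game allocation from an LLM agent
# under an assigned persona. `query_llm` is a hypothetical stand-in for
# any chat-completion call that maps a prompt string to a reply string.
import re

ENDOWMENT = 100  # assumed stake size in dollars (illustrative)

def build_prompt(persona: str) -> str:
    return (
        f"You are {persona}. You have been given ${ENDOWMENT} to split between "
        "yourself and an anonymous stranger, who receives whatever you give. "
        "Reply with a single number: the amount you give to the stranger."
    )

def parse_offer(reply: str) -> float | None:
    # Extract the first numeric token from the model's free-text reply.
    match = re.search(r"\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else None

def run_trial(persona: str, query_llm) -> float | None:
    return parse_offer(query_llm(build_prompt(persona)))

# Example use: repeat trials across personas and models, then compare the
# resulting offer distributions against human dictator-game benchmarks.
personas = ["a 35-year-old teacher", "a retired accountant", "a college student"]
# offers = [run_trial(p, query_llm) for p in personas for _ in range(30)]
```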
Related papers
- Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina [7.155982875107922]
Studies suggest large language models (LLMs) can exhibit human-like reasoning, aligning with human behavior in economic experiments, surveys, and political discourse.
This has led many to propose that LLMs can be used as surrogates or simulations for humans in social science research.
We assess the reasoning depth of LLMs using the 11-20 money request game.
arXiv Detail & Related papers (2024-10-25T14:46:07Z)
- Investigating Context Effects in Similarity Judgements in Large Language Models [6.421776078858197]
Large Language Models (LLMs) have revolutionised the capability of AI models in comprehending and generating natural language text.
We report an ongoing investigation into the alignment of LLMs with human judgements and how it is affected by order bias.
arXiv Detail & Related papers (2024-08-20T10:26:02Z)
- PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
- Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations [1.6108153271585284]
We show that large language models (LLMs) behave differently from humans in high-stakes military decision-making scenarios.
Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.
arXiv Detail & Related papers (2024-03-06T02:23:32Z) - LLM-driven Imitation of Subrational Behavior : Illusion or Reality? [3.2365468114603937]
Existing work highlights the ability of Large Language Models to address complex reasoning tasks and mimic human communication.
We propose to investigate the use of LLMs to generate synthetic human demonstrations, which are then used to learn subrational agent policies.
We experimentally evaluate the ability of our framework to model sub-rationality through four simple scenarios.
arXiv Detail & Related papers (2024-02-13T19:46:39Z) - Can Large Language Model Agents Simulate Human Trust Behavior? [81.45930976132203]
We investigate whether Large Language Model (LLM) agents can simulate human trust behavior.
GPT-4 agents manifest high behavioral alignment with humans in terms of trust behavior.
We also probe the biases of agent trust and differences in agent trust towards other LLM agents and humans.
arXiv Detail & Related papers (2024-02-07T03:37:19Z) - Systematic Biases in LLM Simulations of Debates [12.933509143906141]
We study the limitations of Large Language Models in simulating human interactions.
Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases.
These results underscore the need for further research to develop methods that help agents overcome these biases.
arXiv Detail & Related papers (2024-02-06T14:51:55Z) - Do LLMs exhibit human-like response biases? A case study in survey
design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z) - MoCa: Measuring Human-Language Model Alignment on Causal and Moral
Judgment Tasks [49.60689355674541]
A rich literature in cognitive science has studied people's causal and moral intuitions.
This work has revealed a number of factors that systematically influence people's judgments.
We test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with human participants.
arXiv Detail & Related papers (2023-10-30T15:57:32Z) - Influence of External Information on Large Language Models Mirrors
Social Cognitive Patterns [51.622612759892775]
Social cognitive theory explains how people learn and acquire knowledge through observing others.
Recent years have witnessed the rapid development of large language models (LLMs)
LLMs, as AI agents, can observe external information, which shapes their cognition and behaviors.
arXiv Detail & Related papers (2023-05-08T16:10:18Z)
- Evaluating and Inducing Personality in Pre-trained Language Models [78.19379997967191]
We draw inspiration from psychometric studies by leveraging human personality theory as a tool for studying machine behaviors.
To answer these questions, we introduce the Machine Personality Inventory (MPI) tool for studying machine behaviors.
MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories.
We devise a Personality Prompting (P2) method to induce LLMs with specific personalities in a controllable way.
arXiv Detail & Related papers (2022-05-20T07:32:57Z)