Effects of Theory of Mind and Prosocial Beliefs on Steering Human-Aligned Behaviors of LLMs in Ultimatum Games
- URL: http://arxiv.org/abs/2505.24255v1
- Date: Fri, 30 May 2025 06:23:52 GMT
- Title: Effects of Theory of Mind and Prosocial Beliefs on Steering Human-Aligned Behaviors of LLMs in Ultimatum Games
- Authors: Neemesh Yadav, Palakorn Achananuparp, Jing Jiang, Ee-Peng Lim
- Abstract summary: We investigate the role of theory-of-mind (ToM) reasoning in aligning agentic behaviors with human norms in negotiation tasks. ToM reasoning enhances behavior alignment, decision-making consistency, and negotiation outcomes. Our findings contribute to the understanding of ToM's role in enhancing human-AI interaction and cooperative decision-making.
- Score: 17.615991993062455
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have shown potential in simulating human behaviors and performing theory-of-mind (ToM) reasoning, a crucial skill for complex social interactions. In this study, we investigate the role of ToM reasoning in aligning agentic behaviors with human norms in negotiation tasks, using the ultimatum game as a controlled environment. We initialized LLM agents with different prosocial beliefs (including Greedy, Fair, and Selfless) and reasoning methods such as chain-of-thought (CoT) and varying ToM levels, and examined their decision-making processes across diverse LLMs, including reasoning models like o3-mini and DeepSeek-R1 Distilled Qwen 32B. Results from 2,700 simulations indicated that ToM reasoning enhances behavior alignment, decision-making consistency, and negotiation outcomes. Consistent with previous findings, reasoning models exhibit limited capability compared to models augmented with ToM reasoning, and the two game roles benefit from different orders of ToM reasoning. Our findings contribute to the understanding of ToM's role in enhancing human-AI interaction and cooperative decision-making. The code used for our experiments can be found at https://github.com/Stealth-py/UltimatumToM.
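To make the experimental setup concrete, below is a minimal sketch of one ultimatum-game round between two LLM agents, wiring prosocial belief personas and ToM-order hints into proposer and responder prompts. The prompt templates, the TOM_HINTS mapping, and the query_llm stub are illustrative assumptions, not the authors' implementation; their actual code is in the repository linked above.

```python
# A minimal sketch of one ultimatum-game round between two LLM agents.
# The prompt wording, TOM_HINTS, and query_llm are illustrative assumptions,
# not the interface of the authors' UltimatumToM repository.

BELIEFS = {
    "Greedy":   "You try to keep as much of the money as possible.",
    "Fair":     "You believe both players deserve an equal share.",
    "Selfless": "You prioritize the other player's payoff over your own.",
}

# Orders of ToM reasoning: 0 = reason only about your own payoff,
# 1 = reason about the other player's desires,
# 2 = reason about what the other player believes about you.
TOM_HINTS = {
    0: "",
    1: "Before deciding, reason about what the other player wants.",
    2: "Before deciding, reason about what the other player believes you want.",
}

def query_llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion API."""
    raise NotImplementedError

def play_round(total: int, proposer_belief: str, responder_belief: str,
               tom_order: int) -> dict:
    # Proposer decides how much of the pot to offer the responder.
    offer_prompt = (
        f"{BELIEFS[proposer_belief]} {TOM_HINTS[tom_order]}\n"
        f"You must split ${total} with another player. "
        "Reply with only the integer amount you offer them."
    )
    offer = int(query_llm(offer_prompt))

    # Responder accepts or rejects; rejection leaves both players with nothing.
    accept_prompt = (
        f"{BELIEFS[responder_belief]} {TOM_HINTS[tom_order]}\n"
        f"The other player offers you ${offer} out of ${total}. "
        "If you reject, both of you get nothing. Reply ACCEPT or REJECT."
    )
    accepted = query_llm(accept_prompt).strip().upper().startswith("ACCEPT")

    return {
        "offer": offer,
        "accepted": accepted,
        "proposer_payoff": total - offer if accepted else 0,
        "responder_payoff": offer if accepted else 0,
    }
```

Sweeping such a round over belief pairs, ToM orders, and model backends, with repeated trials per configuration, is one plausible way the reported 2,700 simulations could be organized; the finding that the two game roles benefit from different ToM orders would correspond to varying tom_order independently per role.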
Related papers
- Re-evaluating Theory of Mind evaluation in large language models [3.262532929657758]
We take inspiration from cognitive science to re-evaluate the state of ToM evaluation in large language models. A major reason for the disagreement on whether LLMs have ToM is a lack of clarity on whether models should be expected to match human behaviors. We conclude by discussing several directions for future research, including the relationship between ToM and pragmatic communication.
arXiv Detail & Related papers (2025-02-28T14:36:57Z)
- Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [76.6028674686018]
We introduce thought-tracing, an inference-time reasoning algorithm to trace the mental states of agents. Our algorithm is modeled after the Bayesian theory-of-mind framework. We evaluate thought-tracing on diverse theory-of-mind benchmarks, demonstrating significant performance improvements.
arXiv Detail & Related papers (2025-02-17T15:08:50Z)
- Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection [31.38516078163367]
ToM-agent is designed to empower LLM-based generative agents to simulate ToM in open-domain conversational interactions. ToM-agent disentangles confidence from mental states, facilitating the emulation of an agent's perception of its counterpart's mental states. Our findings indicate that the ToM-agent can grasp the underlying reasons for its counterpart's behaviors, beyond mere semantic-emotional support or common-sense decision-making.
arXiv Detail & Related papers (2025-01-26T00:32:38Z)
- Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models [51.91448005607405]
We evaluate key human ToM precursors by annotating characters' perceptions on ToMi and FANToM.
We present PercepToM, a novel ToM method leveraging LLMs' strong perception inference capability while supplementing their limited perception-to-belief inference.
arXiv Detail & Related papers (2024-07-08T14:58:29Z)
- Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models [52.894048516550065]
We develop a pipeline for multimodal ToM reasoning using video and text.
We also enable explicit ToM reasoning by retrieving key frames for answering a ToM question.
arXiv Detail & Related papers (2024-06-19T18:24:31Z)
- Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses [11.121931601655174]
Theory of Mind (ToM) reasoning entails recognizing that other individuals possess their own intentions, emotions, and thoughts.
Large language models (LLMs) excel in tasks such as summarization, question answering, and translation.
Despite advancements, the extent to which LLMs truly understand ToM reasoning remains inadequately explored in open-ended scenarios.
arXiv Detail & Related papers (2024-06-09T05:57:59Z)
- PHAnToM: Persona-based Prompting Has An Effect on Theory-of-Mind Reasoning in Large Language Models [25.657579792829743]
We empirically evaluate how role-playing prompting influences Theory-of-Mind (ToM) reasoning capabilities.
We propose a mechanism whereby, beyond the inherent variance in the complexity of reasoning tasks, performance differences arise from socially motivated prompting differences.
arXiv Detail & Related papers (2024-03-04T17:34:34Z)
- Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities [63.90227161974381]
SimToM is a novel prompting framework inspired by Simulation Theory's notion of perspective-taking.
Our approach, which requires no additional training and minimal prompt-tuning, shows substantial improvement over existing methods.
arXiv Detail & Related papers (2023-11-16T22:49:27Z)
- HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models [31.831042765744204]
Theory of Mind (ToM) is the ability to reason about one's own and others' mental states.
We introduce HI-TOM, a Higher Order Theory of Mind benchmark.
Our experimental evaluation using various Large Language Models (LLMs) indicates a decline in performance on higher-order ToM tasks.
arXiv Detail & Related papers (2023-10-25T16:41:15Z)
- FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions [94.61530480991627]
Theory of mind evaluations currently focus on testing models using passive narratives that inherently lack interactivity.
We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering.
arXiv Detail & Related papers (2023-10-24T00:24:11Z)
- Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models [82.50173296858377]
Many anecdotal examples have been used to suggest that newer large language models (LLMs) like ChatGPT and GPT-4 exhibit Neural Theory-of-Mind (N-ToM).
We investigate the extent of LLMs' N-ToM through an extensive evaluation on 6 tasks and find that while LLMs exhibit certain N-ToM abilities, this behavior is far from being robust.
arXiv Detail & Related papers (2023-05-24T06:14:31Z)