PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues
- URL: http://arxiv.org/abs/2502.21017v1
- Date: Fri, 28 Feb 2025 13:04:04 GMT
- Title: PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues
- Authors: Fangxu Yu, Lai Jiang, Shenyi Huang, Zhen Wu, Xinyu Dai
- Abstract summary: The ability to understand and predict the mental states of oneself and others, known as the Theory of Mind (ToM), is crucial for effective social interactions. Recent research has emerged to evaluate whether Large Language Models (LLMs) exhibit a form of ToM. We propose PersuasiveToM, a benchmark designed to evaluate the ToM abilities of LLMs in persuasive dialogues.
- Score: 27.231701486961917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to understand and predict the mental states of oneself and others, known as the Theory of Mind (ToM), is crucial for effective social interactions. Recent research has emerged to evaluate whether Large Language Models (LLMs) exhibit a form of ToM. Although recent studies have evaluated ToM in LLMs, existing benchmarks focus predominantly on physical perception with principles guided by the Sally-Anne test in synthetic stories and conversations, failing to capture the complex psychological activities of mental states in real-life social interactions. To mitigate this gap, we propose PersuasiveToM, a benchmark designed to evaluate the ToM abilities of LLMs in persuasive dialogues. Our framework introduces two categories of questions: (1) ToM Reasoning, assessing the capacity of LLMs to track evolving mental states (e.g., desire shifts in persuadees), and (2) ToM Application, evaluating whether LLMs can take advantage of inferred mental states to select effective persuasion strategies (e.g., emphasize rarity) and evaluate the effectiveness of persuasion strategies. Experiments across eight state-of-the-art LLMs reveal that while models excel on multiple questions, they struggle to answer questions that need tracking the dynamics and shifts of mental states and understanding the mental states in the whole dialogue comprehensively. Our aim with PersuasiveToM is to allow an effective evaluation of the ToM reasoning ability of LLMs with more focus on complex psychological activities. Our code is available at https://github.com/Yu-Fangxu/PersuasiveToM.
Related papers
- Theory of Mind in Large Language Models: Assessment and Enhancement [14.41464477095448]
As Large Language Models (LLMs) become increasingly integrated into daily life, it is crucial to assess and enhance their capacity to interpret and respond to human mental states.
arXiv Detail & Related papers (2025-04-26T10:17:48Z)
- Re-evaluating Theory of Mind evaluation in large language models [3.262532929657758]
We take inspiration from cognitive science to re-evaluate the state of ToM evaluation in large language models. A major reason for the disagreement on whether LLMs have ToM is a lack of clarity on whether models should be expected to match human behaviors. We conclude by discussing several directions for future research, including the relationship between ToM and pragmatic communication.
arXiv Detail & Related papers (2025-02-28T14:36:57Z)
- A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks [0.0]
This systematic review synthesizes current efforts to assess large language models' (LLMs) ability to perform ToM tasks. A recurring theme in the literature reveals that while LLMs demonstrate emerging competence in ToM tasks, significant gaps persist in their emulation of human cognitive abilities.
arXiv Detail & Related papers (2025-02-12T21:19:30Z)
- Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models [51.91448005607405]
We evaluate key human ToM precursors by annotating characters' perceptions on ToMi and FANToM.
We present PercepToM, a novel ToM method leveraging LLMs' strong perception inference capability while supplementing their limited perception-to-belief inference.
arXiv Detail & Related papers (2024-07-08T14:58:29Z)
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating psychology dimension in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses [11.121931601655174]
Theory of Mind (ToM) reasoning entails recognizing that other individuals possess their own intentions, emotions, and thoughts.
Large language models (LLMs) excel in tasks such as summarization, question answering, and translation.
Despite advancements, the extent to which LLMs truly understand ToM reasoning remains inadequately explored in open-ended scenarios.
arXiv Detail & Related papers (2024-06-09T05:57:59Z)
- ToMBench: Benchmarking Theory of Mind in Large Language Models [41.565202027904476]
ToM is the cognitive capability to perceive and ascribe mental states to oneself and others. Existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination. We introduce ToMBench with three key characteristics: a systematic evaluation framework encompassing 8 tasks and 31 abilities in social cognition, a multiple-choice question format to support automated and unbiased evaluation, and a build-from-scratch bilingual inventory to strictly avoid data leakage.
arXiv Detail & Related papers (2024-02-23T02:05:46Z)
- Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities [63.90227161974381]
SimToM is a novel prompting framework inspired by Simulation Theory's notion of perspective-taking.
Our approach, which requires no additional training and minimal prompt-tuning, shows substantial improvement over existing methods.
arXiv Detail & Related papers (2023-11-16T22:49:27Z)
- FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions [94.61530480991627]
Theory of mind evaluations currently focus on testing models using passive narratives that inherently lack interactivity.
We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering.
arXiv Detail & Related papers (2023-10-24T00:24:11Z)
- Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models [82.50173296858377]
Many anecdotal examples were used to suggest that newer large language models (LLMs) like ChatGPT and GPT-4 exhibit Neural Theory-of-Mind (N-ToM).
We investigate the extent of LLMs' N-ToM through an extensive evaluation on 6 tasks and find that while LLMs exhibit certain N-ToM abilities, this behavior is far from being robust.
arXiv Detail & Related papers (2023-05-24T06:14:31Z)
- Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs [77.88043871260466]
We show that one of today's largest language models lacks this kind of social intelligence out-of-the-box.
We conclude that person-centric NLP approaches might be more effective towards neural Theory of Mind.
arXiv Detail & Related papers (2022-10-24T14:58:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.