Related papers: PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues

PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues

URL: http://arxiv.org/abs/2502.21017v1
Date: Fri, 28 Feb 2025 13:04:04 GMT
Title: PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues
Authors: Fangxu Yu, Lai Jiang, Shenyi Huang, Zhen Wu, Xinyu Dai,
Abstract summary: The ability to understand and predict the mental states of oneself and others, known as the Theory of Mind (ToM), is crucial for effective social interactions.<n>Recent research has emerged to evaluate whether Large Language Models (LLMs) exhibit a form of ToM.<n>We propose PersuasiveToM, a benchmark designed to evaluate the ToM abilities of LLMs in persuasive dialogues.
Score: 27.231701486961917
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The ability to understand and predict the mental states of oneself and others, known as the Theory of Mind (ToM), is crucial for effective social interactions. Recent research has emerged to evaluate whether Large Language Models (LLMs) exhibit a form of ToM. Although recent studies have evaluated ToM in LLMs, existing benchmarks focus predominantly on physical perception with principles guided by the Sally-Anne test in synthetic stories and conversations, failing to capture the complex psychological activities of mental states in real-life social interactions. To mitigate this gap, we propose PersuasiveToM, a benchmark designed to evaluate the ToM abilities of LLMs in persuasive dialogues. Our framework introduces two categories of questions: (1) ToM Reasoning, assessing the capacity of LLMs to track evolving mental states (e.g., desire shifts in persuadees), and (2) ToM Application, evaluating whether LLMs can take advantage of inferred mental states to select effective persuasion strategies (e.g., emphasize rarity) and evaluate the effectiveness of persuasion strategies. Experiments across eight state-of-the-art LLMs reveal that while models excel on multiple questions, they struggle to answer questions that need tracking the dynamics and shifts of mental states and understanding the mental states in the whole dialogue comprehensively. Our aim with PersuasiveToM is to allow an effective evaluation of the ToM reasoning ability of LLMs with more focus on complex psychological activities. Our code is available at https://github.com/Yu-Fangxu/PersuasiveToM.

Related papers

XToM: Exploring the Multilingual Theory of Mind for Large Language Models [57.9821865189077]
Existing evaluations of Theory of Mind in LLMs are largely limited to English.<n>We present XToM, a rigorously validated multilingual benchmark that evaluates ToM across five languages.<n>Our findings expose limitations in LLMs' ability to replicate human-like mentalizing across linguistic contexts.
arXiv Detail & Related papers (2025-06-03T05:23:25Z)
Theory of Mind in Large Language Models: Assessment and Enhancement [14.41464477095448]
Large Language Models (LLMs) become increasingly integrated into daily life. It is crucial to assess and enhance their capacity to interpret and respond to human mental states.
arXiv Detail & Related papers (2025-04-26T10:17:48Z)
Re-evaluating Theory of Mind evaluation in large language models [3.262532929657758]
We take inspiration from cognitive science to re-evaluate the state of ToM evaluation in large language models.<n>A major reason for the disagreement on whether LLMs have ToM is a lack of clarity on whether models should be expected to match human behaviors.<n>We conclude by discussing several directions for future research, including the relationship between ToM and pragmatic communication.
arXiv Detail & Related papers (2025-02-28T14:36:57Z)
A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks [0.0]
This systematic review synthesizes current efforts to assess large language models' (LLMs) ability to perform ToM tasks.<n>A recurring theme in the literature reveals that while LLMs demonstrate emerging competence in ToM tasks, significant gaps persist in their emulation of human cognitive abilities.
arXiv Detail & Related papers (2025-02-12T21:19:30Z)
Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning [13.545981051703682]
Theory of Mind (ToM) capabilities in LLMs have recently become a central object of investigation.<n>We identify several lines of work in different communities in AI, including LLM benchmarking, ToM add-ons, ToM probing, and formal models for ToM.<n>We conclude with suggestions for improved evaluation of ToM capabilities inspired by dynamic environments used in cognitive tasks.
arXiv Detail & Related papers (2024-12-18T09:06:48Z)
Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models [51.91448005607405]
We evaluate key human ToM precursors by annotating characters' perceptions on ToMi and FANToM. We present PercepToM, a novel ToM method leveraging LLMs' strong perception inference capability while supplementing their limited perception-to-belief inference.
arXiv Detail & Related papers (2024-07-08T14:58:29Z)
Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. This paper presents a framework for investigating psychology dimension in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation. We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses [11.121931601655174]
Theory of Mind (ToM) reasoning entails recognizing that other individuals possess their own intentions, emotions, and thoughts. Large language models (LLMs) excel in tasks such as summarization, question answering, and translation. Despite advancements, the extent to which LLMs truly understand ToM reasoning remains inadequately explored in open-ended scenarios.
arXiv Detail & Related papers (2024-06-09T05:57:59Z)
NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding [55.38254464415964]
Theory of mind evaluations currently focuses on testing models using machine-generated data or game settings prone to shortcuts and spurious correlations. We introduce NegotiationToM, a new benchmark designed to stress-test machine ToM in real-world negotiation surrounding covered multi-dimensional mental states.
arXiv Detail & Related papers (2024-04-21T11:51:13Z)
ToMBench: Benchmarking Theory of Mind in Large Language Models [41.565202027904476]
ToM is the cognitive capability to perceive and ascribe mental states to oneself and others.<n>Existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination.<n>We introduce ToMBench with three key characteristics: a systematic evaluation framework encompassing 8 tasks and 31 abilities in social cognition, a multiple-choice question format to support automated and unbiased evaluation, and a build-from-scratch bilingual inventory to strictly avoid data leakage.
arXiv Detail & Related papers (2024-02-23T02:05:46Z)
Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities [63.90227161974381]
SimToM is a novel prompting framework inspired by Simulation Theory's notion of perspective-taking. Our approach, which requires no additional training and minimal prompt-tuning, shows substantial improvement over existing methods.
arXiv Detail & Related papers (2023-11-16T22:49:27Z)
FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions [94.61530480991627]
Theory of mind evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering.
arXiv Detail & Related papers (2023-10-24T00:24:11Z)
ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind [3.9599054392856483]
We present ToMChallenges, a dataset for comprehensively evaluating the Theory of Mind based on the Sally-Anne and Smarties tests with a diverse set of tasks. Our evaluation results and error analyses show that LLMs have inconsistent behaviors across prompts and tasks.
arXiv Detail & Related papers (2023-05-24T11:54:07Z)
Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models [82.50173296858377]
Many anecdotal examples were used to suggest newer large language models (LLMs) like ChatGPT and GPT-4 exhibit Neural Theory-of-Mind (N-ToM) We investigate the extent of LLMs' N-ToM through an extensive evaluation on 6 tasks and find that while LLMs exhibit certain N-ToM abilities, this behavior is far from being robust.
arXiv Detail & Related papers (2023-05-24T06:14:31Z)
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs [77.88043871260466]
We show that one of today's largest language models lacks this kind of social intelligence out-of-the box. We conclude that person-centric NLP approaches might be more effective towards neural Theory of Mind.
arXiv Detail & Related papers (2022-10-24T14:58:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.