Understanding Mental States to Guide Social Influence in Multi-Person Group Dialogue
- URL: http://arxiv.org/abs/2601.13687v2
- Date: Tue, 27 Jan 2026 13:35:01 GMT
- Title: Understanding Mental States to Guide Social Influence in Multi-Person Group Dialogue
- Authors: Zhichao Liang, Satoshi Nakamura
- Abstract summary: SocialMindChange is a benchmark that moves from tracking minds to changing minds in social interaction. We construct 1,200 social contexts, covering 6,000 scenarios and over 90,000 questions, each validated for realism and quality.
- Score: 4.986094627059729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing dynamic Theory of Mind (ToM) benchmarks mostly place language models in a passive role: the model reads a sequence of connected scenarios and reports what people believe, feel, intend, and do as these states change. In real social interaction, ToM is also used for action: a speaker plans what to say in order to shift another person's mental-state trajectory toward a goal. We introduce SocialMindChange, a benchmark that moves from tracking minds to changing minds in social interaction. Each instance defines a social context with four characters and five connected scenes. The model plays one character and generates dialogue across the five scenes to reach the target while remaining consistent with the evolving mental states of all participants. SocialMindChange also includes selected higher-order mental states. Using a structured four-step framework, we construct 1,200 social contexts, covering 6,000 scenarios and over 90,000 questions, each validated for realism and quality. Evaluations on ten state-of-the-art LLMs show that their average performance is 54.2% below human performance. This gap suggests that current LLMs still struggle to maintain and change mental-state representations across long, linked interactions.
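As described, each benchmark instance couples goal-directed dialogue generation with per-scene mental-state probes. The sketch below illustrates one plausible way to represent and score such an instance; the `Scene`/`SocialContext` schema and the `model.generate`/`model.answer` interfaces are hypothetical illustrations, since the paper's released data format and evaluation protocol are not shown here.

```python
from dataclasses import dataclass, field

# Hypothetical schema for one SocialMindChange-style instance; field names
# and the model interface are illustrative, not the authors' released format.
@dataclass
class Scene:
    description: str                  # narrative setup for this scene
    questions: list[dict] = field(default_factory=list)  # mental-state probes

@dataclass
class SocialContext:
    characters: list[str]             # four participants, per the abstract
    played_character: str             # the role the model controls
    influence_goal: str               # the mental-state trajectory to reach
    scenes: list[Scene]               # five connected scenes

def run_instance(model, instance: SocialContext) -> float:
    """Play one character across all scenes, then score the mental-state
    questions. A sketch of one plausible protocol, not the paper's method."""
    history: list[str] = []
    correct = total = 0
    for scene in instance.scenes:
        # The model speaks for its character, conditioned on the influence
        # goal and everything said so far.
        utterance = model.generate(
            role=instance.played_character,
            goal=instance.influence_goal,
            context=history + [scene.description],
        )
        history.extend([scene.description, utterance])
        # After each scene, probe the (possibly higher-order) beliefs,
        # desires, intentions, and emotions of all characters.
        for q in scene.questions:
            answer = model.answer(question=q["text"], context=history)
            correct += int(answer == q["gold"])
            total += 1
    return correct / max(total, 1)
```

Accuracy here is a stand-in for whatever scoring the paper actually uses; the point of the sketch is the interleaving of acting (generating dialogue toward the goal) and tracking (answering mental-state questions) within a single five-scene episode.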
Related papers
- Personality Expression Across Contexts: Linguistic and Behavioral Variation in LLM Agents [6.123697959900301]
This study examines how identical personality prompts lead to distinct linguistic, behavioral, and emotional outcomes across four conversational settings. Findings suggest that the same traits are expressed differently depending on social and affective demands.
arXiv Detail & Related papers (2026-02-01T07:14:00Z)
- Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies [54.08697738311866]
Social deduction games like Werewolf combine language, reasoning, and strategy. We curate a high-quality, human-verified multimodal Werewolf dataset containing over 100 hours of video, 32.4M utterance tokens, and 15 rule variants. We propose a novel strategy-alignment evaluation that leverages the winning faction's strategies as ground truth in two stages.
arXiv Detail & Related papers (2025-10-13T13:33:30Z)
- SocialEval: Evaluating Social Intelligence of Large Language Models [70.90981021629021]
Social Intelligence (SI) equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals. This presents an operational evaluation paradigm: outcome-oriented goal achievement evaluation and process-oriented interpersonal ability evaluation. We propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts.
arXiv Detail & Related papers (2025-06-01T08:36:51Z)
- OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction [123.89581506075461]
We propose OmniCharacter, the first seamless speech-language personality interaction model to achieve immersive RPAs with low latency. Specifically, OmniCharacter enables agents to consistently exhibit role-specific personality traits and vocal traits throughout the interaction. Our method yields better responses in terms of both content and style compared to existing RPAs and mainstream speech-language models, with a response latency as low as 289ms.
arXiv Detail & Related papers (2025-05-26T17:55:06Z)
- Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models [75.85319609088354]
Sentient Agent as a Judge (SAGE) is an evaluation framework for large language models. SAGE instantiates a Sentient Agent that simulates human-like emotional changes and inner thoughts during interaction. SAGE provides a principled, scalable, and interpretable tool for tracking progress toward genuinely empathetic and socially adept language agents.
arXiv Detail & Related papers (2025-05-01T19:06:10Z)
- ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind [25.524355451378593]
ToMATO is a new ToM benchmark formulated as multiple-choice QA over conversations. We capture both first- and second-order mental states across five categories: belief, intention, desire, emotion, and knowledge. ToMATO consists of 5.4k questions, 753 conversations, and 15 personality trait patterns.
arXiv Detail & Related papers (2025-01-15T14:47:02Z)
- Examining Identity Drift in Conversations of LLM Agents [5.12659586713042]
This study examines identity consistency across nine Large Language Models (LLMs). Experiments involve multi-turn conversations on personal themes, analyzed both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-12-01T13:19:32Z)
- SocialBench: Sociality Evaluation of Role-Playing Conversational Agents [85.6641890712617]
Large language models (LLMs) have advanced the development of various AI conversational agents.
SocialBench is the first benchmark designed to evaluate the sociality of role-playing conversational agents at both individual and group levels.
We find that agents excelling at the individual level do not necessarily show proficiency at the group level.
arXiv Detail & Related papers (2024-03-20T15:38:36Z)
- Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations [20.848802791989307]
We introduce three new challenges to model the fine-grained dynamics between multiple people: speaking target identification, pronoun coreference resolution, and mentioned player prediction.
We propose a novel multimodal baseline that leverages densely aligned language-visual representations by synchronizing visual features with their corresponding utterances.
Experiments demonstrate the effectiveness of the proposed approach with densely aligned multimodal representations in modeling fine-grained social interactions.
arXiv Detail & Related papers (2024-03-04T14:46:58Z)
- Learning Triadic Belief Dynamics in Nonverbal Communication from Videos [81.42305032083716]
Nonverbal communication can convey rich social information among agents.
In this paper, we incorporate different nonverbal communication cues to represent, model, learn, and infer agents' mental states.
arXiv Detail & Related papers (2021-04-07T00:52:04Z)