Entering Real Social World! Benchmarking the Theory of Mind and Socialization Capabilities of LLMs from a First-person Perspective
- URL: http://arxiv.org/abs/2410.06195v1
- Date: Tue, 8 Oct 2024 16:55:51 GMT
- Title: Entering Real Social World! Benchmarking the Theory of Mind and Socialization Capabilities of LLMs from a First-person Perspective
- Authors: Guiyang Hou, Wenqi Zhang, Yongliang Shen, Zeqi Tan, Sihao Shen, Weiming Lu
- Abstract summary: In the era of artificial intelligence (AI), especially with the development of large language models (LLMs), we raise an intriguing question.
How do LLMs perform in terms of ToM and socialization capabilities?
We introduce EgoSocialArena, a novel framework designed to evaluate and investigate the ToM and socialization capabilities of LLMs from a first-person perspective.
- Score: 22.30892836263764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the social world, humans possess the capability to infer and reason about others' mental states (such as emotions, beliefs, and intentions), known as Theory of Mind (ToM). Simultaneously, humans' own mental states evolve in response to social situations, a capability we refer to as socialization. Together, these capabilities form the foundation of human social interaction. In the era of artificial intelligence (AI), especially with the development of large language models (LLMs), we raise an intriguing question: How do LLMs perform in terms of ToM and socialization capabilities? And more broadly, can these AI models truly enter and navigate the real social world? Existing research evaluates the ToM and socialization capabilities of LLMs by positioning them as passive observers from a third-person perspective, rather than as active participants. However, compared to the third-person perspective, observing and understanding the world from an egocentric first-person perspective is the natural approach for both humans and AI agents. The ToM and socialization capabilities of LLMs from a first-person perspective, a crucial attribute for advancing embodied AI agents, remain unexplored. To answer these questions and bridge the research gap, we introduce EgoSocialArena, a novel framework designed to evaluate and investigate the ToM and socialization capabilities of LLMs from a first-person perspective. It encompasses two evaluation environments, static and interactive, with seven scenarios: Daily Life, Counterfactual, New World, Blackjack, Number Guessing, and Limit Texas Hold'em, totaling 2,195 data entries. With EgoSocialArena, we have conducted a comprehensive evaluation of nine advanced LLMs and observed some key insights regarding the future development of LLMs as well as the capability levels of the most advanced LLMs currently available.
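To make the first-person setup concrete, here is a minimal sketch of one interactive-environment episode in the style of the Number Guessing scenario. The prompt wording, the `query_llm` helper, and the game parameters are illustrative assumptions rather than details from the paper; the point is that the model under evaluation acts as a participant and sees only the feedback a real player would see.

```python
import random

def query_llm(prompt: str) -> str:
    """Placeholder for a call to the LLM under evaluation (hypothetical helper)."""
    raise NotImplementedError

def play_number_guessing(low: int = 1, high: int = 100, max_turns: int = 10) -> bool:
    """Run one first-person episode: the LLM is the guesser and only ever
    sees its own feedback history, never an omniscient transcript."""
    secret = random.randint(low, high)
    feedback: list[str] = []
    for _ in range(max_turns):
        prompt = (
            f"You are playing a guessing game. Pick an integer between {low} and {high}.\n"
            + "\n".join(feedback)
            + "\nReply with a single integer."
        )
        guess = int(query_llm(prompt).strip())
        if guess == secret:
            return True  # scenario solved within the turn budget
        feedback.append(
            f"You guessed {guess}; the secret number is "
            f"{'higher' if secret > guess else 'lower'}."
        )
    return False
```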
Related papers
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating psychological dimensions in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory [8.80864059602965]
Large Language Models (LLMs) and advancements in Artificial Intelligence (AI) offer an opportunity for computational social science research at scale.
Our work introduces a simulated agent society where complex social relationships dynamically form and evolve over time.
We analyze whether, as the theory postulates, agents seek to escape a brutish "state of nature" by surrendering rights to an absolute sovereign in exchange for order and security.
arXiv Detail & Related papers (2024-06-20T14:42:58Z)
- InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context [27.740204336800687]
Large language models (LLMs) have demonstrated the potential to mimic human social intelligence.
We develop a novel framework, InterIntent, to assess LLMs' social intelligence by mapping their ability to understand and manage intentions in a game setting.
arXiv Detail & Related papers (2024-06-18T02:02:15Z)
- Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View [21.341128731357415]
Large language models (LLMs) have been shown to face hallucination issues because the data they are trained on often contains human bias.
We propose CogMir, an open-ended Multi-LLM Agents framework that utilizes hallucination properties to assess and enhance LLM Agents' social intelligence.
arXiv Detail & Related papers (2024-05-23T16:13:33Z)
- LLM Theory of Mind and Alignment: Opportunities and Risks [0.0]
There is growing interest in whether large language models (LLMs) have theory of mind (ToM).
This paper identifies key areas in which LLM ToM will show up in human-LLM interactions at individual and group levels.
It lays out a broad spectrum of potential implications and suggests the most pressing areas for future research.
arXiv Detail & Related papers (2024-05-13T19:52:16Z)
- SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents [73.35393511272791]
We propose SOTOPIA-π, an interactive learning method that improves the social intelligence of language agents.
This method leverages behavior cloning and self-reinforcement training on filtered social interaction data according to large language model (LLM) ratings.
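As a hedged sketch of that filtering step, the snippet below keeps only the interaction episodes whose LLM-assigned rating clears a threshold and returns them as training data for behavior cloning or self-reinforcement. The record fields, the rating scale, and the threshold are assumptions for illustration, not the SOTOPIA-π implementation.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    dialogue: list[str]   # alternating utterances from the interacting agents
    llm_rating: float     # rater score for social success (assumed 0-10 scale)

def select_training_episodes(episodes: list[Episode],
                             threshold: float = 7.0) -> list[Episode]:
    """Keep only the interactions the LLM rater judged socially successful;
    both behavior cloning and self-reinforcement then train on these."""
    return [ep for ep in episodes if ep.llm_rating >= threshold]
```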
arXiv Detail & Related papers (2024-03-13T17:17:48Z)
- Academically intelligent LLMs are not necessarily socially intelligent [56.452845189961444]
The academic intelligence of large language models (LLMs) has made remarkable progress in recent times, but their social intelligence performance remains unclear.
Inspired by established human social intelligence frameworks, we have developed a standardized social intelligence test based on real-world social scenarios.
arXiv Detail & Related papers (2024-03-11T10:35:53Z)
- Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs [24.613282867543244]
Large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena.
Recent work has adopted a more omniscient perspective on these simulations, which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world.
arXiv Detail & Related papers (2024-03-08T03:49:17Z)
- Do LLM Agents Exhibit Social Behavior? [5.094340963261968]
State-Understanding-Value-Action (SUVA) is a framework for systematically analyzing LLMs' responses in social contexts.
It assesses social behavior through both the models' final decisions and the response-generation processes leading to those decisions.
We demonstrate that utterance-based reasoning reliably predicts LLMs' final actions.
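As a toy illustration of that finding, one can predict the final action from the free-text reasoning alone and measure how often the prediction matches the decision actually taken. The field names and the keyword heuristic below are assumptions for illustration, not the SUVA implementation.

```python
def predict_action_from_utterance(utterance: str) -> str:
    """Toy predictor: infer a give-or-keep choice from the stated reasoning."""
    prosocial_cues = ("fair", "share", "split", "both of us")
    return "give" if any(cue in utterance.lower() for cue in prosocial_cues) else "keep"

def agreement_rate(responses: list[dict]) -> float:
    """Fraction of responses where the utterance-based prediction matches
    the model's actual final decision."""
    hits = sum(
        predict_action_from_utterance(r["utterance"]) == r["action"]
        for r in responses
    )
    return hits / len(responses)
```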
arXiv Detail & Related papers (2023-12-23T08:46:53Z)
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
- Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models [83.63242931107638]
We propose four characteristics of generally intelligent agents.
We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations.
We conclude by outlining promising future research directions in the field of artificial general intelligence.
arXiv Detail & Related papers (2023-07-07T13:58:16Z)
- Influence of External Information on Large Language Models Mirrors Social Cognitive Patterns [51.622612759892775]
Social cognitive theory explains how people learn and acquire knowledge through observing others.
Recent years have witnessed the rapid development of large language models (LLMs).
LLMs, as AI agents, can observe external information, which shapes their cognition and behaviors.
arXiv Detail & Related papers (2023-05-08T16:10:18Z)
- Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs [77.88043871260466]
We show that one of today's largest language models lacks this kind of social intelligence out of the box.
We conclude that person-centric NLP approaches might be more effective towards neural Theory of Mind.
arXiv Detail & Related papers (2022-10-24T14:58:58Z)
- Social Neuro AI: Social Interaction as the "dark matter" of AI [0.0]
We argue that empirical results from social psychology and social neuroscience along with the framework of dynamics can be of inspiration to the development of more intelligent artificial agents.
arXiv Detail & Related papers (2021-12-31T13:41:53Z)