EgoSocialArena: Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective
- URL: http://arxiv.org/abs/2410.06195v3
- Date: Mon, 24 Feb 2025 02:22:39 GMT
- Title: EgoSocialArena: Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective
- Authors: Guiyang Hou, Wenqi Zhang, Yongliang Shen, Zeqi Tan, Sihao Shen, Weiming Lu
- Abstract summary: Social intelligence is built upon three pillars: cognitive intelligence, situational intelligence, and behavioral intelligence. EgoSocialArena aims to systematically evaluate the social intelligence of large language models from a first-person perspective.
- Score: 22.30892836263764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social intelligence is built upon three foundational pillars: cognitive intelligence, situational intelligence, and behavioral intelligence. As large language models (LLMs) become increasingly integrated into our social lives, understanding, evaluating, and developing their social intelligence is becoming increasingly important. While several existing works have investigated the social intelligence of LLMs, (1) most focus on a single aspect, so the social intelligence of LLMs has yet to be systematically organized and studied; (2) most position LLMs as passive observers evaluated from a third-person perspective, as in Theory of Mind (ToM) tests, whereas ego-centric, first-person evaluation aligns better with actual LLM-based agent use scenarios; and (3) behavioral intelligence lacks comprehensive evaluation, particularly evaluation that incorporates critical human-machine interaction scenarios. In light of this, we present EgoSocialArena, a novel framework grounded in the three pillars of social intelligence (cognitive, situational, and behavioral) that aims to systematically evaluate the social intelligence of LLMs from a first-person perspective. With EgoSocialArena, we conduct a comprehensive evaluation of eight prominent foundation models and find that even the most advanced LLMs, such as o1-preview, lag behind human performance.
Related papers
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human assistants.
This paper presents a framework for investigating the psychological dimensions of LLMs, including psychological identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z) - Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory [8.80864059602965]
Large Language Models (LLMs) and advancements in Artificial Intelligence (AI) offer an opportunity for computational social science research at scale.
Our work introduces a simulated agent society where complex social relationships dynamically form and evolve over time.
We analyze whether, as the theory postulates, agents seek to escape a brutish "state of nature" by surrendering rights to an absolute sovereign in exchange for order and security.
arXiv Detail & Related papers (2024-06-20T14:42:58Z) - InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context [27.740204336800687]
Large language models (LLMs) have demonstrated the potential to mimic human social intelligence.
We develop a novel framework, InterIntent, to assess LLMs' social intelligence by mapping their ability to understand and manage intentions in a game setting.
arXiv Detail & Related papers (2024-06-18T02:02:15Z) - Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View [21.341128731357415]
Large language models (LLMs) have been shown to face hallucination issues because the data they were trained on often contains human biases.
We propose CogMir, an open-ended Multi-LLM Agents framework that utilizes hallucination properties to assess and enhance LLM Agents' social intelligence.
arXiv Detail & Related papers (2024-05-23T16:13:33Z) - LLM Theory of Mind and Alignment: Opportunities and Risks [0.0]
There is growing interest in whether large language models (LLMs) have theory of mind (ToM).
This paper identifies key areas in which LLM ToM will show up in human-LLM interactions at individual and group levels.
It lays out a broad spectrum of potential implications and suggests the most pressing areas for future research.
arXiv Detail & Related papers (2024-05-13T19:52:16Z) - SOTOPIA-$\pi$: Interactive Learning of Socially Intelligent Language Agents [73.35393511272791]
We propose SOTOPIA-$\pi$, an interactive learning method that improves the social intelligence of language agents.
This method leverages behavior cloning and self-reinforcement training on social interaction data filtered according to large language model (LLM) ratings.
arXiv Detail & Related papers (2024-03-13T17:17:48Z) - Academically intelligent LLMs are not necessarily socially intelligent [56.452845189961444]
The academic intelligence of large language models (LLMs) has made remarkable progress in recent times, but their social intelligence performance remains unclear.
Inspired by established human social intelligence frameworks, we have developed a standardized social intelligence test based on real-world social scenarios.
arXiv Detail & Related papers (2024-03-11T10:35:53Z) - Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs [24.613282867543244]
Large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena.
Recent work has adopted an omniscient perspective on these simulations, which is fundamentally at odds with the non-omniscient, information-asymmetric interactions between humans and AI agents in the real world.
arXiv Detail & Related papers (2024-03-08T03:49:17Z) - Do LLM Agents Exhibit Social Behavior? [5.094340963261968]
State-Understanding-Value-Action (SUVA) is a framework for systematically analyzing LLM responses in social contexts.
It assesses social behavior through both the models' final decisions and the response generation processes leading to those decisions.
We demonstrate that utterance-based reasoning reliably predicts LLMs' final actions.
arXiv Detail & Related papers (2023-12-23T08:46:53Z) - SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z) - Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models [83.63242931107638]
We propose four characteristics of generally intelligent agents.
We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations.
We conclude by outlining promising future research directions in the field of artificial general intelligence.
arXiv Detail & Related papers (2023-07-07T13:58:16Z) - Influence of External Information on Large Language Models Mirrors Social Cognitive Patterns [51.622612759892775]
Social cognitive theory explains how people learn and acquire knowledge through observing others.
Recent years have witnessed the rapid development of large language models (LLMs).
LLMs, as AI agents, can observe external information, which shapes their cognition and behaviors.
arXiv Detail & Related papers (2023-05-08T16:10:18Z) - Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs [77.88043871260466]
We show that one of today's largest language models lacks this kind of social intelligence out of the box.
We conclude that person-centric NLP approaches might be more effective towards neural Theory of Mind.
arXiv Detail & Related papers (2022-10-24T14:58:58Z) - Social Neuro AI: Social Interaction as the "dark matter" of AI [0.0]
We argue that empirical results from social psychology and social neuroscience, along with the framework of dynamics, can inspire the development of more intelligent artificial agents.
arXiv Detail & Related papers (2021-12-31T13:41:53Z)