Related papers: SocialEval: Evaluating Social Intelligence of Large Language Models

SocialEval: Evaluating Social Intelligence of Large Language Models

URL: http://arxiv.org/abs/2506.00900v1
Date: Sun, 01 Jun 2025 08:36:51 GMT
Title: SocialEval: Evaluating Social Intelligence of Large Language Models
Authors: Jinfeng Zhou, Yuxuan Chen, Yihan Shi, Xuanming Zhang, Leqi Lei, Yi Feng, Zexuan Xiong, Miao Yan, Xunzhi Wang, Yaru Cao, Jianing Yin, Shuai Wang, Quanyu Dai, Zhenhua Dong, Hongning Wang, Minlie Huang,
Abstract summary: Social Intelligence (SI) equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals.<n>This presents an operational evaluation paradigm: outcome-oriented goal achievement evaluation and process-oriented interpersonal ability evaluation.<n>We propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts.
Score: 70.90981021629021
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLMs exhibit promising Social Intelligence (SI) in modeling human behavior, raising the need to evaluate LLMs' SI and their discrepancy with humans. SI equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals. This presents an operational evaluation paradigm: outcome-oriented goal achievement evaluation and process-oriented interpersonal ability evaluation, which existing work fails to address. To this end, we propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts. Each script is structured as a world tree that contains plot lines driven by interpersonal ability, providing a comprehensive view of how LLMs navigate social interactions. Experiments show that LLMs fall behind humans on both SI evaluations, exhibit prosociality, and prefer more positive social behaviors, even if they lead to goal failure. Analysis of LLMs' formed representation space and neuronal activations reveals that LLMs have developed ability-specific functional partitions akin to the human brain.

Related papers

Neural Synchrony Between Socially Interacting Language Models [52.74586779814636]
Large language models (LLMs) are widely accepted as powerful approximations of human behavior.<n>It remains controversial whether they can be meaningfully compared to human social minds.
arXiv Detail & Related papers (2026-02-19T20:33:54Z)
TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation [55.55404595177229]
Large Language Models (LLMs) are exhibiting emergent human-like abilities.<n>TwinVoice is a benchmark for assessing persona simulation across diverse real-world contexts.
arXiv Detail & Related papers (2025-10-29T14:00:42Z)
Social Simulations with Large Language Model Risk Utopian Illusion [61.358959720048354]
We introduce a systematic framework for analyzing large language models' behavior in social simulation.<n>Our approach simulates multi-agent interactions through chatroom-style conversations and analyzes them across five linguistic dimensions.<n>Our findings reveal that LLMs do not faithfully reproduce genuine human behavior but instead reflect overly idealized versions of it.
arXiv Detail & Related papers (2025-10-24T06:08:41Z)
Word Synchronization Challenge: A Benchmark for Word Association Responses for LLMs [4.352318127577628]
This paper introduces the Word Synchronization Challenge, a novel benchmark to evaluate large language models (LLMs) in Human-Computer Interaction (HCI)<n>This benchmark uses a dynamic game-like framework to test LLMs ability to mimic human cognitive processes through word associations.
arXiv Detail & Related papers (2025-02-12T11:30:28Z)
EgoSocialArena: Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective [22.30892836263764]
Social intelligence is built upon three pillars: cognitive intelligence, situational intelligence, and behavioral intelligence.<n>EgoSocialArena aims to systematically evaluate the social intelligence of large language models from a first-person perspective.
arXiv Detail & Related papers (2024-10-08T16:55:51Z)
Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. This paper presents a framework for investigating psychology dimension in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation. We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context [27.740204336800687]
Large language models (LLMs) have demonstrated the potential to mimic human social intelligence. We develop a novel framework, InterIntent, to assess LLMs' social intelligence by mapping their ability to understand and manage intentions in a game setting.
arXiv Detail & Related papers (2024-06-18T02:02:15Z)
SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents [73.35393511272791]
We propose an interactive learning method, SOTOPIA-$pi$, improving the social intelligence of language agents. This method leverages behavior cloning and self-reinforcement training on filtered social interaction data according to large language model (LLM) ratings.
arXiv Detail & Related papers (2024-03-13T17:17:48Z)
Do LLM Agents Exhibit Social Behavior? [5.094340963261968]
State-Understanding-Value-Action (SUVA) is a framework to systematically analyze responses in social contexts. It assesses social behavior through both their final decisions and the response generation processes leading to those decisions. We demonstrate that utterance-based reasoning reliably predicts LLMs' final actions.
arXiv Detail & Related papers (2023-12-23T08:46:53Z)
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans. In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals. We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values. Current language models (LMs) are trained to rigidly replicate their training corpus in isolation. This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
Large Language Models (LLMs) have made it crucial to align their values with those of humans. We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.