Related papers: AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios

AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios

URL: http://arxiv.org/abs/2410.19346v2
Date: Sat, 23 Nov 2024 08:23:27 GMT
Title: AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
Authors: Xinyi Mou, Jingcong Liang, Jiayu Lin, Xinnong Zhang, Xiawei Liu, Shiyue Yang, Rong Ye, Lei Chen, Haoyu Kuang, Xuanjing Huang, Zhongyu Wei,
Abstract summary: We introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios. Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts. We analyze goals using ERG theory and conduct comprehensive experiments. Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning.
Score: 38.878966229688054
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) are increasingly leveraged to empower autonomous agents to simulate human beings in various fields of behavioral research. However, evaluating their capacity to navigate complex social interactions remains a challenge. Previous studies face limitations due to insufficient scenario diversity, complexity, and a single-perspective focus. To this end, we introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios. Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts. We evaluate LLM-driven agents through multi-turn interactions, emphasizing both goal completion and implicit reasoning. We analyze goals using ERG theory and conduct comprehensive experiments. Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning. Code and data are available at \url{https://github.com/ljcleo/agent_sense}.

Related papers

Games Agents Play: Towards Transactional Analysis in LLM-based Multi-Agent Systems [0.0]
We introduce Trans-ACT, an approach embedding Transactional Analysis (TA) principles into multi-agent systems.<n>Trans-ACT integrates the Parent, Adult, and Child ego states into an agent's cognitive architecture.<n>Our experimental simulation, which reproduces the Stupid game scenario, demonstrates that agents grounded in cognitive and TA principles produce deeper and context-aware interactions.
arXiv Detail & Related papers (2025-07-28T21:46:21Z)
LIFELONG SOTOPIA: Evaluating Social Intelligence of Language Agents Over Lifelong Social Interactions [4.819825467587802]
We present a novel benchmark, LIFELONG-SOTOPIA, to perform a comprehensive evaluation of language agents.<n>We find that goal achievement and believability of all of the language models that we test decline through the whole interaction.<n>These findings show that we can use LIFELONG-SOTOPIA to evaluate the social intelligence of language agents over lifelong social interactions.
arXiv Detail & Related papers (2025-06-14T23:57:54Z)
The Traitors: Deception and Trust in Multi-Agent Language Model Simulations [0.0]
We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games.<n>We develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality.<n>Our initial experiments across DeepSeek-V3, GPT-4o-mini, and GPT-4o (10 runs per model) reveal a notable asymmetry.
arXiv Detail & Related papers (2025-05-19T10:01:35Z)
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions [51.96890647837277]
Large Language Models (LLMs) have propelled conversational AI from traditional dialogue systems into sophisticated agents capable of autonomous actions, contextual awareness, and multi-turn interactions with users. This survey paper presents a desideratum for next-generation Conversational Agents - what has been achieved, what challenges persist, and what must be done for more scalable systems that approach human-level intelligence.
arXiv Detail & Related papers (2025-04-07T21:01:25Z)
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions [12.218102495632937]
Large language models (LLMs) demonstrate strong potential as agents for tool invocation due to their advanced comprehension and planning capabilities. We propose the Multi-Mission Tool Bench. In the benchmark, each test case comprises multiple interrelated missions. We also propose a novel method to evaluate the accuracy and efficiency of agent decisions with dynamic decision trees.
arXiv Detail & Related papers (2025-04-03T14:21:33Z)
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems [2.2810745411557316]
We introduce IntellAgent, a scalable, open-source framework to evaluate conversational AI systems. IntellAgent automates the creation of synthetic benchmarks by combining policy-driven graph modeling, realistic event generation, and interactive user-agent simulations. Our findings demonstrate that IntellAgent serves as an effective framework for advancing conversational AI by addressing challenges in bridging research and deployment.
arXiv Detail & Related papers (2025-01-19T14:58:35Z)
Exploring Autonomous Agents through the Lens of Large Language Models: A Review [0.0]
Large Language Models (LLMs) are transforming artificial intelligence, enabling autonomous agents to perform diverse tasks across various domains. They face challenges such as multimodality, human value alignment, hallucinations, and evaluation. Evaluation platforms like AgentBench, WebArena, and ToolLLM provide robust methods for assessing these agents in complex scenarios.
arXiv Detail & Related papers (2024-04-05T22:59:02Z)
SocialBench: Sociality Evaluation of Role-Playing Conversational Agents [85.6641890712617]
Large language models (LLMs) have advanced the development of various AI conversational agents. SocialBench is the first benchmark designed to evaluate the sociality of role-playing conversational agents at both individual and group levels. We find that agents excelling in individual level does not imply their proficiency in group level.
arXiv Detail & Related papers (2024-03-20T15:38:36Z)
Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects [32.91556128291915]
This paper surveys current research to provide an in-depth overview of intelligent agents within single and multi-agent systems. It covers their definitions, research frameworks, and foundational components such as their composition, cognitive and planning methods, tool utilization, and responses to environmental feedback. We conclude by envisioning prospects for LLM-based agents, considering the evolving landscape of AI and natural language processing.
arXiv Detail & Related papers (2024-01-07T09:08:24Z)
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans. In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals. We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI) We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents [23.719833581321033]
Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. We argue that aiming towards human-level AI requires a broader set of key social skills. We present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents.
arXiv Detail & Related papers (2021-07-02T10:39:18Z)
Towards Socially Intelligent Agents with Mental State Transition and Human Utility [97.01430011496576]
We propose to incorporate a mental state and utility model into dialogue agents. The hybrid mental state extracts information from both the dialogue and event observations. The utility model is a ranking model that learns human preferences from a crowd-sourced social commonsense dataset.
arXiv Detail & Related papers (2021-03-12T00:06:51Z)
Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents [60.27066549589362]
Social language used by human agents is associated with greater users' responsiveness and task completion. The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element. Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.
arXiv Detail & Related papers (2020-12-29T08:22:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.