AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
- URL: http://arxiv.org/abs/2410.19346v2
- Date: Sat, 23 Nov 2024 08:23:27 GMT
- Title: AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
- Authors: Xinyi Mou, Jingcong Liang, Jiayu Lin, Xinnong Zhang, Xiawei Liu, Shiyue Yang, Rong Ye, Lei Chen, Haoyu Kuang, Xuanjing Huang, Zhongyu Wei
- Abstract summary: We introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios.
Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts.
We analyze goals using ERG theory and conduct comprehensive experiments.
Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning.
- Score: 38.878966229688054
- Abstract: Large language models (LLMs) are increasingly leveraged to empower autonomous agents to simulate human beings in various fields of behavioral research. However, evaluating their capacity to navigate complex social interactions remains a challenge. Previous studies face limitations due to insufficient scenario diversity, complexity, and a single-perspective focus. To this end, we introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios. Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts. We evaluate LLM-driven agents through multi-turn interactions, emphasizing both goal completion and implicit reasoning. We analyze goals using ERG theory and conduct comprehensive experiments. Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning. Code and data are available at https://github.com/ljcleo/agent_sense.
Related papers
- IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems [2.2810745411557316]
We introduce IntellAgent, a scalable, open-source framework to evaluate conversational AI systems.
IntellAgent automates the creation of synthetic benchmarks by combining policy-driven graph modeling, realistic event generation, and interactive user-agent simulations.
Our findings demonstrate that IntellAgent serves as an effective framework for advancing conversational AI by addressing challenges in bridging research and deployment.
arXiv Detail & Related papers (2025-01-19T14:58:35Z)
- Questioning the Unknown: Optimising Multi-Agent Collaboration in Narrative-Driven Games [18.383262467079078]
We present Questum, a novel framework for Large Language Model (LLM)-based agents in Murder Mystery Games (MMGs).
MMGs pose unique challenges, including undefined state spaces, absent intermediate rewards, and the need for strategic interaction in a continuous language domain.
Questum addresses these complexities through a sensor-based representation of agent states, a question-targeting mechanism guided by information gain, and a pruning strategy to refine suspect lists and enhance decision-making efficiency.
arXiv Detail & Related papers (2024-04-26T19:07:30Z)
- SocialBench: Sociality Evaluation of Role-Playing Conversational Agents [85.6641890712617]
Large language models (LLMs) have advanced the development of various AI conversational agents.
SocialBench is the first benchmark designed to evaluate the sociality of role-playing conversational agents at both individual and group levels.
We find that excelling at the individual level does not imply proficiency at the group level.
arXiv Detail & Related papers (2024-03-20T15:38:36Z)
- Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects [32.91556128291915]
This paper surveys current research to provide an in-depth overview of intelligent agents within single and multi-agent systems.
It covers their definitions, research frameworks, and foundational components such as their composition, cognitive and planning methods, tool utilization, and responses to environmental feedback.
We conclude by envisioning prospects for LLM-based agents, considering the evolving landscape of AI and natural language processing.
arXiv Detail & Related papers (2024-01-07T09:08:24Z)
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
- The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI).
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
- Towards Socially Intelligent Agents with Mental State Transition and Human Utility [97.01430011496576]
We propose to incorporate a mental state and utility model into dialogue agents.
The hybrid mental state extracts information from both the dialogue and event observations.
The utility model is a ranking model that learns human preferences from a crowd-sourced social commonsense dataset.
arXiv Detail & Related papers (2021-03-12T00:06:51Z)
- Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents [60.27066549589362]
Social language used by human agents is associated with greater users' responsiveness and task completion.
The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element.
Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.
arXiv Detail & Related papers (2020-12-29T08:22:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.