AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
- URL: http://arxiv.org/abs/2410.19346v1
- Date: Fri, 25 Oct 2024 07:04:16 GMT
- Title: AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
- Authors: Xinyi Mou, Jingcong Liang, Jiayu Lin, Xinnong Zhang, Xiawei Liu, Shiyue Yang, Rong Ye, Lei Chen, Haoyu Kuang, Xuanjing Huang, Zhongyu Wei
- Abstract summary: We introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios.
Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts.
We evaluate LLM-driven agents through multi-turn interactions, emphasizing both goal completion and implicit reasoning.
- Score: 38.878966229688054
- License:
- Abstract: Large language models (LLMs) are increasingly leveraged to empower autonomous agents to simulate human beings in various fields of behavioral research. However, evaluating their capacity to navigate complex social interactions remains a challenge. Previous studies face limitations due to insufficient scenario diversity, complexity, and a single-perspective focus. To this end, we introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios. Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts. We evaluate LLM-driven agents through multi-turn interactions, emphasizing both goal completion and implicit reasoning. We analyze goals using ERG theory and conduct comprehensive experiments. Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning.
Related papers
- I Want to Break Free! Persuasion and Anti-Social Behavior of LLMs in Multi-Agent Settings with Social Hierarchy [13.68625980741047]
We study interaction patterns of Large Language Model (LLM)-based agents in a context characterized by strict social hierarchy.
We study two types of phenomena: persuasion and anti-social behavior in simulated scenarios involving a guard and a prisoner agent.
arXiv Detail & Related papers (2024-10-09T17:45:47Z)
- Persona Inconstancy in Multi-Agent LLM Collaboration: Conformity, Confabulation, and Impersonation [16.82101507069166]
Multi-agent AI systems can be used for simulating collective decision-making in scientific and practical applications.
We examine AI agent ensembles engaged in cross-national collaboration and debate by analyzing their private responses and chat transcripts.
Our findings suggest that multi-agent discussions can support collective AI decisions that more often reflect diverse perspectives.
arXiv Detail & Related papers (2024-05-06T21:20:35Z)
- Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents [101.17919953243107]
GovSim is a generative simulation platform designed to study strategic interactions and cooperative decision-making in large language models (LLMs).
We find that all but the most powerful LLM agents fail to achieve a sustainable equilibrium in GovSim, with the highest survival rate below 54%.
We show that agents that leverage "Universalization"-based reasoning, a theory of moral thinking, are able to achieve significantly better sustainability.
arXiv Detail & Related papers (2024-04-25T15:59:16Z)
- SocialBench: Sociality Evaluation of Role-Playing Conversational Agents [85.6641890712617]
Large language models (LLMs) have advanced the development of various AI conversational agents.
SocialBench is the first benchmark designed to evaluate the sociality of role-playing conversational agents at both individual and group levels.
We find that an agent excelling at the individual level is not necessarily proficient at the group level.
arXiv Detail & Related papers (2024-03-20T15:38:36Z)
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
- Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View [60.80731090755224]
This paper probes the collaboration mechanisms among contemporary NLP systems by practical experiments with theoretical insights.
We fabricate four unique 'societies' comprised of LLM agents, where each agent is characterized by a specific 'trait' (easy-going or overconfident) and engages in collaboration with a distinct 'thinking pattern' (debate or reflection).
Our results further illustrate that LLM agents manifest human-like social behaviors, such as conformity and consensus reaching, mirroring social psychology theories.
arXiv Detail & Related papers (2023-10-03T15:05:52Z)
- The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI).
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)