SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
- URL: http://arxiv.org/abs/2310.11667v2
- Date: Fri, 22 Mar 2024 18:52:15 GMT
- Title: SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
- Authors: Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap
- Abstract summary: We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and evaluate their social intelligence.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that on a challenging subset of scenarios, SOTOPIA-hard, GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
- Score: 107.4138224020773
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Humans are social beings; we pursue social goals in our daily interactions, which is a crucial aspect of social intelligence. Yet, AI systems' abilities in this realm remain elusive. We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and evaluate their social intelligence. In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals. We simulate the role-play interaction between LLM-based agents and humans within this task space and evaluate their performance with a holistic evaluation framework called SOTOPIA-Eval. With SOTOPIA, we find significant differences between these models in terms of their social intelligence, and we identify a subset of SOTOPIA scenarios, SOTOPIA-hard, that is generally challenging for all models. We find that on this subset, GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills. These findings demonstrate SOTOPIA's promise as a general platform for research on evaluating and improving social intelligence in artificial agents.
Related papers
- Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions [67.60397632819202]
Building socially-intelligent AI agents (Social-AI) is a multidisciplinary, multimodal research goal.
We identify a set of underlying technical challenges and open questions for researchers across computing communities to advance Social-AI.
arXiv Detail & Related papers (2024-04-17T02:57:42Z)
- SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents [73.35393511272791]
We propose an interactive learning method, SOTOPIA-$π$, improving the social intelligence of language agents.
This method leverages behavior cloning and self-reinforcement training on filtered social interaction data according to large language model (LLM) ratings.
arXiv Detail & Related papers (2024-03-13T17:17:48Z)
- Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
- CASPER: Cognitive Architecture for Social Perception and Engagement in Robots [0.5918643136095765]
We present CASPER: a symbolic cognitive architecture that uses qualitative spatial reasoning to anticipate the pursued goal of another agent and to calculate the best collaborative behavior.
We have tested this architecture in a simulated kitchen environment and the results we have collected show that the robot is able to both recognize an ongoing goal and to properly collaborate towards its achievement.
arXiv Detail & Related papers (2022-09-01T10:15:03Z)
- SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents [23.719833581321033]
Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI.
We argue that aiming towards human-level AI requires a broader set of key social skills.
We present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents.
arXiv Detail & Related papers (2021-07-02T10:39:18Z)
- PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception [50.551003004553806]
We create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions.
PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events.
As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE, which outperforms state-of-the-art feed-forward neural networks.
arXiv Detail & Related papers (2021-03-02T18:44:57Z)
- Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration [116.28433607265573]
We introduce Watch-And-Help (WAH), a challenge for testing social intelligence in AI agents.
In WAH, an AI agent needs to help a human-like agent perform a complex household task efficiently.
We build VirtualHome-Social, a multi-agent household environment, and provide a benchmark including both planning-based and learning-based baselines.
arXiv Detail & Related papers (2020-10-19T21:48:31Z)