Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration
- URL: http://arxiv.org/abs/2010.09890v2
- Date: Mon, 3 May 2021 13:08:55 GMT
- Title: Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration
- Authors: Xavier Puig, Tianmin Shu, Shuang Li, Zilin Wang, Yuan-Hong Liao,
Joshua B. Tenenbaum, Sanja Fidler, Antonio Torralba
- Abstract summary: We introduce Watch-And-Help (WAH), a challenge for testing social intelligence in AI agents.
In WAH, an AI agent needs to help a human-like agent perform a complex household task efficiently.
We build VirtualHome-Social, a multi-agent household environment, and provide a benchmark including both planning- and learning-based baselines.
- Score: 116.28433607265573
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we introduce Watch-And-Help (WAH), a challenge for testing
social intelligence in agents. In WAH, an AI agent needs to help a human-like
agent perform a complex household task efficiently. To succeed, the AI agent
needs to i) understand the underlying goal of the task by watching a single
demonstration of the human-like agent performing the same task (social
perception), and ii) coordinate with the human-like agent to solve the task in
an unseen environment as fast as possible (human-AI collaboration). For this
challenge, we build VirtualHome-Social, a multi-agent household environment,
and provide a benchmark including both planning- and learning-based baselines.
We evaluate the performance of AI agents with the human-like agent as well as
with real humans using objective metrics and subjective user ratings.
Experimental results demonstrate that the proposed challenge and virtual
environment enable a systematic evaluation of the important aspects of machine
social intelligence at scale.
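The challenge thus decomposes into a watch phase (inferring the goal from a single demonstration) and a help phase (collaborating in an unseen environment). The sketch below illustrates that two-phase protocol in Python. It is illustrative only: the names Demonstration, infer_goal, and HelpEpisode are hypothetical stand-ins and are not the actual VirtualHome-Social API, and the goal is reduced to a multiset of placement predicates for simplicity.

```python
"""Minimal sketch of the two-phase Watch-And-Help protocol.

Hypothetical interface for illustration; not the VirtualHome-Social API.
"""
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Demonstration:
    """One 'watch' episode: the actions the human-like agent took."""
    actions: list  # e.g. [("put", "plate", "dishwasher"), ...]


def infer_goal(demo: Demonstration) -> Counter:
    """Watch phase: approximate the goal as a multiset of goal predicates.

    Here goal inference is reduced to counting the placements the
    human-like agent performed (a stand-in for full goal inference).
    """
    goal = Counter()
    for verb, obj, target in demo.actions:
        if verb == "put":
            goal[(obj, target)] += 1
    return goal


@dataclass
class HelpEpisode:
    """One 'help' episode in an unseen environment."""
    true_goal: Counter
    achieved: Counter = field(default_factory=Counter)
    steps: int = 0

    def helper_step(self, inferred_goal: Counter) -> None:
        """The helper works on one unsatisfied predicate of its inferred goal."""
        self.steps += 1
        for pred, count in inferred_goal.items():
            if self.achieved[pred] < count:
                self.achieved[pred] += 1  # assume the subgoal succeeds
                return

    def success(self) -> bool:
        return all(self.achieved[p] >= c for p, c in self.true_goal.items())


if __name__ == "__main__":
    demo = Demonstration(actions=[("put", "plate", "dishwasher"),
                                  ("put", "plate", "dishwasher"),
                                  ("walk", "kitchen", None),
                                  ("put", "glass", "dishwasher")])
    goal = infer_goal(demo)               # watch phase
    episode = HelpEpisode(true_goal=goal)
    while not episode.success() and episode.steps < 100:
        episode.helper_step(goal)         # help phase
    print(f"success={episode.success()} in {episode.steps} steps")
```

In the actual benchmark the help phase involves coordinating with the human-like agent in a partially observable household scene, and performance is reported with objective metrics alongside subjective user ratings; the loop above only captures the overall watch-then-help structure.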
Related papers
- Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge [47.74313897705183]
CHAIC is an inclusive embodied social intelligence challenge designed to test social perception and cooperation in embodied agents.
In CHAIC, the goal is for an embodied agent equipped with egocentric observations to assist a human who may be operating under physical constraints.
We benchmark planning- and learning-based baselines on the challenge and introduce a new method that leverages large language models and behavior modeling.
arXiv Detail & Related papers (2024-11-04T04:41:12Z)
- Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans.
In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals.
We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
- The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI).
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
- WebArena: A Realistic Web Environment for Building Autonomous Agents [92.3291458543633]
We build an environment for language-guided agents that is highly realistic and reproducible.
We focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains.
We release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.
arXiv Detail & Related papers (2023-07-25T22:59:32Z)
- Human-AI Collaboration: The Effect of AI Delegation on Human Task Performance and Task Satisfaction [0.0]
We show that task performance and task satisfaction improve through AI delegation.
We identify humans' increased levels of self-efficacy as the underlying mechanism for these improvements.
Our findings provide initial evidence that allowing AI models to take over more management responsibilities can be an effective form of human-AI collaboration.
arXiv Detail & Related papers (2023-03-16T11:02:46Z)
- VECA: A Toolkit for Building Virtual Environments to Train and Test Human-like Agents [5.366273200529158]
We propose a novel VR-based toolkit, VECA, which enables building fruitful virtual environments to train and test human-like agents.
VECA provides a humanoid agent and an environment manager, enabling the agent to receive rich human-like perception and perform comprehensive interactions.
To motivate VECA, we also provide 24 interactive tasks, which represent (but are not limited to) four essential aspects in early human development.
arXiv Detail & Related papers (2021-05-03T11:42:27Z)
- Imitating Interactive Intelligence [24.95842455898523]
We study how to design artificial agents that can interact naturally with humans, using a virtual environment as a simplified setting.
To build agents that can robustly interact with humans, we would ideally train them while they interact with humans.
We use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour.
arXiv Detail & Related papers (2020-12-10T13:55:47Z)