Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene
- URL: http://arxiv.org/abs/2108.02846v1
- Date: Thu, 5 Aug 2021 20:56:47 GMT
- Title: Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene
- Authors: Qi Wu, Cheng-Ju Wu, Yixin Zhu, Jungseock Joo
- Abstract summary: We develop a VR-based 3D simulation environment, named Ges-THOR, based on the AI2-THOR platform.
In this virtual environment, a human player is placed in the same virtual scene and shepherds the artificial agent using only gestures.
We argue that learning the semantics of natural gestures is mutually beneficial to learning the navigation task: learn to communicate and communicate to learn.
- Score: 34.1812210095966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-robot collaboration is an essential research topic in artificial
intelligence (AI): it enables researchers to devise cognitive AI systems and
affords an intuitive means for users to interact with a robot. Communication, in
particular, plays a central role. To date, prior studies in embodied agent
navigation have only demonstrated that communication can be facilitated through
instructions in natural language, leaving a plethora of other forms of
communication unexplored. In fact, human communication originated in gestures
and is often delivered through multimodal cues, e.g., "go there" accompanied by
a pointing gesture. To bridge this gap and fill in the missing dimension of
communication in embodied agent navigation, we propose investigating the
effects of using gestures as the communicative interface instead of verbal
cues. Specifically, we develop a VR-based 3D simulation environment, named
Ges-THOR, based on the AI2-THOR platform. In this virtual environment, a human
player is placed in the same virtual scene and shepherds the artificial agent
using only gestures. The agent is tasked with solving the navigation problem
guided by natural gestures with unknown semantics; we do not use any predefined
gestures because of the diverse and versatile nature of human gestures. We argue
that learning the semantics of natural gestures is mutually beneficial to
learning the navigation task: learn to communicate and communicate to learn. In
a series of experiments, we demonstrate that human gesture cues, even without
predefined semantics, improve object-goal navigation for an embodied agent,
outperforming various state-of-the-art methods.
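For readers who want a concrete picture of the setup, the following is a minimal sketch, assuming PyTorch, of how a gesture-conditioned object-goal navigation policy with an auxiliary gesture-semantics head could be wired up. The class name GestureNavPolicy, the pose-keypoint gesture encoding, and the head sizes are illustrative assumptions, not the paper's actual Ges-THOR architecture.

```python
# Minimal sketch (not the authors' exact model): a navigation policy that fuses
# an egocentric RGB view with a short sequence of human pose keypoints (the
# gesture cue) and adds an auxiliary head that predicts the referred object,
# loosely mirroring "learn to communicate and communicate to learn".
import torch
import torch.nn as nn


class GestureNavPolicy(nn.Module):
    def __init__(self, num_actions=6, num_target_classes=20,
                 keypoint_dim=34, gesture_hidden=128):
        super().__init__()
        # Visual encoder for the agent's 84x84 egocentric view.
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Gesture encoder: a GRU over per-frame 2D pose keypoints of the human.
        self.gesture_encoder = nn.GRU(keypoint_dim, gesture_hidden, batch_first=True)
        self.fuse = nn.Linear(64 * 7 * 7 + gesture_hidden, 512)
        # Actor-critic heads for navigation actions (move, rotate, stop, ...).
        self.actor = nn.Linear(512, num_actions)
        self.critic = nn.Linear(512, 1)
        # Auxiliary head: guess which object the gesture refers to, so the
        # gesture encoder receives a "communication" learning signal.
        self.semantics_head = nn.Linear(gesture_hidden, num_target_classes)

    def forward(self, rgb, gesture_seq):
        # rgb: (B, 3, 84, 84); gesture_seq: (B, T, keypoint_dim)
        vis = self.visual_encoder(rgb)
        _, h = self.gesture_encoder(gesture_seq)
        g = h.squeeze(0)
        joint = torch.relu(self.fuse(torch.cat([vis, g], dim=-1)))
        return self.actor(joint), self.critic(joint), self.semantics_head(g)


# Usage example with random tensors standing in for Ges-THOR observations.
policy = GestureNavPolicy()
rgb = torch.randn(4, 3, 84, 84)
gestures = torch.randn(4, 30, 34)        # 30 frames of 17 x/y keypoints
logits, value, sem_logits = policy(rgb, gestures)
target_obj = torch.randint(0, 20, (4,))  # illustrative pseudo-labels only
aux_loss = nn.functional.cross_entropy(sem_logits, target_obj)
```

In such a sketch, the auxiliary cross-entropy term would simply be added to the usual actor-critic loss, so the gesture encoder is shaped jointly by the communication and navigation objectives.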
Related papers
- CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction [19.997935470257794]
We present CANVAS, a framework that combines visual and linguistic instructions for commonsense-aware navigation.
Its success is driven by imitation learning, enabling the robot to learn from human navigation behavior.
Our experiments show that CANVAS outperforms the strong rule-based system ROS NavStack across all environments.
arXiv Detail & Related papers (2024-10-02T06:34:45Z)
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- Bridging the Communication Gap: Artificial Agents Learning Sign Language through Imitation [6.1400257928108575]
This research explores acquiring non-verbal communication skills through learning from demonstrations.
In particular, we focus on imitation learning for artificial agents, exemplified by teaching American Sign Language to a simulated humanoid.
We use computer vision and deep learning to extract information from videos, and reinforcement learning to enable the agent to replicate observed actions.
arXiv Detail & Related papers (2024-06-14T13:50:29Z)
- CoNav: A Benchmark for Human-Centered Collaborative Navigation [66.6268966718022]
We propose a collaborative navigation (CoNav) benchmark.
Our CoNav tackles the critical challenge of constructing a 3D navigation environment with realistic and diverse human activities.
We propose an intention-aware agent that reasons about both long-term and short-term human intentions.
arXiv Detail & Related papers (2024-06-04T15:44:25Z)
- GestureGPT: Toward Zero-Shot Free-Form Hand Gesture Understanding with Large Language Model Agents [35.48323584634582]
We introduce GestureGPT, a free-form hand gesture understanding framework that mimics human gesture understanding procedures.
Our framework leverages multiple Large Language Model agents to manage and synthesize gesture and context information.
We validated our framework offline under two real-world scenarios: smart home control and online video streaming.
arXiv Detail & Related papers (2023-10-19T15:17:34Z)
- HandMeThat: Human-Robot Communication in Physical and Social Environments [73.91355172754717]
HandMeThat is a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments.
HandMeThat contains 10,000 episodes of human-robot interactions.
We show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat.
arXiv Detail & Related papers (2023-10-05T16:14:46Z)
- Towards More Human-like AI Communication: A Review of Emergent Communication Research [0.0]
Emergent communication (Emecom) is a field of research aiming to develop artificial agents capable of using natural language.
In this review, we delineate all the common properties we find across the literature and how they relate to human interactions.
We identify two subcategories and highlight their characteristics and open challenges.
arXiv Detail & Related papers (2023-08-01T14:43:10Z)
- Gesture2Path: Imitation Learning for Gesture-aware Navigation [54.570943577423094]
We present Gesture2Path, a novel social navigation approach that combines image-based imitation learning with model-predictive control.
We deploy our method on real robots and showcase the effectiveness of our approach in the four gesture-navigation scenarios.
arXiv Detail & Related papers (2022-09-19T23:05:36Z)
- Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation [92.66286342108934]
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.
Our dataset contains 8.7 hours, 138 trajectories, and 25 miles of socially compliant, human-teleoperated driving demonstrations.
arXiv Detail & Related papers (2022-03-28T19:09:11Z)
- Visual Navigation Among Humans with Optimal Control as a Supervisor [72.5188978268463]
We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans.
Our approach is enabled by our novel data-generation tool, HumANav.
We demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion.
arXiv Detail & Related papers (2020-03-20T16:13:47Z)