Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene
- URL: http://arxiv.org/abs/2108.02846v1
- Date: Thu, 5 Aug 2021 20:56:47 GMT
- Title: Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene
- Authors: Qi Wu, Cheng-Ju Wu, Yixin Zhu, Jungseock Joo
- Abstract summary: We develop a VR-based 3D simulation environment, named Ges-THOR, based on the AI2-THOR platform.
In this virtual environment, a human player is placed in the same virtual scene and shepherds the artificial agent using only gestures.
We argue that learning the semantics of natural gestures and learning the navigation task are mutually beneficial: learn to communicate and communicate to learn.
- Score: 34.1812210095966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-robot collaboration is an essential research topic in
artificial intelligence (AI), enabling researchers to devise cognitive AI
systems and affording users an intuitive means of interacting with the robot.
Of note,
communication plays a central role. To date, prior studies in embodied agent
navigation have demonstrated only that communication can be facilitated by
instructions in natural language. Nevertheless, a plethora of other forms of
communication remains unexplored. In fact, human communication originated in
gestures and is often delivered through multimodal cues, e.g., "go there"
with a pointing gesture. To bridge the gap and fill in the missing dimension of
communication in embodied agent navigation, we propose investigating the
effects of using gestures as the communicative interface instead of verbal
cues. Specifically, we develop a VR-based 3D simulation environment, named
Ges-THOR, based on the AI2-THOR platform. In this virtual environment, a human
player is placed in the same virtual scene and shepherds the artificial agent
using only gestures. The agent is tasked to solve the navigation problem guided
by natural gestures with unknown semantics; we do not use any predefined
gestures, owing to the diversity and versatility of human gestures. We argue
that learning the semantics of natural gestures and learning the navigation
task are mutually beneficial: learn to communicate and communicate to learn. In
a series of experiments, we demonstrate that human gesture cues, even without
predefined semantics, improve object-goal navigation for an embodied agent,
outperforming various state-of-the-art methods.
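The core idea, fusing gesture cues with visual observations so that gesture semantics are learned jointly with the navigation objective, can be sketched as follows. This is a minimal illustration only: all dimensions, weight names, and the fusion scheme are assumptions for exposition, not the paper's actual architecture.

```python
import math
import random

random.seed(0)

# Hypothetical sizes -- illustrative only, not taken from the paper.
VIS_DIM, GES_DIM, HID, N_ACTIONS = 8, 4, 16, 6

def rand_matrix(rows, cols):
    """Small random weight matrix, standing in for learned parameters."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

W_vis = rand_matrix(VIS_DIM, HID)   # visual-observation encoder
W_ges = rand_matrix(GES_DIM, HID)   # gesture encoder (no predefined semantics)
W_pi = rand_matrix(HID, N_ACTIONS)  # policy head over navigation actions

def matvec(vec, mat):
    """Multiply a row vector by a matrix (pure-Python stand-in)."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def act(visual_obs, gesture_obs):
    """Fuse visual and gesture features into one hidden state, then score actions.
    Because the gesture encoder is trained end-to-end against the navigation
    reward, gesture semantics emerge from the task itself rather than from
    predefined labels ("learn to communicate and communicate to learn")."""
    fused = [math.tanh(a + b)
             for a, b in zip(matvec(visual_obs, W_vis), matvec(gesture_obs, W_ges))]
    return softmax(matvec(fused, W_pi))

# One forward pass with random observations in place of real sensor input.
pi = act([random.gauss(0, 1) for _ in range(VIS_DIM)],
         [random.gauss(0, 1) for _ in range(GES_DIM)])
```

In a real system the weights would be optimized with a reinforcement-learning objective on the object-goal navigation reward; the point of the sketch is only that the gesture pathway shares the same end-to-end gradient signal as the navigation policy.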
Related papers
- Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer [21.70275919660522]
This study explores the application of natural gesture recognition based on computer vision in human-computer interaction.
By connecting the palm and the finger joints, dynamic and static gesture models of the hand are formed.
Experimental results show that this method can effectively recognize various gestures and maintain high recognition accuracy and real-time response capabilities.
arXiv Detail & Related papers (2024-12-24T10:13:20Z)
- doScenes: An Autonomous Driving Dataset with Natural Language Instruction for Human Interaction and Vision-Language Navigation [0.0]
doScenes is a novel dataset designed to facilitate research on human-vehicle instruction interactions.
doScenes bridges the gap between instruction and driving response, enabling context-aware and adaptive planning.
arXiv Detail & Related papers (2024-12-08T11:16:47Z)
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- CoNav: A Benchmark for Human-Centered Collaborative Navigation [66.6268966718022]
We propose a collaborative navigation (CoNav) benchmark.
Our CoNav tackles the critical challenge of constructing a 3D navigation environment with realistic and diverse human activities.
We propose an intention-aware agent for reasoning about both long-term and short-term human intentions.
arXiv Detail & Related papers (2024-06-04T15:44:25Z)
- Maia: A Real-time Non-Verbal Chat for Human-AI Interaction [10.580858171606167]
We propose an alternative to text-based Human-AI interaction.
By leveraging nonverbal visual communication through facial expressions and head and body movements, we aim to enhance engagement.
Our approach is not art-specific and can be adapted to various paintings, animations, and avatars.
arXiv Detail & Related papers (2024-02-09T13:07:22Z)
- GestureGPT: Toward Zero-Shot Free-Form Hand Gesture Understanding with Large Language Model Agents [35.48323584634582]
We introduce GestureGPT, a free-form hand gesture understanding framework that mimics human gesture understanding procedures.
Our framework leverages multiple Large Language Model agents to manage and synthesize gesture and context information.
We validated our framework offline under two real-world scenarios: smart home control and online video streaming.
arXiv Detail & Related papers (2023-10-19T15:17:34Z)
- HandMeThat: Human-Robot Communication in Physical and Social Environments [73.91355172754717]
HandMeThat is a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments.
HandMeThat contains 10,000 episodes of human-robot interactions.
We show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat.
arXiv Detail & Related papers (2023-10-05T16:14:46Z)
- Gesture2Path: Imitation Learning for Gesture-aware Navigation [54.570943577423094]
We present Gesture2Path, a novel social navigation approach that combines image-based imitation learning with model-predictive control.
We deploy our method on real robots and showcase the effectiveness of our approach in four gesture-navigation scenarios.
arXiv Detail & Related papers (2022-09-19T23:05:36Z)
- Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation [92.66286342108934]
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.
Our dataset contains 8.7 hours, 138 trajectories, and 25 miles of socially compliant, human-teleoperated driving demonstrations.
arXiv Detail & Related papers (2022-03-28T19:09:11Z)
- Visual Navigation Among Humans with Optimal Control as a Supervisor [72.5188978268463]
We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans.
Our approach is enabled by our novel data-generation tool, HumANav.
We demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion.
arXiv Detail & Related papers (2020-03-20T16:13:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.