Improving Target-driven Visual Navigation with Attention on 3D Spatial
Relationships
- URL: http://arxiv.org/abs/2005.02153v1
- Date: Wed, 29 Apr 2020 08:46:38 GMT
- Title: Improving Target-driven Visual Navigation with Attention on 3D Spatial
Relationships
- Authors: Yunlian Lv, Ning Xie, Yimin Shi, Zijiao Wang, and Heng Tao Shen
- Abstract summary: We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes.
Our proposed method combines visual features and 3D spatial representations to learn the navigation policy.
Our experiments, performed in AI2-THOR, show that our model outperforms the baselines in both SR and SPL metrics.
- Score: 52.72020203771489
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied artificial intelligence (AI) tasks are shifting from tasks focused on
internet images to active settings in which embodied agents perceive and act within 3D
environments. In this paper, we investigate target-driven visual navigation using deep
reinforcement learning (DRL) in 3D indoor scenes, where the task is to train an agent that
can intelligently make a series of decisions to arrive at a pre-specified target location
from any possible starting position, based only on egocentric views. However, most current
navigation methods struggle with several challenging problems, such as data efficiency,
automatic obstacle avoidance, and generalization. The generalization problem means that the
agent cannot transfer navigation skills learned from previous experience to unseen targets
and scenes. To address these issues, we incorporate two designs into the classic DRL
framework: attention on a 3D knowledge graph (KG) and a target skill extension (TSE) module.
On the one hand, our proposed method combines visual features and 3D spatial representations
to learn the navigation policy. On the other hand, the TSE module generates sub-targets that
allow the agent to learn from failures. Specifically, our 3D spatial relationships are
encoded with a graph convolutional network (GCN). To better reflect real-world settings, our
work also considers an open action and adds actionable targets to the conventional navigation
setup. These more difficult settings test whether a DRL agent really understands its task and
its navigation environment and can carry out reasoning. Our experiments, performed in
AI2-THOR, show that our model outperforms the baselines on both success rate (SR) and success
weighted by path length (SPL), and improves generalization across targets and scenes.
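The abstract describes encoding 3D spatial relationships with a GCN and attending over the resulting graph using egocentric visual features to drive the navigation policy, but no implementation is given. Below is a minimal sketch of that idea, not the authors' code: the module names, feature dimensions, adjacency normalization, and the dot-product attention form are all assumptions made for illustration.

```python
# Hypothetical sketch: GCN over a 3D object graph + attention fusion with
# egocentric visual features, feeding a discrete navigation policy head.
# Dimensions, names, and the attention form are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_norm @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj_norm):
        # adj_norm: (N, N) row-normalized adjacency with self-loops
        # node_feats: (N, in_dim) per-object features of the 3D knowledge graph
        return F.relu(self.linear(adj_norm @ node_feats))


class SpatialAttentionPolicy(nn.Module):
    """Attends over GCN-encoded object nodes with the visual feature as the
    query, then predicts a distribution over discrete navigation actions."""
    def __init__(self, node_dim=64, vis_dim=512, hidden=256, num_actions=6):
        super().__init__()
        self.gcn1 = GCNLayer(node_dim, hidden)
        self.gcn2 = GCNLayer(hidden, hidden)
        self.query = nn.Linear(vis_dim, hidden)
        self.policy = nn.Linear(vis_dim + hidden, num_actions)

    def forward(self, vis_feat, node_feats, adj_norm):
        # vis_feat: (B, vis_dim) CNN feature of the egocentric view
        h = self.gcn2(self.gcn1(node_feats, adj_norm), adj_norm)  # (N, hidden)
        q = self.query(vis_feat)                                  # (B, hidden)
        attn = torch.softmax(q @ h.t(), dim=-1)                   # (B, N)
        graph_ctx = attn @ h                                      # (B, hidden)
        logits = self.policy(torch.cat([vis_feat, graph_ctx], dim=-1))
        return F.log_softmax(logits, dim=-1), attn
```

In an actor-critic DRL setup, the returned log-probabilities would feed the policy loss while a value head could share the fused feature; the TSE module's sub-targets would add further supervision on top of this and are not sketched here.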
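The reported metrics, success rate (SR) and success weighted by path length (SPL, following Anderson et al., 2018), have standard definitions in embodied navigation. The paper's own evaluation script is not shown, so the following is an illustrative reference implementation assuming per-episode success flags and path lengths are available.

```python
# Standard navigation metrics; the episode data layout is an assumption.
def success_rate(successes):
    """SR = fraction of episodes in which the agent reached the target."""
    return sum(successes) / len(successes)


def spl(successes, shortest_lengths, taken_lengths):
    """SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where l_i is the
    shortest-path length to the target and p_i the path actually taken."""
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, taken_lengths):
        total += s * l / max(p, l)
    return total / len(successes)
```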
Related papers
- Zero-Shot Object Goal Visual Navigation With Class-Independent Relationship Network [3.0820097046465285]
"Zero-shot" means that the target the agent needs to find is not trained during the training phase.
We propose the Class-Independent Relationship Network (CIRN) to address the issue of coupling navigation ability with target features during training.
Our method outperforms the current state-of-the-art approaches in the zero-shot object goal visual navigation task.
arXiv Detail & Related papers (2023-10-15T16:42:14Z)
- 3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification [19.125633699422117]
We propose a framework for 3D-aware ObjectNav based on two straightforward sub-policies.
Our framework achieves the best performance among all modular-based methods on the Matterport3D and Gibson datasets.
arXiv Detail & Related papers (2022-12-01T07:55:56Z)
- Towards self-attention based visual navigation in the real world [0.0]
Vision-guided navigation requires processing complex visual information to inform task-oriented decisions.
Deep reinforcement learning agents trained in simulation often exhibit unsatisfactory results when deployed in the real world.
This is the first demonstration of a self-attention based agent successfully trained in navigating a 3D action space, using less than 4000 parameters.
arXiv Detail & Related papers (2022-09-15T04:51:42Z)
- Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation [97.17517060585875]
We present a unified approach to visual navigation using a novel modular transfer learning model.
Our model can effectively leverage its experience from one source task and apply it to multiple target tasks.
Our approach learns faster, generalizes better, and outperforms SoTA models by a significant margin.
arXiv Detail & Related papers (2022-02-05T00:07:21Z)
- SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency [122.18108118190334]
We present a framework called Self-supervised Embodied Active Learning (SEAL).
It utilizes perception models trained on internet images to learn an active exploration policy.
We build and utilize 3D semantic maps to learn both action and perception in a completely self-supervised manner.
arXiv Detail & Related papers (2021-12-02T06:26:38Z)
- Pushing it out of the Way: Interactive Visual Navigation [62.296686176988125]
We study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
We introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions.
By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities.
arXiv Detail & Related papers (2021-04-28T22:46:41Z)
- Diagnosing Vision-and-Language Navigation: What Really Matters [61.72935815656582]
Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments.
Recent studies witness a slow-down in the performance improvements in both indoor and outdoor VLN tasks.
In this work, we conduct a series of diagnostic experiments to unveil agents' focus during navigation.
arXiv Detail & Related papers (2021-03-30T17:59:07Z)
- MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation [4.127128889779478]
This work focuses on performing better than, or comparably to, existing learning-based solutions for visual navigation by autonomous agents.
We propose a method to encode vital scene semantics into a semantically informed, top-down egocentric map representation.
We conduct experiments on 3-D reconstructed indoor PointGoal visual navigation and demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-03-21T12:01:23Z)
- Active Visual Information Gathering for Vision-Language Navigation [115.40768457718325]
Vision-language navigation (VLN) is the task in which an agent must carry out navigation instructions inside photo-realistic environments.
One of the key challenges in VLN is how to conduct a robust navigation by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment.
This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent VLN policy.
arXiv Detail & Related papers (2020-07-15T23:54:20Z)