Active Visual Information Gathering for Vision-Language Navigation
- URL: http://arxiv.org/abs/2007.08037v3
- Date: Wed, 19 Aug 2020 19:48:02 GMT
- Title: Active Visual Information Gathering for Vision-Language Navigation
- Authors: Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang and Jianbing Shen
- Abstract summary: Vision-language navigation (VLN) is the task of an agent carrying out navigational instructions inside photo-realistic environments.
One of the key challenges in VLN is how to conduct robust navigation by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment.
This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent VLN policy.
- Score: 115.40768457718325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language navigation (VLN) is the task of an agent carrying
out navigational instructions inside photo-realistic environments. One of the
key challenges in VLN is how to conduct robust navigation by mitigating the
uncertainty caused by ambiguous instructions and insufficient observation of
the environment. Agents trained by current approaches typically suffer from
this uncertainty and consequently take random, inefficient actions at
every step. In contrast, when humans face such a challenge, they can still
maintain robust navigation by actively exploring the surroundings to gather
more information and thus make more confident navigation decisions. This work
draws inspiration from human navigation behavior and endows an agent with an
active information gathering ability for a more intelligent vision-language
navigation policy. To achieve this, we propose an end-to-end framework for
learning an exploration policy that decides i) when and where to explore, ii)
what information is worth gathering during exploration, and iii) how to adjust
the navigation decision after the exploration. The experimental results show
that promising exploration strategies emerge from training, leading to a
significant boost in navigation performance. On the R2R challenge leaderboard,
our agent achieves promising results in all three VLN settings, i.e., single run,
pre-exploration, and beam search.
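The three decisions above suggest a simple control loop: act when the navigation policy is confident, explore when it is not, and re-rank the candidates after exploring. Below is a minimal, self-contained sketch of that loop; it illustrates the idea rather than the paper's implementation, and every name in it (Candidate, estimate_candidates, gather_observations, adjust_decision, the confidence threshold) is a hypothetical placeholder.

```python
# Minimal sketch of the exploration-augmented navigation loop described in
# the abstract: decide (i) when/where to explore, (ii) what information to
# gather, and (iii) how to adjust the navigation decision afterwards.
# All names and the toy scoring rules below are hypothetical.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Candidate:
    direction: int      # index of a navigable direction
    confidence: float   # policy confidence for moving this way

@dataclass
class AgentState:
    instruction: str
    observations: List[str] = field(default_factory=list)

CONFIDENCE_THRESHOLD = 0.75  # assumed; below this the agent explores first

def estimate_candidates(state: AgentState) -> List[Candidate]:
    """Stand-in for the navigation policy: score each navigable direction."""
    return [Candidate(direction=d, confidence=0.5 + 0.1 * d) for d in range(3)]

def gather_observations(state: AgentState, cand: Candidate) -> None:
    """(ii) Peek toward an uncertain direction and keep what was seen."""
    state.observations.append(f"view from direction {cand.direction}")

def adjust_decision(cands: List[Candidate], state: AgentState) -> Candidate:
    """(iii) Re-rank the candidates using the extra gathered information."""
    bonus = 0.05 * len(state.observations)  # toy re-scoring rule
    return max(cands, key=lambda c: c.confidence + bonus)

def navigate_step(state: AgentState) -> int:
    cands = estimate_candidates(state)
    best = max(cands, key=lambda c: c.confidence)
    if best.confidence < CONFIDENCE_THRESHOLD:       # (i) when to explore
        for cand in sorted(cands, key=lambda c: c.confidence)[:2]:
            gather_observations(state, cand)         # (i) where to explore
        best = adjust_decision(cands, state)
    return best.direction

if __name__ == "__main__":
    state = AgentState(instruction="go past the sofa and stop at the stairs")
    print("chosen direction:", navigate_step(state))
```

In a real agent the candidate scores would come from a learned navigation policy and the re-ranking from a learned fusion of the newly gathered views; the toy rules here only make the control flow concrete.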
Related papers
- Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation [11.667940255053582]
This paper uses the RGB and depth information of the training scene to pretrain the feature extractor, which improves navigation efficiency.
We evaluated our method on AI2-Thor and RoboTHOR and demonstrated that it significantly outperforms state-of-the-art (SOTA) methods on success rate and navigation efficiency.
arXiv Detail & Related papers (2024-06-20T08:35:10Z)
- ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) abstracting the environment and generating long-range navigation plans (a minimal sketch of this idea follows this entry), and 2) obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over the prior state of the art on the R2R-CE and RxR-CE datasets, respectively.
arXiv Detail & Related papers (2023-04-06T13:07:17Z)
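As referenced in the ETPNav entry above, long-range planning over an abstracted environment can be made concrete with a topological map: a sparse graph of waypoints searched for a route. The sketch below is an assumption-laden illustration of that general idea (a hand-built waypoint graph plus breadth-first search), not ETPNav's actual model.

```python
# Sketch of topological planning as summarized for ETPNav above: abstract
# the environment into a graph of waypoints, then produce a long-range plan
# by graph search. The map below is a toy assumption.

from collections import deque
from typing import Dict, List, Optional

def plan_route(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over the topological map; returns a waypoint path."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None  # goal unreachable in the current map

if __name__ == "__main__":
    # Toy topological map built from previously observed waypoints.
    topo_map = {
        "hallway": ["kitchen", "living_room"],
        "kitchen": ["hallway", "pantry"],
        "living_room": ["hallway", "stairs"],
        "stairs": ["living_room"],
        "pantry": ["kitchen"],
    }
    print(plan_route(topo_map, "hallway", "stairs"))
    # A low-level controller would then traverse each edge while avoiding obstacles.
```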
- AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments [60.98664330268192]
We present AVLEN -- an interactive agent for Audio-Visual-Language Embodied Navigation.
The goal of AVLEN is to localize an audio event via navigating the 3D visual world.
To realize these abilities, AVLEN uses a multimodal hierarchical reinforcement learning backbone (a skeleton of this high-level/low-level pattern follows this entry).
arXiv Detail & Related papers (2022-10-14T16:35:06Z)
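The AVLEN entry above mentions a multimodal hierarchical reinforcement learning backbone. The skeleton below illustrates the generic pattern such a backbone follows (a high-level policy picks an option, a low-level policy expands it into primitive actions); the option names, belief scores, and action primitives are invented for illustration and are not AVLEN's architecture.

```python
# Skeleton of a hierarchical policy in the spirit of the AVLEN summary above:
# a high-level policy picks an option (e.g., follow the audio cue or ask for
# a language instruction), and a low-level policy emits primitive actions.
# Option names and the scoring rule are illustrative assumptions.

import random
from typing import Dict, List

OPTIONS = ["follow_audio", "query_language", "explore_visually"]

def high_level_policy(belief: Dict[str, float]) -> str:
    """Pick the option whose modality currently looks most informative."""
    return max(OPTIONS, key=lambda opt: belief.get(opt, 0.0))

def low_level_policy(option: str) -> List[str]:
    """Expand an option into a short sequence of primitive actions."""
    primitives = {
        "follow_audio": ["turn_toward_sound", "move_forward"],
        "query_language": ["ask_for_instruction", "wait"],
        "explore_visually": ["turn_left", "move_forward"],
    }
    return primitives[option]

if __name__ == "__main__":
    random.seed(0)
    # Toy belief over which modality is most useful right now.
    belief = {opt: random.random() for opt in OPTIONS}
    option = high_level_policy(belief)
    print(option, "->", low_level_policy(option))
```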
"Embodied visual navigation" problem requires an agent to navigate in a 3D environment mainly rely on its first-person observation.
This paper attempts to establish an outline of the current works in the field of embodied visual navigation by providing a comprehensive literature survey.
arXiv Detail & Related papers (2021-07-07T12:09:04Z)
- Pushing it out of the Way: Interactive Visual Navigation [62.296686176988125]
We study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
We introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions.
By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities (a minimal forward-model sketch follows this entry).
arXiv Detail & Related papers (2021-04-28T22:46:41Z)
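The Neural Interaction Engine summary above amounts to a learned forward model: predict how the scene changes when the agent acts. A minimal PyTorch sketch of such a forward model is below; the layer sizes, action vocabulary, and residual formulation are assumptions for illustration, not the paper's architecture.

```python
# Minimal forward-model sketch in the spirit of the NIE summary above: given
# an embedding of the current scene and an action, predict the change (delta)
# in the scene embedding. All layer sizes here are assumptions.

import torch
import torch.nn as nn

class InteractionForwardModel(nn.Module):
    def __init__(self, state_dim: int = 128, num_actions: int = 6):
        super().__init__()
        self.action_embed = nn.Embedding(num_actions, 32)
        self.net = nn.Sequential(
            nn.Linear(state_dim + 32, 256),
            nn.ReLU(),
            nn.Linear(256, state_dim),  # predicted change in state embedding
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        delta = self.net(torch.cat([state, self.action_embed(action)], dim=-1))
        return state + delta  # predicted next-state embedding

if __name__ == "__main__":
    model = InteractionForwardModel()
    state = torch.randn(1, 128)   # current scene embedding
    action = torch.tensor([2])    # e.g., "push object forward"
    next_state = model(state, action)
    print(next_state.shape)       # torch.Size([1, 128])
```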
- Diagnosing Vision-and-Language Navigation: What Really Matters [61.72935815656582]
Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments.
Recent studies report a slow-down in performance improvements on both indoor and outdoor VLN tasks.
In this work, we conduct a series of diagnostic experiments to unveil agents' focus during navigation.
arXiv Detail & Related papers (2021-03-30T17:59:07Z)