Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation
- URL: http://arxiv.org/abs/2003.14269v1
- Date: Tue, 31 Mar 2020 14:52:42 GMT
- Title: Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation
- Authors: Felix Yu, Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky
- Abstract summary: We investigate the popular Room-to-Room (R2R) VLN benchmark and discover that what matters is not only the amount of data you synthesize, but also how you do it.
We find that shortest path sampling, which is used by both the R2R benchmark and existing augmentation methods, encodes biases in the action space of the agent, which we dub action priors.
We then show that these action priors offer one explanation for the poor generalization of existing works.
- Score: 44.019674347733506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the Vision-and-Language Navigation (VLN) task, an agent with egocentric vision navigates to a destination given natural language instructions. Manually annotating these instructions is time-consuming and expensive, so many existing approaches automatically generate additional samples to improve agent performance. However, these approaches still have difficulty generalizing to new environments. In this work, we investigate the popular Room-to-Room (R2R) VLN benchmark and discover that what matters is not only how much data you synthesize, but also how you synthesize it. We find that shortest path sampling, which is used by both the R2R benchmark and existing augmentation methods, encodes biases in the action space of the agent, which we dub action priors. We then show that these action priors offer one explanation for the poor generalization of existing works. To mitigate such priors, we propose a path sampling method based on random walks to augment the data. By training with this augmentation strategy, our agent generalizes better to unknown environments than the baseline, significantly improving model performance in the process.
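For intuition, here is a minimal, hypothetical sketch of the random-walk alternative to shortest-path sampling over a navigation graph. The names (`adj`, `start`, `max_len`) and the no-revisit constraint are illustrative assumptions rather than the paper's exact procedure, and the step of generating instructions for the sampled paths (e.g., with a learned speaker model) is omitted:

```python
import random

def random_walk_path(adj, start, max_len, rng=None):
    """Sample a path by taking uniform random steps on the environment
    graph, never revisiting a node so the path stays simple.

    adj: dict mapping viewpoint ids to lists of connected viewpoints
    (a hypothetical adjacency-list stand-in for the R2R nav graph).
    """
    rng = rng or random.Random(0)
    path = [start]
    while len(path) < max_len:
        neighbors = [n for n in adj[path[-1]] if n not in path]
        if not neighbors:  # dead end: stop early
            break
        path.append(rng.choice(neighbors))
    return path

# Toy usage with a made-up graph of four viewpoints.
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
print(random_walk_path(adj, "a", max_len=4))
```

Unlike a shortest-path sampler, which at each node overwhelmingly picks the action that makes progress toward the goal (the action prior the paper identifies), this sampler spreads probability mass uniformly over the available actions.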
Related papers
- Prioritized Generative Replay [121.83947140497655]
We propose a prioritized, parametric version of an agent's memory, using generative models to capture online experience.
This paradigm enables densification of past experience, with new generations that benefit from the generative model's generalization capacity.
We show this recipe can be instantiated using conditional diffusion models and simple relevance functions.
arXiv Detail & Related papers (2024-10-23T17:59:52Z)
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
- Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes [25.944819618283613]
Vision-and-Language Navigation (VLN) aims to navigate to the target location by following a given instruction.
We make the first attempt to tackle a long-ignored problem in VLN: narrowing the gap between Success Rate (SR) and Oracle Success Rate (OSR).
arXiv Detail & Related papers (2023-08-07T01:43:25Z)
- Masked Path Modeling for Vision-and-Language Navigation [41.7517631477082]
Vision-and-language navigation (VLN) agents are trained to navigate in real-world environments by following natural language instructions.
Previous approaches have attempted to address this issue by introducing additional supervision during training.
We introduce a masked path modeling (MPM) objective, which pretrains an agent using self-collected data for downstream navigation tasks.
arXiv Detail & Related papers (2023-05-23T17:20:20Z)
- Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding [16.784045122994506]
We propose a hierarchical navigation method that deploys an exploitation policy to correct recent misled actions.
We show that an exploitation policy, which moves the agent toward a well-chosen local goal, outperforms a method that moves the agent to a previously visited state.
We present a novel visual representation, called scene object spectrum (SOS), which performs a category-wise 2D Fourier transform of detected objects (see the sketch after this list).
arXiv Detail & Related papers (2023-03-07T17:39:53Z)
- Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration [83.96729205383501]
We introduce prompt-based learning to achieve fast adaptation for language embeddings.
Our model can adapt to diverse vision-language navigation tasks, including VLN and REVERIE.
arXiv Detail & Related papers (2022-03-08T11:01:24Z)
- Waypoint Models for Instruction-guided Navigation in Continuous Environments [68.2912740006109]
We develop a class of language-conditioned waypoint prediction networks to examine this question.
We measure task performance and estimated execution time on a profiled LoCoBot robot.
Our models outperform prior work in VLN-CE and set a new state-of-the-art on the public leaderboard.
arXiv Detail & Related papers (2021-10-05T17:55:49Z)
- Vision-Language Navigation with Random Environmental Mixup [112.94609558723518]
Vision-language Navigation (VLN) tasks require an agent to navigate step-by-step while perceiving the visual observations and comprehending a natural language instruction.
Previous works have proposed various data augmentation methods to reduce data bias.
We propose the Random Environmental Mixup (REM) method, which generates cross-connected house scenes as augmented data by mixing up environments.
arXiv Detail & Related papers (2021-06-15T04:34:26Z)
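As a rough illustration of the scene object spectrum idea from the Meta-Explore entry above, the sketch below computes a per-category 2D Fourier transform over binary object-detection masks. This is a hedged reconstruction from the one-sentence summary, not the authors' implementation; the function name, the grid resolution, and the `(category, row, col)` detection format are all illustrative assumptions:

```python
import numpy as np

def scene_object_spectrum(detections, num_categories, hw=(64, 64)):
    """Per-category 2D FFT magnitude of binary detection masks.

    detections: iterable of (category_id, row, col) detection centers,
    already scaled to the hw grid. Returns an array of shape
    (num_categories, H, W) holding one magnitude spectrum per category.
    """
    h, w = hw
    masks = np.zeros((num_categories, h, w), dtype=np.float32)
    for cat, r, c in detections:
        masks[cat, r, c] = 1.0  # mark each detected object's location
    # Category-wise 2D Fourier transform; keep the magnitude spectrum
    # so the representation is invariant to small spatial shifts.
    return np.abs(np.fft.fft2(masks, axes=(-2, -1)))

# Example: two objects of category 0 and one of category 1.
sos = scene_object_spectrum([(0, 3, 4), (0, 10, 20), (1, 30, 30)], 2)
print(sos.shape)  # (2, 64, 64)
```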