EnvEdit: Environment Editing for Vision-and-Language Navigation
- URL: http://arxiv.org/abs/2203.15685v1
- Date: Tue, 29 Mar 2022 15:44:32 GMT
- Title: EnvEdit: Environment Editing for Vision-and-Language Navigation
- Authors: Jialu Li, Hao Tan, Mohit Bansal
- Abstract summary: In Vision-and-Language Navigation (VLN), an agent needs to navigate through the environment based on natural language instructions.
We propose EnvEdit, a data augmentation method that creates new environments by editing existing environments.
- We show that our proposed EnvEdit method yields significant improvements in all metrics on both pre-trained and non-pre-trained VLN agents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Vision-and-Language Navigation (VLN), an agent needs to navigate through
the environment based on natural language instructions. Due to limited
available data for agent training and finite diversity in navigation
environments, it is challenging for the agent to generalize to new, unseen
environments. To address this problem, we propose EnvEdit, a data augmentation
method that creates new environments by editing existing environments, which
are used to train a more generalizable agent. Our augmented environments can
differ from the seen environments in three diverse aspects: style, object
appearance, and object classes. Training on these edit-augmented environments
prevents the agent from overfitting to existing environments and helps
generalize better to new, unseen environments. Empirically, on both the
Room-to-Room and the multi-lingual Room-Across-Room datasets, we show that our
proposed EnvEdit method yields significant improvements in all metrics on both
pre-trained and non-pre-trained VLN agents, and achieves the new
state-of-the-art on the test leaderboard. We further ensemble the VLN agents
augmented on different edited environments and show that these edit methods are
complementary. Code and data are available at
https://github.com/jialuli-luka/EnvEdit
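For intuition, here is a minimal sketch (not the released EnvEdit code) of the augmentation idea: each seen panorama is edited before the agent sees it, and training mixes original and edited environments. A simple global appearance perturbation stands in for the paper's learned style-transfer and object-editing models; all function names and parameters below are illustrative assumptions.

```python
# Illustrative sketch of environment-editing augmentation for VLN training.
# NOTE: this is NOT the authors' implementation; a color/appearance
# perturbation stands in for EnvEdit's learned style and object edits.
import random
import torchvision.transforms.functional as TF

def edit_panorama(views, rng=random):
    """Apply one consistent appearance edit to every view of a panorama.

    `views` is assumed to be a list of PIL images for the discretized
    views at a single viewpoint; sampling the edit parameters once keeps
    the edited environment visually coherent across views.
    """
    brightness = rng.uniform(0.6, 1.4)
    contrast = rng.uniform(0.6, 1.4)
    saturation = rng.uniform(0.6, 1.4)
    hue = rng.uniform(-0.1, 0.1)
    edited = []
    for img in views:
        img = TF.adjust_brightness(img, brightness)
        img = TF.adjust_contrast(img, contrast)
        img = TF.adjust_saturation(img, saturation)
        img = TF.adjust_hue(img, hue)
        edited.append(img)
    return edited

def sample_training_views(original_views, edit_prob=0.5, rng=random):
    """With probability `edit_prob`, train this episode on edited views."""
    if rng.random() < edit_prob:
        return edit_panorama(original_views, rng)
    return original_views
```

In the paper's setup, the agent is trained on visual features extracted from the edited panoramas, and agents augmented with different edit types (style, object appearance, object classes) are ensembled at inference, which is where the complementarity reported above comes from.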
Related papers
- ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments [13.988804095409133]
We propose the ReALFRED benchmark, which employs real-world scenes, objects, and room layouts to train agents to complete household tasks.
Specifically, we extend the ALFRED benchmark with updates for larger environmental spaces with smaller visual domain gaps.
With ReALFRED, we analyze methods previously developed for the ALFRED benchmark and observe that they consistently yield lower performance on all metrics.
arXiv Detail & Related papers (2024-07-26T07:00:27Z)
- PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation [96.8435716885159]
Vision-and-Language Navigation (VLN) requires the agent to follow language instructions to navigate through 3D environments.
One main challenge in VLN is the limited availability of training environments, which makes it hard to generalize to new and unseen environments.
We propose PanoGen, a generation method that can potentially create an infinite number of diverse panoramic environments conditioned on text.
arXiv Detail & Related papers (2023-05-30T16:39:54Z)
- CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations [98.30038910061894]
Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions.
We propose CLEAR: Cross-Lingual and Environment-Agnostic Representations.
Our language and visual representations can be successfully transferred to the Room-to-Room and Cooperative Vision-and-Dialogue Navigation tasks.
arXiv Detail & Related papers (2022-07-05T17:38:59Z)
- Spot the Difference: A Novel Task for Embodied Agents in Changing Environments [43.52107532692226]
Embodied AI aims at creating intelligent agents that can move and operate inside an environment.
We propose Spot the Difference: a novel task for Embodied AI where the agent has access to an outdated map of the environment.
We propose an exploration policy that can take advantage of previous knowledge of the environment and identify changes in the scene faster and more effectively than existing agents.
arXiv Detail & Related papers (2022-04-18T18:30:56Z)
- Self-Supervised Policy Adaptation during Deployment [98.25486842109936]
Self-supervision allows the policy to continue training after deployment without using any rewards.
Empirical evaluations are performed on diverse simulation environments from the DeepMind Control Suite and ViZDoom.
Our method improves generalization in 31 out of 36 environments across various tasks and outperforms domain randomization on a majority of environments.
arXiv Detail & Related papers (2020-07-08T17:56:27Z)
- Diagnosing the Environment Bias in Vision-and-Language Navigation [102.02103792590076]
Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations.
Recent works that study VLN observe a significant performance drop when tested on unseen environments, indicating that the neural agent models are highly biased towards training environments.
In this work, we design novel diagnosis experiments via environment re-splitting and feature replacement, looking into possible reasons for this environment bias.
arXiv Detail & Related papers (2020-05-06T19:24:33Z)
- Environment-agnostic Multitask Learning for Natural Language Grounded Navigation [88.69873520186017]
We introduce a multitask navigation model that can be seamlessly trained on Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks.
Experiments show that environment-agnostic multitask learning significantly reduces the performance gap between seen and unseen environments.
arXiv Detail & Related papers (2020-03-01T09:06:31Z)