Environment-agnostic Multitask Learning for Natural Language Grounded Navigation
- URL: http://arxiv.org/abs/2003.00443v5
- Date: Tue, 21 Jul 2020 02:54:38 GMT
- Title: Environment-agnostic Multitask Learning for Natural Language Grounded Navigation
- Authors: Xin Eric Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa
Kozareva, Sujith Ravi
- Abstract summary: We introduce a multitask navigation model that can be seamlessly trained on Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks.
Experiments show that environment-agnostic multitask learning significantly reduces the performance gap between seen and unseen environments.
- Score: 88.69873520186017
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent research efforts have enabled the study of natural language grounded navigation
in photo-realistic environments, e.g., following natural language instructions
or dialog. However, existing methods tend to overfit training data in seen
environments and fail to generalize well in previously unseen environments. To
close the gap between seen and unseen environments, we aim at learning a
generalized navigation model from two novel perspectives: (1) we introduce a
multitask navigation model that can be seamlessly trained on both
Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH)
tasks, which benefits from richer natural language guidance and effectively
transfers knowledge across tasks; (2) we propose to learn environment-agnostic
representations for the navigation policy that are invariant among the
environments seen during training, thus generalizing better on unseen
environments. Extensive experiments show that environment-agnostic multitask
learning significantly reduces the performance gap between seen and unseen
environments, and the navigation agent trained in this way outperforms baselines on
unseen environments by 16% (relative improvement in success rate) on VLN and 120%
(goal progress) on NDH. Our submission to the CVDN leaderboard establishes a
new state-of-the-art for the NDH task on the holdout test set. Code is
available at https://github.com/google-research/valan.
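To make the two ideas concrete, the sketch below shows how such an agent could be wired up. It is illustrative only and not the VALAN implementation: it assumes (a) a single shared trunk and policy head consumed by both VLN and NDH mini-batches, and (b) an environment classifier behind a gradient-reversal layer, a common way to push a representation toward invariance across training environments. All module names, dimensions, and the batch interface are hypothetical.

```python
# Illustrative sketch (PyTorch), not the official VALAN/TensorFlow code.
# Assumptions: VLN and NDH batches share one trunk and policy head, and an
# environment classifier behind a gradient-reversal layer removes
# environment-specific signal from the shared representation.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies the gradient by -lambda on backward."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class MultitaskNavAgent(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, num_actions=6, num_envs=60):
        super().__init__()
        # Shared trunk: fuses the language context (instruction or dialog
        # history) with the current visual observation.
        self.fuse = nn.Sequential(
            nn.Linear(feat_dim + hidden_dim, hidden_dim), nn.ReLU()
        )
        self.policy = nn.Linear(hidden_dim, num_actions)  # next-action logits
        # Adversarial head: tries to predict which training environment the
        # observation came from; the reversed gradient discourages the trunk
        # from encoding that information.
        self.env_classifier = nn.Linear(hidden_dim, num_envs)

    def forward(self, text_ctx, visual_feat, grl_lambda=0.5):
        h = self.fuse(torch.cat([visual_feat, text_ctx], dim=-1))
        action_logits = self.policy(h)
        env_logits = self.env_classifier(GradReverse.apply(h, grl_lambda))
        return action_logits, env_logits


def train_step(agent, opt, batch, env_loss_weight=0.1):
    """One mixed-task step: `batch` may come from either VLN or NDH, since both
    provide (text context, visual feature, next action, environment id)."""
    ce = nn.CrossEntropyLoss()
    action_logits, env_logits = agent(batch["text_ctx"], batch["visual_feat"])
    loss = ce(action_logits, batch["action"]) + env_loss_weight * ce(
        env_logits, batch["env_id"]
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

During training, VLN and NDH mini-batches would simply alternate through train_step: an NDH batch carries the dialog history as its text context, so the same shared trunk and policy head serve both tasks, while the reversed environment-classification gradient pushes the learned features to be environment-agnostic.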
Related papers
- UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation [71.97405667493477]
We introduce a novel, generalizable 3DGS-based pre-training paradigm, called UnitedVLN.
It enables agents to better explore future environments by jointly rendering high-fidelity 360° visual images and semantic features.
UnitedVLN outperforms state-of-the-art methods on existing VLN-CE benchmarks.
arXiv Detail & Related papers (2024-11-25T02:44:59Z) - Vision-Language Navigation with Continual Learning [10.850410419782424]
Vision-language navigation (VLN) is a critical domain within embodied intelligence.
We propose the Vision-Language Navigation with Continual Learning paradigm to address the challenge of adapting to new environments without forgetting previously learned ones.
In this paradigm, agents incrementally learn new environments while retaining previously acquired knowledge.
arXiv Detail & Related papers (2024-09-04T09:28:48Z) - ETPNav: Evolving Topological Planning for Vision-Language Navigation in
Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability to perform obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over the prior state of the art on the R2R-CE and RxR-CE datasets, respectively.
arXiv Detail & Related papers (2023-04-06T13:07:17Z) - Visual-Language Navigation Pretraining via Prompt-based Environmental
Self-exploration [83.96729205383501]
We introduce prompt-based learning to achieve fast adaptation for language embeddings.
Our model can adapt to diverse vision-language navigation tasks, including VLN and REVERIE.
arXiv Detail & Related papers (2022-03-08T11:01:24Z) - SASRA: Semantically-aware Spatio-temporal Reasoning Agent for
Vision-and-Language Navigation in Continuous Environments [7.5606260987453116]
This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments.
Existing end-to-end learning-based methods struggle at this task as they focus mostly on raw visual observations.
We present a hybrid transformer-recurrence model which focuses on combining classical semantic mapping techniques with a learning-based method.
arXiv Detail & Related papers (2021-08-26T17:57:02Z) - Diagnosing the Environment Bias in Vision-and-Language Navigation [102.02103792590076]
Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations.
Recent works that study VLN observe a significant performance drop when tested on unseen environments, indicating that the neural agent models are highly biased towards training environments.
In this work, we design novel diagnosis experiments via environment re-splitting and feature replacement, looking into possible reasons for this environment bias.
arXiv Detail & Related papers (2020-05-06T19:24:33Z) - Towards Learning a Generic Agent for Vision-and-Language Navigation via
Pre-training [150.35927365127176]
We present the first pre-training and fine-tuning paradigm for vision-and-language navigation (VLN) tasks.
By training on a large number of image-text-action triplets in a self-supervised manner, the pre-trained model provides generic representations of visual environments and language instructions.
It learns more effectively in new tasks and generalizes better in a previously unseen environment.
arXiv Detail & Related papers (2020-02-25T03:08:12Z)