Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration
- URL: http://arxiv.org/abs/2203.04006v1
- Date: Tue, 8 Mar 2022 11:01:24 GMT
- Title: Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration
- Authors: Xiwen Liang, Fengda Zhu, Lingling Li, Hang Xu, Xiaodan Liang
- Abstract summary: We introduce prompt-based learning to achieve fast adaptation for language embeddings.
Our model can adapt to diverse vision-language navigation tasks, including VLN and REVERIE.
- Score: 83.96729205383501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language navigation (VLN) is a challenging task due to its large
searching space in the environment. To address this problem, previous works
have proposed some methods of fine-tuning a large model that pretrained on
large-scale datasets. However, the conventional fine-tuning methods require
extra human-labeled navigation data and lack self-exploration capabilities in
environments, which hinders their generalization to unseen scenes. To improve
the ability of fast cross-domain adaptation, we propose Prompt-based
Environmental Self-exploration (ProbES), which can self-explore the
environments by sampling trajectories and automatically generate structured
instructions via a large-scale cross-modal pretrained model (CLIP). Our method
fully utilizes the knowledge learned from CLIP to build an in-domain dataset by
self-exploration without human labeling. Unlike the conventional approach of
fine-tuning, we introduce prompt-based learning to achieve fast adaptation for
language embeddings, which substantially improves the learning efficiency by
leveraging prior knowledge. By automatically synthesizing
trajectory-instruction pairs in any environment without human supervision,
combined with efficient prompt-based learning, our model can adapt to diverse vision-language
navigation tasks, including VLN and REVERIE. Both qualitative and quantitative
results show that our ProbES significantly improves the generalization ability
of the navigation model.
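The abstract describes the pipeline only in prose. As a hedged illustration of the template-plus-CLIP instruction synthesis it outlines, the sketch below scores a fixed set of candidate landmark words against each view of a sampled trajectory using the open-source CLIP package, then slots the best match into an instruction template. The template, candidate list, and helper names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: fill instruction templates with CLIP-matched landmarks
# for a self-explored trajectory. The template and candidate objects are
# placeholders, not the authors' actual prompt set.
import torch
from PIL import Image
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

CANDIDATE_OBJECTS = ["sofa", "table", "doorway", "staircase", "window"]

@torch.no_grad()
def best_landmark(image: Image.Image) -> str:
    """Return the candidate object CLIP finds most similar to the view."""
    image_feat = model.encode_image(preprocess(image).unsqueeze(0).to(device))
    tokens = clip.tokenize([f"a photo of a {o}" for o in CANDIDATE_OBJECTS])
    text_feat = model.encode_text(tokens.to(device))
    sims = torch.nn.functional.cosine_similarity(image_feat, text_feat, dim=-1)
    return CANDIDATE_OBJECTS[sims.argmax().item()]

def synthesize_instruction(trajectory_views: list[Image.Image]) -> str:
    """Turn a sampled trajectory (a list of views) into an instruction."""
    steps = [f"walk past the {best_landmark(v)}" for v in trajectory_views]
    return ", then ".join(steps) + ", then stop."
```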
Related papers
- TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation [11.591176410027224]
This paper presents a Vision-Language Navigation (VLN) agent based on Large Language Models (LLMs).
We propose the Thinking, Interacting, and Action framework to compensate for the shortcomings of LLMs in environmental perception.
Our approach also outperformed some supervised learning-based methods, highlighting its efficacy in zero-shot navigation.
arXiv Detail & Related papers (2024-03-13T05:22:39Z)
- Masked Path Modeling for Vision-and-Language Navigation [41.7517631477082]
Vision-and-language navigation (VLN) agents are trained to navigate in real-world environments by following natural language instructions.
Previous approaches have attempted to address this issue by introducing additional supervision during training.
We introduce a masked path modeling (MPM) objective, which pretrains an agent using self-collected data for downstream navigation tasks.
arXiv Detail & Related papers (2023-05-23T17:20:20Z)
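A minimal sketch of a masked-path-modeling-style pretraining objective as summarized above, assuming self-collected trajectories stored as discrete action sequences; the transformer sizes and masking scheme are illustrative, not the paper's exact design.

```python
# Illustrative MPM-style objective: hide random steps of a trajectory
# and train the model to reconstruct them from the surrounding context.
import torch
import torch.nn as nn

class MaskedPathModel(nn.Module):
    def __init__(self, num_actions: int, dim: int = 256):
        super().__init__()
        self.mask_id = num_actions  # extra token id used as [MASK]
        self.embed = nn.Embedding(num_actions + 1, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_actions)

    def forward(self, actions: torch.Tensor, mask_prob: float = 0.15):
        # actions: (batch, seq_len) LongTensor of action ids.
        mask = torch.rand_like(actions, dtype=torch.float) < mask_prob
        corrupted = actions.masked_fill(mask, self.mask_id)
        logits = self.head(self.encoder(self.embed(corrupted)))
        # Reconstruction loss on masked positions only (assumes mask.any()).
        return nn.functional.cross_entropy(logits[mask], actions[mask])
```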
- Curriculum Learning for Vision-and-Language Navigation [16.695511663714214]
Vision-and-Language Navigation (VLN) is a task where an agent navigates in an embodied indoor environment under human instructions.
Previous works ignore the distribution of sample difficulty, and we argue that this potentially degrades agent performance.
We propose a novel curriculum-based training paradigm for VLN tasks that can balance human prior knowledge and agent learning progress.
arXiv Detail & Related papers (2021-11-14T03:02:07Z)
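One simple way to realize such a difficulty-aware curriculum (a generic scheme under an assumed difficulty proxy, not necessarily the paper's): sort episodes by difficulty and widen the sampling pool as training proceeds.

```python
# Generic curriculum sampler: early rounds draw only from the easiest
# episodes; later rounds admit progressively harder ones.
import random

def curriculum_batches(episodes, difficulty, num_rounds, batch_size=8):
    """`difficulty` maps an episode to a scalar, e.g. ground-truth path length."""
    ordered = sorted(episodes, key=difficulty)
    for round_idx in range(1, num_rounds + 1):
        # Fraction of the sorted pool available in this round.
        cutoff = max(batch_size, len(ordered) * round_idx // num_rounds)
        pool = ordered[:cutoff]
        yield random.sample(pool, min(batch_size, len(pool)))

# e.g.: for batch in curriculum_batches(data, lambda e: e["path_len"], 10): ...
```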
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
- Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
- Meta-learning the Learning Trends Shared Across Tasks [123.10294801296926]
Gradient-based meta-learning algorithms excel at quick adaptation to new tasks with limited data.
Existing meta-learning approaches depend only on the current task's information during adaptation.
We propose a 'Path-aware' model-agnostic meta-learning approach.
arXiv Detail & Related papers (2020-10-19T08:06:47Z)
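For context, a minimal version of the gradient-based meta-learning (MAML) loop that such path-aware variants extend; the paper's conditioning on the optimization path is not reproduced here, and the task format is an assumption.

```python
# Vanilla MAML step: adapt on each task's support set in an inner loop,
# then update the meta-parameters from the query-set loss.
import torch
from torch.func import functional_call

def task_loss(model, params, inputs, targets):
    preds = functional_call(model, params, (inputs,))
    return torch.nn.functional.mse_loss(preds, targets)

def maml_step(model, tasks, meta_opt, inner_lr=0.01):
    # tasks: iterable of ((x_support, y_support), (x_query, y_query)).
    meta_opt.zero_grad()
    params = dict(model.named_parameters())
    for (xs, ys), (xq, yq) in tasks:
        # Inner loop: one gradient step on the support set.
        grads = torch.autograd.grad(task_loss(model, params, xs, ys),
                                    list(params.values()), create_graph=True)
        adapted = {name: p - inner_lr * g
                   for (name, p), g in zip(params.items(), grads)}
        # Outer loop: query loss backpropagates to the meta-parameters.
        task_loss(model, adapted, xq, yq).backward()
    meta_opt.step()
```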
- Pre-trained Word Embeddings for Goal-conditional Transfer Learning in Reinforcement Learning [0.0]
We show how a pre-trained task-independent language model can make a goal-conditional RL agent more sample efficient.
We do this by facilitating transfer learning between different related tasks.
arXiv Detail & Related papers (2020-07-10T06:42:00Z)
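A hedged sketch of that recipe: encode the goal description with frozen pretrained word vectors and condition the policy on the result. The GloVe choice, mean pooling, and layer sizes are illustrative assumptions, not the paper's architecture.

```python
# Goal-conditional policy fed a frozen pretrained goal embedding.
import numpy as np
import torch
import torch.nn as nn
import gensim.downloader

glove = gensim.downloader.load("glove-wiki-gigaword-50")  # downloads ~66 MB

def embed_goal(goal: str) -> torch.Tensor:
    """Mean-pool frozen word vectors; out-of-vocabulary words are skipped."""
    vecs = [glove[w] for w in goal.lower().split() if w in glove]
    return torch.tensor(np.mean(vecs, axis=0), dtype=torch.float32)

class GoalConditionedPolicy(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, goal_dim: int = 50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions))

    def forward(self, obs: torch.Tensor, goal_vec: torch.Tensor):
        # Concatenate observation and the (frozen) goal embedding.
        return self.net(torch.cat([obs, goal_vec], dim=-1))
```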
- Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into Adam to facilitate the NLP community.
arXiv Detail & Related papers (2020-04-27T08:59:57Z)
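The recall side of this mechanism can be approximated as an annealed quadratic penalty that pulls fine-tuned weights back toward their pretrained values; RecAdam folds it into the Adam update itself, whereas the sketch below exposes it as a plain loss term with assumed hyperparameters.

```python
# Annealed recall penalty: early in training the penalty dominates
# (recall the pretrained weights); later the task loss takes over.
import math

def recall_terms(model, pretrained_params, step, k=0.05, t0=250, gamma=1.0):
    """Return (task-loss weight, weighted recall penalty) for this step.

    `pretrained_params` is a list of detached clones of the pretrained
    weights, in the same order as model.parameters().
    """
    lam = 1.0 / (1.0 + math.exp(-k * (step - t0)))  # -> 1 as step grows
    penalty = sum(((p - p0) ** 2).sum()
                  for p, p0 in zip(model.parameters(), pretrained_params))
    return lam, (1.0 - lam) * gamma * penalty

# Usage in a training loop (illustrative):
#   lam, rec = recall_terms(model, frozen_copies, step)
#   loss = lam * task_loss + rec
#   loss.backward(); optimizer.step()
```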
- Environment-agnostic Multitask Learning for Natural Language Grounded Navigation [88.69873520186017]
We introduce a multitask navigation model that can be seamlessly trained on Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks.
Experiments show that environment-agnostic multitask learning significantly reduces the performance gap between seen and unseen environments.
arXiv Detail & Related papers (2020-03-01T09:06:31Z)