Towards Deviation-Robust Agent Navigation via Perturbation-Aware
Contrastive Learning
- URL: http://arxiv.org/abs/2403.05770v1
- Date: Sat, 9 Mar 2024 02:34:13 GMT
- Authors: Bingqian Lin, Yanxin Long, Yi Zhu, Fengda Zhu, Xiaodan Liang, Qixiang
Ye, Liang Lin
- Abstract summary: Vision-and-language navigation (VLN) asks an agent to follow a given language instruction to navigate through a real 3D environment.
We present a model-agnostic training paradigm, called Progressive Perturbation-aware Contrastive Learning (PROPER) to enhance the generalization ability of existing VLN agents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-and-language navigation (VLN) asks an agent to follow a given language
instruction to navigate through a real 3D environment. Despite significant
advances, conventional VLN agents are typically trained in disturbance-free
environments and may easily fail in real-world scenarios, since they are
unaware of how to handle various possible disturbances, such as sudden
obstacles or human interruptions, which are widespread and often cause
unexpected route deviations. In this paper, we present a model-agnostic training
paradigm, called Progressive Perturbation-aware Contrastive Learning (PROPER)
to enhance the generalization ability of existing VLN agents, by requiring them
to learn towards deviation-robust navigation. Specifically, a simple yet
effective path perturbation scheme is introduced to implement the route
deviation, with which the agent is required to still navigate successfully
following the original instruction. Since directly enforcing the agent to learn
perturbed trajectories may lead to inefficient training, a progressively
perturbed trajectory augmentation strategy is designed, where the agent can
self-adaptively learn to navigate under perturbation with the improvement of
its navigation performance on each specific trajectory. To encourage the
agent to capture the differences introduced by perturbation, a
perturbation-aware contrastive learning mechanism is further developed by
contrasting perturbation-free trajectory encodings with their
perturbation-based counterparts. Extensive experiments on R2R show that PROPER can benefit
multiple VLN baselines in perturbation-free scenarios. We further collect
perturbed path data to construct an introspection subset of R2R, called
Path-Perturbed R2R (PP-R2R). The results on PP-R2R reveal the unsatisfactory
robustness of popular VLN agents and demonstrate the capability of PROPER to
improve navigation robustness.
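The contrastive mechanism described above can be illustrated with a minimal PyTorch sketch, assuming an InfoNCE-style formulation in which each perturbation-free trajectory encoding is pulled toward its perturbed counterpart and pushed away from the other trajectories in the batch. The function name and the exact loss form are illustrative assumptions, not the paper's actual objective:

```python
import torch
import torch.nn.functional as F

def perturbation_contrastive_loss(clean_enc: torch.Tensor,
                                  perturbed_enc: torch.Tensor,
                                  temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style sketch: row i of `clean_enc` and row i of `perturbed_enc`
    encode the same trajectory (positive pair); all other rows act as negatives.

    clean_enc, perturbed_enc: (B, D) trajectory encodings.
    """
    z1 = F.normalize(clean_enc, dim=1)      # unit-norm clean encodings
    z2 = F.normalize(perturbed_enc, dim=1)  # unit-norm perturbed encodings
    logits = z1 @ z2.t() / temperature      # (B, B) cosine-similarity matrix
    targets = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)
```

In this sketch, minimizing the loss makes each clean encoding most similar to the perturbed encoding of the same trajectory, which matches the abstract's goal of making the agent sensitive to the differences a perturbation introduces.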
Related papers
- Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments [37.20272055902246]
Real-world navigation often involves dealing with unexpected obstructions such as closed doors, moved objects, and unpredictable entities.
This paper introduces an innovative dataset and task, R2R with UNexpected Obstructions (R2R-UNO). R2R-UNO contains various types and numbers of path obstructions to generate instruction-reality mismatches for VLN research.
Experiments on R2R-UNO reveal that state-of-the-art VLN methods inevitably encounter significant challenges when facing such mismatches, indicating that they rigidly follow instructions rather than navigate adaptively.
arXiv Detail & Related papers (2024-07-31T08:55:57Z)
- NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning [101.56342075720588]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.
Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability.
This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), where we fulfill parameter-efficient in-domain training to enable self-guided navigational decision.
arXiv Detail & Related papers (2024-03-12T07:27:02Z)
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation [41.334731014665316]
Most existing works in vision-and-language navigation (VLN) focus on either discrete or continuous environments.
We propose a predictor to generate a set of candidate waypoints during navigation.
We show that agents navigating in continuous environments with predicted waypoints perform significantly better than agents using low-level actions.
arXiv Detail & Related papers (2022-03-05T14:56:14Z)
- Contrastive Instruction-Trajectory Learning for Vision-Language Navigation [66.16980504844233]
A vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction.
Previous works fail to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignore the temporal continuity of sub-instructions.
We propose a Contrastive Instruction-Trajectory Learning framework that explores invariance across similar data samples and variance across different ones to learn distinctive representations for robust navigation.
arXiv Detail & Related papers (2021-12-08T06:32:52Z)
- Generative Adversarial Imitation Learning for End-to-End Autonomous Driving on Urban Environments [0.8122270502556374]
Generative Adversarial Imitation Learning (GAIL) can train policies without requiring an explicitly defined reward function.
We show that both of them are capable of imitating the expert trajectory from start to end after training ends.
arXiv Detail & Related papers (2021-10-16T15:04:13Z)
- Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation [145.84123197129298]
Language instruction plays an essential role in the natural language grounded navigation tasks.
We exploit to train a more robust navigator which is capable of dynamically extracting crucial factors from the long instruction.
Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target.
arXiv Detail & Related papers (2021-07-23T14:11:31Z)
- Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning [66.9937776799536]
The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments.
The challenges of VLN arise mainly from two aspects: first, the agent needs to attend to the meaningful paragraphs of the language instruction corresponding to the dynamically-varying visual environments.
We propose a cross-modal grounding module to equip the agent with a better ability to track the correspondence between the textual and visual modalities.
arXiv Detail & Related papers (2020-11-22T09:13:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.