ETPNav: Evolving Topological Planning for Vision-Language Navigation in
Continuous Environments
- URL: http://arxiv.org/abs/2304.03047v3
- Date: Mon, 22 Jan 2024 04:57:32 GMT
- Title: ETPNav: Evolving Topological Planning for Vision-Language Navigation in
Continuous Environments
- Authors: Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He,
Liang Wang
- Abstract summary: Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability to perform obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on R2R-CE and RxR-CE datasets.
- Score: 56.194988818341976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-language navigation is a task that requires an agent to follow
instructions to navigate in environments. It becomes increasingly crucial in
the field of embodied AI, with potential applications in autonomous navigation,
search and rescue, and human-robot interaction. In this paper, we propose to
address a more practical yet challenging counterpart setting - vision-language
navigation in continuous environments (VLN-CE). To develop a robust VLN-CE
agent, we propose a new navigation framework, ETPNav, which focuses on two
critical skills: 1) the capability to abstract environments and generate
long-range navigation plans, and 2) the ability to perform obstacle-avoiding control in
continuous environments. ETPNav performs online topological mapping of
environments by self-organizing predicted waypoints along a traversed path,
without prior environmental experience. This allows the agent to break down
the navigation procedure into high-level planning and low-level control.
Concurrently, ETPNav utilizes a transformer-based cross-modal planner to
generate navigation plans based on topological maps and instructions. The plan
is then executed by an obstacle-avoiding controller that leverages a
trial-and-error heuristic to prevent navigation from getting stuck in
obstacles. Experimental results demonstrate the effectiveness of the proposed
method. ETPNav yields more than 10% and 20% improvements over prior
state-of-the-art on R2R-CE and RxR-CE datasets, respectively. Our code is
available at https://github.com/MarSaKi/ETPNav.
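
The abstract above outlines three mechanisms: online topological mapping that self-organizes predicted waypoints into a graph, a cross-modal planner that picks a subgoal node from the map and instruction, and a trial-and-error ("tryout") controller that deflects around obstacles. The snippet below is only a minimal, self-contained sketch of how such a loop could be wired together; the merge radius, waypoint predictor, node scoring, and retry heuristic are illustrative stand-ins, not the authors' implementation (see the linked repository for the real code).

```python
import math
from dataclasses import dataclass, field

MERGE_RADIUS = 0.5  # assumed distance (metres) below which two waypoints are fused


@dataclass
class TopoNode:
    position: tuple                      # (x, y) in the world frame
    neighbors: set = field(default_factory=set)


class TopoMap:
    """Grows a topological graph online by self-organizing predicted waypoints."""

    def __init__(self):
        self.nodes = []

    def closest(self, pos):
        """Index and distance of the nearest existing node, or (None, inf)."""
        best, best_d = None, float("inf")
        for i, node in enumerate(self.nodes):
            d = math.dist(node.position, pos)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def integrate(self, current, waypoints):
        """Fuse predicted waypoints into the graph and connect them to `current`."""
        for wp in waypoints:
            i, d = self.closest(wp)
            if i is None or d > MERGE_RADIUS:        # far from every node: add a new one
                self.nodes.append(TopoNode(position=tuple(wp)))
                i = len(self.nodes) - 1
            if current is not None and i != current:
                self.nodes[current].neighbors.add(i)
                self.nodes[i].neighbors.add(current)


def predict_waypoints(observation):
    """Stand-in for a learned waypoint predictor (panoramic RGB-D -> candidates)."""
    x, y = observation["pose"]
    return [(x + 1.0, y), (x, y + 1.0)]              # dummy candidate waypoints


def plan_next_node(topo_map, current, instruction):
    """Stand-in for the cross-modal planner: score map nodes against the instruction."""
    frontier = topo_map.nodes[current].neighbors
    return max(frontier, default=current)            # dummy scoring rule


def tryout_step(env, heading, step=0.25):
    """Trial-and-error control: if a forward step is blocked, rotate and retry."""
    for _ in range(8):                               # try up to 8 deflected headings
        if env.step_forward(heading, step):          # assumed to return True when not blocked
            return heading
        heading += math.pi / 4
    return heading


if __name__ == "__main__":
    topo = TopoMap()
    topo.integrate(None, [(0.0, 0.0)])               # register the start node
    current = 0
    topo.integrate(current, predict_waypoints({"pose": topo.nodes[current].position}))
    print("nodes:", [n.position for n in topo.nodes])
    print("next subgoal node:", plan_next_node(topo, current, "walk past the sofa"))
```

Running the script grows a small graph from the dummy waypoints and picks a neighboring node as the next subgoal, mirroring the split between high-level planning and low-level control described in the abstract.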
Related papers
- Hierarchical end-to-end autonomous navigation through few-shot waypoint detection [0.0]
Human navigation is facilitated through the association of actions with landmarks.
Current autonomous navigation schemes rely on accurate positioning devices and algorithms as well as extensive streams of sensory data collected from the environment.
We propose a hierarchical end-to-end meta-learning scheme that enables a mobile robot to navigate in a previously unknown environment.
arXiv Detail & Related papers (2024-09-23T00:03:39Z)
- IN-Sight: Interactive Navigation through Sight [20.184155117341497]
IN-Sight is a novel approach to self-supervised path planning.
It calculates traversability scores and incorporates them into a semantic map.
To precisely navigate around obstacles, IN-Sight employs a local planner.
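
The IN-Sight summary above says the method computes traversability scores and folds them into a semantic map that a local planner consumes. As a rough illustration only (the grid resolution, score range, and running-average update are assumptions, not the paper's actual pipeline), per-point traversability estimates could be accumulated into a top-down grid like this:

```python
import numpy as np

CELL = 0.1   # assumed grid resolution: metres per cell
GRID = 200   # assumed 20 m x 20 m map centred on the robot


def update_traversability_map(grid, counts, points_xy, scores):
    """Average per-point traversability scores into a top-down map grid.

    points_xy : (N, 2) points in the map frame, in metres
    scores    : (N,) traversability estimates in [0, 1]
    """
    ix = np.clip((points_xy[:, 0] / CELL + GRID // 2).astype(int), 0, GRID - 1)
    iy = np.clip((points_xy[:, 1] / CELL + GRID // 2).astype(int), 0, GRID - 1)
    for x, y, s in zip(ix, iy, scores):
        counts[y, x] += 1
        grid[y, x] += (s - grid[y, x]) / counts[y, x]   # running mean per cell
    return grid


traversability = np.zeros((GRID, GRID), dtype=np.float32)
hits = np.zeros((GRID, GRID), dtype=np.int32)
points = np.array([[1.0, 0.0], [1.05, 0.02], [2.0, -0.5]])
update_traversability_map(traversability, hits, points, np.array([0.9, 0.8, 0.1]))
# A local planner could then penalise paths that cross low-traversability cells.
```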
arXiv Detail & Related papers (2024-08-01T07:27:54Z)
- TOP-Nav: Legged Navigation Integrating Terrain, Obstacle and Proprioception Estimation [5.484041860401147]
TOP-Nav is a novel legged navigation framework that integrates a comprehensive path planner with Terrain awareness, Obstacle avoidance and closed-loop Proprioception.
We show that TOP-Nav achieves open-world navigation in which the robot can handle terrains or disturbances beyond the distribution of its prior knowledge.
arXiv Detail & Related papers (2024-04-23T17:42:45Z)
- NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning [101.56342075720588]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.
Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability.
This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), in which parameter-efficient in-domain training enables the model to make self-guided navigational decisions.
arXiv Detail & Related papers (2024-03-12T07:27:02Z)
- CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation [73.78984332354636]
CorNav is a novel zero-shot framework for vision-and-language navigation.
It incorporates environmental feedback for refining future plans and adjusting its actions.
It consistently outperforms all baselines in a zero-shot multi-task setting.
arXiv Detail & Related papers (2023-06-17T11:44:04Z)
- Pushing it out of the Way: Interactive Visual Navigation [62.296686176988125]
We study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
We introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions.
By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities.
arXiv Detail & Related papers (2021-04-28T22:46:41Z)
- Topological Planning with Transformers for Vision-and-Language Navigation [31.64229792521241]
We propose a modular approach to vision-and-language navigation (VLN) using topological maps.
Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map.
Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
arXiv Detail & Related papers (2020-12-09T20:02:03Z)
- Active Visual Information Gathering for Vision-Language Navigation [115.40768457718325]
Vision-language navigation (VLN) is the task in which an agent must carry out navigational instructions inside photo-realistic environments.
One of the key challenges in VLN is how to conduct robust navigation by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment.
This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent VLN policy.
arXiv Detail & Related papers (2020-07-15T23:54:20Z)
- Environment-agnostic Multitask Learning for Natural Language Grounded Navigation [88.69873520186017]
We introduce a multitask navigation model that can be seamlessly trained on Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks.
Experiments show that environment-agnostic multitask learning significantly reduces the performance gap between seen and unseen environments.
arXiv Detail & Related papers (2020-03-01T09:06:31Z)
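
The last entry above trains one navigation model jointly on VLN and NDH. Purely as a schematic of multitask training (the shared encoder, action head, loss, and 1:1 task sampling are assumptions for illustration, and the paper's environment-agnostic representation learning is not shown), batches from the two tasks can be alternated through a single policy:

```python
import random
import torch
import torch.nn as nn


class SharedNavPolicy(nn.Module):
    """Toy shared model: one text encoder and one action head serve both tasks."""

    def __init__(self, vocab=1000, dim=128, actions=6):
        super().__init__()
        self.encode = nn.EmbeddingBag(vocab, dim)   # instruction / dialog-history encoder
        self.head = nn.Linear(dim, actions)         # shared action logits

    def forward(self, token_ids):
        return self.head(self.encode(token_ids))


def fake_batch(batch=8, seq=20, vocab=1000, actions=6):
    """Stand-in for a real VLN or NDH dataloader."""
    tokens = torch.randint(0, vocab, (batch, seq))
    actions_gt = torch.randint(0, actions, (batch,))
    return tokens, actions_gt


model = SharedNavPolicy()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for step in range(100):
    task = random.choice(["vln", "ndh"])    # alternate tasks; the mixing ratio is assumed
    tokens, actions_gt = fake_batch()       # in practice: next batch from the chosen task's data
    loss = criterion(model(tokens), actions_gt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```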
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.