PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation
- URL: http://arxiv.org/abs/2407.11487v1
- Date: Tue, 16 Jul 2024 08:22:18 GMT
- Title: PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation
- Authors: Renjie Lu, Jingke Meng, Wei-Shi Zheng
- Abstract summary: Vision and language navigation is a task that requires an agent to navigate according to a natural language instruction.
Recent methods predict sub-goals on a constructed topology map at each step to enable long-term action planning.
We propose an alternative method that facilitates navigation planning by considering the alignment between instructions and directed fidelity trajectories.
- Score: 30.710806048991923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision and language navigation is a task that requires an agent to navigate according to a natural language instruction. Recent methods predict sub-goals on a constructed topology map at each step to enable long-term action planning. However, they suffer from high computational cost when attempting to support such high-level predictions with GCN-like models. In this work, we propose an alternative method that facilitates navigation planning by considering the alignment between instructions and directed fidelity trajectories, each of which is a path from the initial node to a candidate location on a directed graph without detours. This planning strategy leads to an efficient model while achieving strong performance. Specifically, we introduce a directed graph to illustrate the explored area of the environment, emphasizing directionality. Then, we first define the trajectory representation as a sequence of directed edge features, which are extracted from the panorama based on the corresponding orientation. Ultimately, we assess and compare the alignment between the instruction and different trajectories during navigation to determine the next navigation target. Our method outperforms the previous SOTA method BEVBert on the RxR dataset and is comparable on the R2R dataset while largely reducing the computational cost. Code is available: https://github.com/iSEE-Laboratory/VLN-PRET.
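The planning idea in the abstract (score each candidate location by how well the trajectory leading to it matches the instruction, then pick the best) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the mean pooling of edge features, the cosine similarity score, and the reading of "without detours" as a shortest directed path are all assumptions made for illustration. The actual model is in the linked repository.

```python
import math
from collections import deque


def shortest_directed_path(graph, start, goal):
    """Breadth-first search on a directed graph {node: [successors]}.

    Assumption: "without detours" is read here as the shortest directed path.
    Returns the list of nodes from start to goal, or None if unreachable.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def trajectory_feature(path, edge_features):
    """Mean-pool the feature vectors of the directed edges along a path.

    Assumption: mean pooling stands in for a learned trajectory encoder.
    """
    edges = list(zip(path, path[1:]))
    dim = len(edge_features[edges[0]])
    pooled = [0.0] * dim
    for edge in edges:
        for i, value in enumerate(edge_features[edge]):
            pooled[i] += value
    return [value / len(edges) for value in pooled]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def choose_target(graph, edge_features, start, candidates, instruction_vec):
    """Return the candidate whose trajectory best aligns with the instruction."""
    best, best_score = None, -math.inf
    for candidate in candidates:
        path = shortest_directed_path(graph, start, candidate)
        if path is None or len(path) < 2:
            continue  # unreachable, or the candidate is the start node itself
        score = cosine(instruction_vec, trajectory_feature(path, edge_features))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Toy explored area: A -> B -> D and A -> C, with hypothetical 2-D edge features.
graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
edge_features = {
    ("A", "B"): [1.0, 0.0],
    ("B", "D"): [1.0, 0.0],
    ("A", "C"): [0.0, 1.0],
}
instruction_vec = [1.0, 0.2]  # hypothetical instruction embedding

target, score = choose_target(graph, edge_features, "A", ["C", "D"], instruction_vec)
print(target)
```

In this toy setup the trajectory A, B, D points in the same direction as the instruction vector, so D is selected over C. Because edges are keyed by ordered pairs, the feature for (A, B) can differ from the feature for (B, A), which is the directionality the abstract emphasizes.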
Related papers
- NavTopo: Leveraging Topological Maps For Autonomous Navigation Of a Mobile Robot [1.0550841723235613]
We propose a full navigation pipeline based on topological map and two-level path planning.
The pipeline localizes in the graph by matching neural network descriptors and 2D projections of the input point clouds.
We test our approach in a large indoor photo-realistic simulated environment and compare it to a metric map-based approach based on the popular metric mapping method RTAB-MAP.
arXiv Detail & Related papers (2024-10-15T10:54:49Z)
- Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation [64.84996994779443]
We propose a novel Affordances-Oriented Planner for the continuous vision-language navigation (VLN) task.
Our AO-Planner integrates various foundation models to achieve affordances-oriented low-level motion planning and high-level decision-making.
Experiments on the challenging R2R-CE and RxR-CE datasets show that AO-Planner achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-07-08T12:52:46Z)
- Learning to Predict Navigational Patterns from Partial Observations [63.04492958425066]
This paper presents the first self-supervised learning (SSL) method for learning to infer navigational patterns in real-world environments from partial observations only.
We demonstrate how to infer global navigational patterns by fitting a maximum likelihood graph to the DSLP field.
Experiments show that our SSL model outperforms two SOTA supervised lane graph prediction models on the nuScenes dataset.
arXiv Detail & Related papers (2023-04-26T02:08:46Z)
- ReVoLT: Relational Reasoning and Voronoi Local Graph Planning for Target-driven Navigation [1.0896567381206714]
Embodied AI is an inevitable trend that emphasizes the interaction between intelligent entities and the real world.
Recent works focus on exploiting layout relationships with graph neural networks (GNNs).
We decouple this task and propose ReVoLT, a hierarchical framework.
arXiv Detail & Related papers (2023-01-06T05:19:56Z)
- Find a Way Forward: a Language-Guided Semantic Map Navigator [53.69229615952205]
This paper attacks the problem of language-guided navigation from a new perspective.
We use novel semantic navigation maps, which enable robots to carry out natural language instructions and move to a target position based on the map observations.
The proposed approach has noticeable performance gains, especially in long-distance navigation cases.
arXiv Detail & Related papers (2022-03-07T07:40:33Z)
- ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
Long-range navigation requires both planning and reasoning about local traversability.
We propose a learning-based approach that integrates learning and planning.
ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away.
arXiv Detail & Related papers (2022-02-23T02:14:23Z)
- Lifelong Topological Visual Navigation [16.41858724205884]
We propose a learning-based visual navigation method with graph update strategies that improve lifelong navigation performance over time.
We take inspiration from sampling-based planning algorithms to build image-based topological graphs, resulting in sparser graphs yet with higher navigation performance compared to baseline methods.
Unlike controllers that learn from fixed training environments, we show that our model can be finetuned using a relatively small dataset from the real-world environment where the robot is deployed.
arXiv Detail & Related papers (2021-10-16T06:16:14Z)
- Waypoint Models for Instruction-guided Navigation in Continuous Environments [68.2912740006109]
We develop a class of language-conditioned waypoint prediction networks to examine this question.
We measure task performance and estimated execution time on a profiled LoCoBot robot.
Our models outperform prior work in VLN-CE and set a new state-of-the-art on the public leaderboard.
arXiv Detail & Related papers (2021-10-05T17:55:49Z)
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that depicts step-by-step.
This approach deviates from real-world problems in which a human only describes what the object and its surroundings look like and asks the robot to start navigation from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
- Topological Planning with Transformers for Vision-and-Language Navigation [31.64229792521241]
We propose a modular approach to vision-and-language navigation (VLN) using topological maps.
Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map.
Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
arXiv Detail & Related papers (2020-12-09T20:02:03Z)
- High-Level Plan for Behavioral Robot Navigation with Natural Language Directions and R-NET [6.47137925955334]
We develop an understanding of the behavioral navigational graph to enable the pointer network to produce a sequence of behaviors representing the path.
Tests on the navigation graph dataset show that our model outperforms the state-of-the-art approach for both known and unknown environments.
arXiv Detail & Related papers (2020-01-08T01:14:11Z)