Related papers: Efficient Strategy Learning by Decoupling Searching and Pathfinding for Object Navigation

Efficient Strategy Learning by Decoupling Searching and Pathfinding for Object Navigation

URL: http://arxiv.org/abs/2406.14103v2
Date: Tue, 22 Jul 2025 02:17:30 GMT
Title: Efficient Strategy Learning by Decoupling Searching and Pathfinding for Object Navigation
Authors: Yanwei Zheng, Shaopu Feng, Bowen Huang, Chuanlin Lan, Xiao Zhang, Dongxiao Yu,
Abstract summary: Two-Stage Reward Mechanism (TSRM) for object navigation decouples the searching and pathfinding behaviours in an episode.<n>Also, we propose a pretraining method Depth Enhanced Masked Autoencoders (DE-MAE) that enables agent to determine explored and unexplored areas.<n>In addition, we propose a new metric of Searching Success weighted by Searching Path Length (SSSPL) that assesses agent's searching ability and exploring efficiency.
Score: 11.816377162334401
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Inspired by human-like behaviors for navigation: first searching to explore unknown areas before discovering the target, and then the pathfinding of moving towards the discovered target, recent studies design parallel submodules to achieve different functions in the searching and pathfinding stages, while ignoring the differences in reward signals between the two stages. As a result, these models often cannot be fully trained or are overfitting on training scenes. Another bottleneck that restricts agents from learning two-stage strategies is spatial perception ability, since the studies used generic visual encoders without considering the depth information of navigation scenes. To release the potential of the model on strategy learning, we propose the Two-Stage Reward Mechanism (TSRM) for object navigation that decouples the searching and pathfinding behaviours in an episode, enabling the agent to explore larger area in searching stage and seek the optimal path in pathfinding stage. Also, we propose a pretraining method Depth Enhanced Masked Autoencoders (DE-MAE) that enables agent to determine explored and unexplored areas during the searching stage, locate target object and plan paths during the pathfinding stage more accurately. In addition, we propose a new metric of Searching Success weighted by Searching Path Length (SSSPL) that assesses agent's searching ability and exploring efficiency. Finally, we evaluated our method on AI2-Thor and RoboTHOR extensively and demonstrated it can outperform the state-of-the-art (SOTA) methods in both the success rate and the navigation efficiency.

Related papers

CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance [13.922655150502365]
CREStE learns representations and rewards for addressing the full mapless navigation problem. We evaluate CREStE in kilometer-scale navigation tasks across six distinct urban environments.
arXiv Detail & Related papers (2025-03-05T21:42:46Z)
TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation [11.591176410027224]
This paper presents a Vision-Language Navigation (VLN) agent based on Large Language Models (LLMs) We propose the Thinking, Interacting, and Action framework to compensate for the shortcomings of LLMs in environmental perception. Our approach also outperformed some supervised learning-based methods, highlighting its efficacy in zero-shot navigation.
arXiv Detail & Related papers (2024-03-13T05:22:39Z)
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration. We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments. Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
Implicit Obstacle Map-driven Indoor Navigation Model for Robust Obstacle Avoidance [16.57243997206754]
We propose a novel implicit obstacle map-driven indoor navigation framework for robust obstacle avoidance. A non-local target memory aggregation module is designed to leverage a non-local network to model the intrinsic relationship between the target semantic and the target orientation clues.
arXiv Detail & Related papers (2023-08-24T15:10:28Z)
How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI. Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework. Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
arXiv Detail & Related papers (2023-05-26T13:38:33Z)
KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation [61.08389704326803]
Vision-and-language navigation (VLN) is the task to enable an embodied agent to navigate to a remote location following the natural language instruction in real scenes. Most of the previous approaches utilize the entire features or object-centric features to represent navigable candidates. We propose a Knowledge Enhanced Reasoning Model (KERM) to leverage knowledge to improve agent navigation ability.
arXiv Detail & Related papers (2023-03-28T08:00:46Z)
Holistic Deep-Reinforcement-Learning-based Training of Autonomous Navigation Systems [4.409836695738518]
Deep Reinforcement Learning emerged as a promising approach for autonomous navigation of ground vehicles. In this paper, we propose a holistic Deep Reinforcement Learning training approach involving all entities of the navigation stack.
arXiv Detail & Related papers (2023-02-06T16:52:15Z)
Learning to Explore by Reinforcement over High-Level Options [0.0]
We propose a new method which grants an agent two intertwined options of behaviors: "look-around" and "frontier navigation" In each timestep, an agent produces an option and a corresponding action according to the policy. We demonstrate the effectiveness of the proposed method on two publicly available 3D environment datasets.
arXiv Detail & Related papers (2021-11-02T04:21:34Z)
Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation [117.26891277593205]
We focus on the navigation and solve the problem of existing navigation algorithms lacking experience and common sense. Inspired by the human ability to think twice before moving and conceive several feasible paths to seek a goal in unfamiliar scenes, we present a route planning method named Path Estimation and Memory Recalling framework. We show strong experimental results of PEMR on the EmbodiedQA navigation task.
arXiv Detail & Related papers (2021-10-16T13:30:55Z)
Augmented reality navigation system for visual prosthesis [67.09251544230744]
We propose an augmented reality navigation system for visual prosthesis that incorporates a software of reactive navigation and path planning. It consists on four steps: locating the subject on a map, planning the subject trajectory, showing it to the subject and re-planning without obstacles. Results show how our augmented navigation system help navigation performance by reducing the time and distance to reach the goals, even significantly reducing the number of obstacles collisions.
arXiv Detail & Related papers (2021-09-30T09:41:40Z)
Deep Learning for Embodied Vision Navigation: A Survey [108.13766213265069]
"Embodied visual navigation" problem requires an agent to navigate in a 3D environment mainly rely on its first-person observation. This paper attempts to establish an outline of the current works in the field of embodied visual navigation by providing a comprehensive literature survey.
arXiv Detail & Related papers (2021-07-07T12:09:04Z)
Deep Reinforcement Learning for Adaptive Exploration of Unknown Environments [6.90777229452271]
We develop an adaptive exploration approach to trade off between exploration and exploitation in one single step for UAVs. The proposed approach uses a map segmentation technique to decompose the environment map into smaller, tractable maps. The results demonstrate that our proposed approach is capable of navigating through randomly generated environments and covering more AoI in less time steps compared to the baselines.
arXiv Detail & Related papers (2021-05-04T16:29:44Z)
MaAST: Map Attention with Semantic Transformersfor Efficient Visual Navigation [4.127128889779478]
This work focuses on performing better or comparable to the existing learning-based solutions for visual navigation for autonomous agents. We propose a method to encode vital scene semantics into a semantically informed, top-down egocentric map representation. We conduct experiments on 3-D reconstructed indoor PointGoal visual navigation and demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-03-21T12:01:23Z)
Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning [66.9937776799536]
The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments. The main challenges of VLN arise mainly from two aspects: first, the agent needs to attend to the meaningful paragraphs of the language instruction corresponding to the dynamically-varying visual environments. We propose a cross-modal grounding module to equip the agent with a better ability to track the correspondence between the textual and visual modalities.
arXiv Detail & Related papers (2020-11-22T09:13:46Z)
Active Visual Information Gathering for Vision-Language Navigation [115.40768457718325]
Vision-language navigation (VLN) is the task of entailing an agent to carry out navigational instructions inside photo-realistic environments. One of the key challenges in VLN is how to conduct a robust navigation by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment. This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent VLN policy.
arXiv Detail & Related papers (2020-07-15T23:54:20Z)
Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent. We present a new approach to self-supervised exploration and fast adaptation to new tasks. Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships [52.72020203771489]
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes. Our proposed method combines visual features and 3D spatial representations to learn navigation policy. Our experiments, performed in the AI2-THOR, show that our model outperforms the baselines in both SR and SPL metrics.
arXiv Detail & Related papers (2020-04-29T08:46:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.