DynaNav: Dynamic Feature and Layer Selection for Efficient Visual Navigation
- URL: http://arxiv.org/abs/2509.21930v1
- Date: Fri, 26 Sep 2025 06:15:31 GMT
- Title: DynaNav: Dynamic Feature and Layer Selection for Efficient Visual Navigation
- Authors: Jiahui Wang, Changhao Chen,
- Abstract summary: DynaNav is a Dynamic Visual Navigation framework that adapts feature and layer selection based on scene complexity.<n>It employs a trainable hard feature selector for sparse operations, enhancing efficiency and interpretability.<n>Compared to ViNT, DynaNav achieves a 2.26x reduction in FLOPs, 42.3% lower inference time, and 32.8% lower memory usage.
- Score: 19.501191923456584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual navigation is essential for robotics and embodied AI. However, existing foundation models, particularly those with transformer decoders, suffer from high computational overhead and lack interpretability, limiting their deployment in resource-tight scenarios. To address this, we propose DynaNav, a Dynamic Visual Navigation framework that adapts feature and layer selection based on scene complexity. It employs a trainable hard feature selector for sparse operations, enhancing efficiency and interpretability. Additionally, we integrate feature selection into an early-exit mechanism, with Bayesian Optimization determining optimal exit thresholds to reduce computational cost. Extensive experiments in real-world-based datasets and simulated environments demonstrate the effectiveness of DynaNav. Compared to ViNT, DynaNav achieves a 2.26x reduction in FLOPs, 42.3% lower inference time, and 32.8% lower memory usage, while improving navigation performance across four public datasets.
Related papers
- A Reliable Indoor Navigation System for Humans Using AR-based Technique [0.0]
An AR-based technique has been applied to campus and small-site navigation, where Vuforia Area Target is used for environment modeling.<n>Compared to Dijkstra's algorithm, it can reach a solution about two to three times faster for smaller search spaces.<n>Results show that AR technology integrated with existing pathfinding algorithms is feasible and scalable.
arXiv Detail & Related papers (2026-02-27T06:18:49Z) - History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation [64.51891404034164]
Aerial Vision-and-Language Navigation (AVLN) requires Unmanned Aerial Vehicle (UAV) agents to localize targets in large-scale urban environments.<n>Existing UAV agents typically adopt mono-granularity frameworks that struggle to balance these two aspects.<n>This work proposes a History-Enhanced Two-Stage Transformer (HETT) framework, which integrates the two aspects through a coarse-to-fine navigation pipeline.
arXiv Detail & Related papers (2025-12-16T09:16:07Z) - FOM-Nav: Frontier-Object Maps for Object Goal Navigation [65.76906445210112]
FOM-Nav is a framework that enhances exploration efficiency through Frontier-Object Maps and vision-language models.<n>To train FOM-Nav, we automatically construct large-scale navigation datasets from real-world scanned environments.<n> FOM-Nav achieves state-of-the-art performance on the MP3D and HM3D benchmarks, particularly in navigation efficiency metric SPL.
arXiv Detail & Related papers (2025-11-30T18:16:09Z) - Harnessing Input-Adaptive Inference for Efficient VLN [13.847596428283861]
An emerging paradigm in vision-and-language navigation (VLN) is the use of history-aware multi-modal transformer models.<n>We propose a novel input-adaptive navigation method to enhance VLN model efficiency.
arXiv Detail & Related papers (2025-08-12T18:05:33Z) - PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models [16.820485795257195]
PIG-Nav (Pretrained Image-Goal Navigation) is a new approach that further investigates pretraining strategies for vision-based navigation models.<n>We identify two critical design choices that consistently improve the performance of pretrained navigation models.<n>Our model achieves an average improvement of 22.6% in zero-shot settings and a 37.5% improvement in fine-tuning settings over existing visual navigation foundation models.
arXiv Detail & Related papers (2025-07-23T05:34:20Z) - SSF-PAN: Semantic Scene Flow-Based Perception for Autonomous Navigation in Traffic Scenarios [10.303368447554591]
The proposed SSF-PAN can achieve the functionalities of LiDAR point cloud based object detection/localization and SLAM.<n>It is validated using the SUScape-CARLA and the KITTI datasets, as well as on the CARLA simulator.<n> Experimental results demonstrate that the proposed approach outperforms traditional methods in terms of scene flow accuracy, moving object detection accuracy, computational efficiency, and autonomous navigation effectiveness.
arXiv Detail & Related papers (2025-01-28T07:15:39Z) - PLANRL: A Motion Planning and Imitation Learning Framework to Bootstrap Reinforcement Learning [13.564676246832544]
We introduce PLANRL, a framework that chooses when the robot should use classical motion planning and when it should learn a policy.
PLANRL switches between two modes of operation: reaching a waypoint using classical techniques when away from the objects and fine-grained manipulation control when about to interact with objects.
We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior performance in terms of adaptability, efficiency, and generalization compared to existing methods.
arXiv Detail & Related papers (2024-08-07T19:30:08Z) - Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.<n>DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.<n>Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning [97.88246428240872]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.<n>Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability.<n>This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), where we fulfill parameter-efficient in-domain training to enable self-guided navigational decision.
arXiv Detail & Related papers (2024-03-12T07:27:02Z) - PlaceNav: Topological Navigation through Place Recognition [1.9382079036818822]
We present PlaceNav, subdividing the robot-independent part into navigation-specific and generic computer vision components.
We utilize visual place recognition for the subgoal selection of the topological navigation pipeline.
Our experimental results verify the design and the new method obtains a 76% higher success rate in indoor and 23% higher in outdoor navigation tasks with higher computational efficiency.
arXiv Detail & Related papers (2023-09-29T14:12:54Z) - Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from HM3D and Gibson datasets and synthesizes 4.9 million instruction trajectory pairs.
Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to previous SoTA) to a significantly new best of 80% single-run success rate on the R2R test split by simple imitation learning.
arXiv Detail & Related papers (2023-07-28T16:03:28Z) - Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation [72.24964965882783]
Reinforcement learning (RL) is a promising approach for robotic navigation, allowing robots to learn through trial and error.<n>Real-world robotic tasks often suffer from sparse rewards, leading to inefficient exploration and suboptimal policies.<n>We introduce Confidence-Controlled Exploration (CCE), a novel method that improves sample efficiency in RL-based robotic navigation without modifying the reward function.
arXiv Detail & Related papers (2023-06-09T18:45:15Z) - ETPNav: Evolving Topological Planning for Vision-Language Navigation in
Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on R2R-CE and RxR-CE datasets.
arXiv Detail & Related papers (2023-04-06T13:07:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.