PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
- URL: http://arxiv.org/abs/2505.01881v3
- Date: Fri, 13 Jun 2025 03:36:19 GMT
- Title: PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
- Authors: Trisanth Srinivasan, Santosh Patapati
- Abstract summary: PhysNav-DG is a novel framework that integrates classical sensor fusion with the semantic power of vision-language models. Our dual-branch architecture predicts navigation actions from multi-sensor inputs while simultaneously generating detailed chain-of-thought explanations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust navigation in diverse environments and domains requires both accurate state estimation and transparent decision making. We present PhysNav-DG, a novel framework that integrates classical sensor fusion with the semantic power of vision-language models. Our dual-branch architecture predicts navigation actions from multi-sensor inputs while simultaneously generating detailed chain-of-thought explanations. A modified Adaptive Kalman Filter dynamically adjusts its noise parameters based on environmental context. It leverages several streams of raw sensor data along with semantic insights from models such as LLaMA 3.2 11B and BLIP-2. To evaluate our approach, we introduce the MD-NEX Benchmark, a novel multi-domain dataset that unifies indoor navigation, autonomous driving, and social navigation tasks with ground-truth actions and human-validated explanations. Extensive experiments and ablations show that PhysNav-DG improves navigation success rates by over 20% and achieves high efficiency, with explanations that are both highly grounded and clear. This work connects high-level semantic reasoning and geometric planning for safer and more trustworthy autonomous systems.
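The abstract's central mechanism, an Adaptive Kalman Filter whose noise parameters respond to environmental context, can be illustrated with a minimal 1-D sketch. Everything below is an illustrative assumption, not the paper's implementation: the real PhysNav-DG filter fuses multiple raw sensor streams and derives its context from vision-language models, whereas here a single scalar context weight simply inflates the measurement-noise variance.

```python
def adaptive_kalman_1d(measurements, contexts, q=1e-3, r_base=0.1):
    """Minimal 1-D adaptive Kalman filter sketch (illustrative only).

    The measurement-noise variance R is scaled by a per-step context
    weight, loosely mirroring the idea of adjusting noise parameters
    to the environment. Parameter names and values are hypothetical.
    """
    x, p = 0.0, 1.0  # state estimate and its variance
    estimates = []
    for z, ctx in zip(measurements, contexts):
        p += q                    # predict: constant-state model with process noise q
        r = r_base * (1.0 + ctx)  # adapt: trust the sensor less when ctx is high
        k = p / (p + r)           # standard Kalman gain
        x += k * (z - x)          # correct the state with the innovation
        p *= (1.0 - k)            # shrink the estimate variance
        estimates.append(x)
    return estimates
```

With a high context weight (e.g. a semantically "untrustworthy" scene), the gain shrinks and the filter leans on its prediction; with a low weight it tracks the raw measurement more closely.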
Related papers
- Integration of a high-fidelity model of quantum sensors with a map-matching filter for quantum-enhanced navigation [0.0]
We report on the realization of a high-fidelity model of an atom-interferometry-based gravity gradiometer. We show that aiding navigation via map matching using quantum gravity gradiometry results in stable trajectories. We derive requirements for mitigating these errors, such as maintaining sensor tilt below 3.3 degrees.
arXiv Detail & Related papers (2025-04-15T12:07:21Z)
- Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images [0.9883261192383611]
In this paper, we leverage monocular cameras on aerial robots to predict depth and semantic maps in unstructured environments. We propose a joint deep-learning architecture that can perform the two tasks accurately and rapidly.
arXiv Detail & Related papers (2025-03-23T08:25:07Z)
- Navigation World Models [68.58459393846461]
We introduce a controllable video generation model that predicts future visual observations based on past observations and navigation actions. In familiar environments, NWM can plan navigation trajectories by simulating them and evaluating whether they achieve the desired goal. Experiments demonstrate its effectiveness in planning trajectories from scratch or by ranking trajectories sampled from an external policy.
arXiv Detail & Related papers (2024-12-04T18:59:45Z)
- A Bionic Data-driven Approach for Long-distance Underwater Navigation with Anomaly Resistance [59.21686775951903]
Various animals exhibit accurate navigation using environmental cues. Inspired by animal navigation, this work proposes a bionic, data-driven approach for long-distance underwater navigation. The proposed approach uses measured geomagnetic data for navigation and requires no GPS or geographical maps.
arXiv Detail & Related papers (2024-02-06T13:20:56Z)
- Enhanced Low-Dimensional Sensing Mapless Navigation of Terrestrial Mobile Robots Using Double Deep Reinforcement Learning Techniques [1.191504645891765]
We present two distinct approaches aimed at enhancing mapless navigation for a ground-based mobile robot.
The research methodology primarily involves a comparative analysis between a Deep-RL strategy grounded in the foundational Deep Q-Network (DQN) algorithm, and an alternative approach based on the Double Deep Q-Network (DDQN) algorithm.
The proposed methodology is evaluated in three different real environments, revealing that Double Deep structures significantly enhance the navigation capabilities of mobile robots compared to simple Q structures.
arXiv Detail & Related papers (2023-10-20T20:47:07Z)
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
- ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) abstracting environments and generating long-range navigation plans, and 2) obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on R2R-CE and RxR-CE datasets.
arXiv Detail & Related papers (2023-04-06T13:07:17Z)
- Data-Driven Meets Navigation: Concepts, Models, and Experimental Validation [0.0]
The purpose of navigation is to determine the position, velocity, and orientation of manned and autonomous platforms, humans, and animals.
We review multidisciplinary, data-driven navigation algorithms developed and experimentally proven at the Autonomous Navigation and Sensor Fusion Lab.
arXiv Detail & Related papers (2022-10-06T14:03:10Z)
- Lite-HDSeg: LiDAR Semantic Segmentation Using Lite Harmonic Dense Convolutions [2.099922236065961]
We present Lite-HDSeg, a novel real-time convolutional neural network for semantic segmentation of full 3D LiDAR point clouds. Our experimental results show that the proposed method outperforms state-of-the-art semantic segmentation approaches that can run in real time.
arXiv Detail & Related papers (2021-03-16T04:54:57Z)
- A Driving Behavior Recognition Model with Bi-LSTM and Multi-Scale CNN [59.57221522897815]
We propose a neural network model based on trajectory information for driving behavior recognition. We evaluate the proposed model on the public BLVD dataset, achieving satisfactory performance.
arXiv Detail & Related papers (2021-03-01T06:47:29Z)
- IntentNet: Learning to Predict Intention from Raw Sensor Data [86.74403297781039]
In this paper, we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor and dynamic maps of the environment.
Our multi-task model achieves better accuracy than the respective separate modules while saving computation, which is critical to reducing reaction time in self-driving applications.
arXiv Detail & Related papers (2021-01-20T00:31:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.