DUViN: Diffusion-Based Underwater Visual Navigation via Knowledge-Transferred Depth Features
- URL: http://arxiv.org/abs/2509.02983v1
- Date: Wed, 03 Sep 2025 03:43:12 GMT
- Title: DUViN: Diffusion-Based Underwater Visual Navigation via Knowledge-Transferred Depth Features
- Authors: Jinghe Yang, Minh-Quan Le, Mingming Gong, Ye Pu
- Abstract summary: We propose a Diffusion-based Underwater Visual Navigation policy via knowledge-transferred depth features, named DUViN. DUViN guides the vehicle to avoid obstacles and maintain a safe, perception-aware altitude relative to the terrain without relying on pre-built maps. Experiments in both simulated and real-world underwater environments demonstrate the effectiveness and generalization of our approach.
- Score: 47.88998580611257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous underwater navigation remains a challenging problem due to limited sensing capabilities and the difficulty of constructing accurate maps in underwater environments. In this paper, we propose a Diffusion-based Underwater Visual Navigation policy via knowledge-transferred depth features, named DUViN, which enables vision-based end-to-end 4-DoF motion control for underwater vehicles in unknown environments. DUViN guides the vehicle to avoid obstacles and maintain a safe, perception-aware altitude relative to the terrain without relying on pre-built maps. To address the difficulty of collecting large-scale underwater navigation datasets, we propose a method that ensures robust generalization under domain shifts from in-air to underwater environments by leveraging depth features and introducing a novel model transfer strategy. Specifically, our training framework consists of two phases: first, we train the diffusion-based visual navigation policy on in-air datasets using a pre-trained depth feature extractor. Second, we retrain the extractor on an underwater depth estimation task and integrate the adapted extractor into the navigation policy trained in the first phase. Experiments in both simulated and real-world underwater environments demonstrate the effectiveness and generalization of our approach. The experimental videos are available at https://www.youtube.com/playlist?list=PLqt2s-RyCf1gfXJgFzKjmwIqYhrP4I-7Y.
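The two-phase training framework described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all class and method names are invented placeholders, and the actual models are a diffusion policy and a learned depth backbone rather than the stubs shown here.

```python
# Hypothetical sketch of DUViN's two-phase transfer strategy.
# Names (DepthFeatureExtractor, DiffusionNavigationPolicy) are illustrative.

class DepthFeatureExtractor:
    """Stand-in for a pre-trained depth feature backbone."""
    def __init__(self, domain):
        self.domain = domain  # "in-air" or "underwater"

    def features(self, image):
        # In practice: a network producing depth-aware feature maps.
        return {"domain": self.domain, "image": image}


class DiffusionNavigationPolicy:
    """Stand-in for a diffusion policy consuming depth features and
    emitting 4-DoF motion commands."""
    def __init__(self, extractor):
        self.extractor = extractor

    def act(self, image):
        feats = self.extractor.features(image)
        # In practice: iterative denoising conditioned on feats.
        return {"surge": 0.0, "sway": 0.0, "heave": 0.0, "yaw": 0.0,
                "conditioned_on": feats["domain"]}


# Phase 1: train the policy on in-air data with an in-air depth extractor.
policy = DiffusionNavigationPolicy(DepthFeatureExtractor("in-air"))

# Phase 2: adapt the extractor on underwater depth estimation, then swap
# it into the already-trained policy; the policy itself is unchanged.
policy.extractor = DepthFeatureExtractor("underwater")
```

The key idea the sketch captures is that only the depth feature extractor is retrained for the underwater domain; the navigation policy trained on in-air data is reused as-is.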
Related papers
- Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation [0.0]
This article investigates these issues through the case of the BlueROV2, an open platform widely used for scientific experimentation. We propose a deep reinforcement learning approach based on the Proximal Policy Optimization (PPO) algorithm. Results show that the PPO policy consistently outperforms DWA in highly cluttered environments.
arXiv Detail & Related papers (2025-12-11T18:52:42Z) - NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding [60.76337064425815]
We study underwater scene understanding methods, which aim to achieve automated underwater exploration. NautData is a dataset containing 1.45 M image-text pairs supporting eight underwater scene understanding tasks. We propose a plug-and-play vision feature enhancement (VFE) module, which explicitly restores clear underwater information.
arXiv Detail & Related papers (2025-10-31T14:00:35Z) - SPADE: Sparsity Adaptive Depth Estimator for Zero-Shot, Real-Time, Monocular Depth Estimation in Underwater Environments [5.070043385937244]
Enhancing the spatial awareness of underwater vehicles is key to reducing piloting risks and enabling greater autonomy. We present SPADE: SParsity Adaptive Depth Estimator, a monocular depth estimation pipeline that combines a pre-trained relative depth estimator with sparse depth priors to produce dense, metric-scale depth maps. Our approach achieves improved accuracy and generalisation over state-of-the-art baselines and runs efficiently at over 15 FPS on embedded hardware, promising to support practical underwater inspection and intervention.
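A common way to turn relative depth plus sparse metric priors into metric-scale depth, as the SPADE summary describes, is to fit a global scale and shift to the relative map at the sparse points. This is a generic least-squares sketch of that idea, not SPADE's actual sparsity-adaptive method; the function name and data layout are invented for illustration.

```python
# Illustrative only: align a relative depth map to sparse metric priors
# with a closed-form least-squares scale and shift.

def align_scale_shift(relative, sparse_metric):
    """relative: {pixel: relative_depth}; sparse_metric: {pixel: metric_depth}.
    Returns (s, t) minimizing sum((s * r + t - d)^2) over shared pixels."""
    pixels = [p for p in sparse_metric if p in relative]
    xs = [relative[p] for p in pixels]
    ys = [sparse_metric[p] for p in pixels]
    n = len(pixels)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s = cov / var          # optimal scale
    t = my - s * mx        # optimal shift
    return s, t

# Toy example: a 2x2 relative map and two sparse metric measurements
# (e.g. from sonar returns or sparse SLAM landmarks).
relative = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.8}
sparse = {(0, 0): 1.0, (1, 1): 4.0}
s, t = align_scale_shift(relative, sparse)
metric = {p: s * r + t for p, r in relative.items()}  # dense metric map
```

With the toy values above, the fit gives s = 5.0 and t = 0.0, so every relative depth is promoted to metres by the same affine map. Real pipelines typically add outlier rejection and spatially varying corrections on top of this global fit.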
arXiv Detail & Related papers (2025-10-29T12:37:34Z) - On-board Sonar Data Classification for Path Following in Underwater Vehicles using Fast Interval Type-2 Fuzzy Extreme Learning Machine [0.29767565026354176]
We train a Fuzzy Inference System for on-board sonar data classification using an underwater vehicle called the BlueROV2. The proposed approach provides the BlueROV2 with a more complete sensory picture of its surroundings while real-time navigation planning is performed by the concurrent execution of two or more tasks.
arXiv Detail & Related papers (2025-06-15T08:01:36Z) - Depth-Constrained ASV Navigation with Deep RL and Limited Sensing [45.77464360746532]
We propose a reinforcement learning framework for ASV navigation under depth constraints. To enhance environmental awareness, we integrate GP regression into the RL framework. We demonstrate effective sim-to-real transfer, ensuring that trained policies generalize well to real-world aquatic conditions.
arXiv Detail & Related papers (2025-04-25T10:56:56Z) - Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z) - ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on R2R-CE and RxR-CE datasets.
arXiv Detail & Related papers (2023-04-06T13:07:17Z) - Augmented reality navigation system for visual prosthesis [67.09251544230744]
We propose an augmented reality navigation system for visual prostheses that incorporates reactive navigation and path-planning software.
It consists of four steps: locating the subject on a map, planning the subject's trajectory, showing it to the subject, and re-planning to avoid obstacles.
Results show how our augmented navigation system helps navigation performance by reducing the time and distance needed to reach goals, and significantly reduces the number of obstacle collisions.
arXiv Detail & Related papers (2021-09-30T09:41:40Z) - Active Visual Information Gathering for Vision-Language Navigation [115.40768457718325]
Vision-language navigation (VLN) is the task of requiring an agent to carry out navigational instructions inside photo-realistic environments.
One of the key challenges in VLN is how to conduct a robust navigation by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment.
This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent VLN policy.
arXiv Detail & Related papers (2020-07-15T23:54:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.