MarineFormer: A Spatio-Temporal Attention Model for USV Navigation in Dynamic Marine Environments
- URL: http://arxiv.org/abs/2410.13973v4
- Date: Wed, 09 Jul 2025 23:40:31 GMT
- Title: MarineFormer: A Spatio-Temporal Attention Model for USV Navigation in Dynamic Marine Environments
- Authors: Ehsan Kazemi, Dechen Gao, Iman Soltani,
- Abstract summary: We show that incorporating local flow field measurements fundamentally alters the nature of the problem, transforming otherwise unsolvable navigation scenarios into tractable ones.<n>We propose textbfMarineFormer, a Transformer-based policy architecture that integrates spatial attention for sensor fusion, and temporal attention for capturing environmental dynamics.
- Score: 3.669739954510407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous navigation in marine environments can be extremely challenging, especially in the presence of spatially varying flow disturbances and dynamic and static obstacles. In this work, we demonstrate that incorporating local flow field measurements fundamentally alters the nature of the problem, transforming otherwise unsolvable navigation scenarios into tractable ones. However, the mere availability of flow data is not sufficient; it must be effectively fused with conventional sensory inputs such as ego-state and obstacle states. To this end, we propose \textbf{MarineFormer}, a Transformer-based policy architecture that integrates two complementary attention mechanisms: spatial attention for sensor fusion, and temporal attention for capturing environmental dynamics. MarineFormer is trained end-to-end via reinforcement learning in a 2D simulated environment with realistic flow features and obstacles. Extensive evaluations against classical and state-of-the-art baselines show that our approach improves episode completion success rate by nearly 23\% while reducing path length. Ablation studies further highlight the critical role of flow measurements and the effectiveness of our proposed architecture in leveraging them.
Related papers
- Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation [0.0]
This article investigates issues through the case of the BlueROV2, an open platform widely used for scientific experimentation.<n>We propose a deep reinforcement learning approach based on the Proximal Policy Optimization (PPO) algorithm.<n>Results show that the PPO policy consistently outperforms DWA in highly cluttered environments.
arXiv Detail & Related papers (2025-12-11T18:52:42Z) - IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation [56.43007596544299]
IndustryNav is the first dynamic industrial navigation benchmark for active spatial reasoning.<n>A study of nine state-of-the-art Visual Large Language Models reveals that closed-source models maintain a consistent advantage.
arXiv Detail & Related papers (2025-11-21T16:48:49Z) - Unified Multimodal Vessel Trajectory Prediction with Explainable Navigation Intention [18.699213433572996]
Vessel trajectory prediction is fundamental to intelligent maritime systems.<n>Existing vessel trajectory prediction methods suffer from limited scenario applicability and insufficient explainability.<n>We propose a unified vessel trajectory prediction framework incorporating explainable navigation intentions.
arXiv Detail & Related papers (2025-11-18T08:56:30Z) - Secure Low-altitude Maritime Communications via Intelligent Jamming [53.42658269206017]
Low-altitude wireless networks (LAWNs) have emerged as a viable solution for maritime communications.<n>The open and clear UAV communication channels make maritime LAWNs vulnerable to eavesdropping attacks.<n>We propose a low-altitude maritime communication system that employs intelligent jamming to counter dynamic eavesdroppers.
arXiv Detail & Related papers (2025-11-10T03:16:19Z) - E-MoFlow: Learning Egomotion and Optical Flow from Event Data via Implicit Regularization [38.46024197872764]
estimation of optical flow and 6-DoF ego-motion has typically been addressed independently.<n>For neuromorphic vision, the lack of robust data association makes solving the two problems separately an ill-posed challenge.<n>We propose an unsupervised framework that jointly optimize egomotion and optical flow via implicit spatial-temporal and geometric regularization.
arXiv Detail & Related papers (2025-10-14T17:33:44Z) - DUViN: Diffusion-Based Underwater Visual Navigation via Knowledge-Transferred Depth Features [47.88998580611257]
We propose a Diffusion-based Underwater Visual Navigation policy via knowledge-transferred depth features, named DUViN.<n>DuViN guides the vehicle to avoid obstacles and maintain a safe and perception awareness altitude relative to the terrain without relying on pre-built maps.<n> Experiments in both simulated and real-world underwater environments demonstrate the effectiveness and generalization of our approach.
arXiv Detail & Related papers (2025-09-03T03:43:12Z) - Forecasting Continuous Non-Conservative Dynamical Systems in SO(3) [51.510040541600176]
We propose a novel approach to modeling the rotation of moving objects in computer vision.<n>Our approach is agnostic to energy and momentum conservation while being robust to input noise.<n>By learning to approximate object dynamics from noisy states during training, our model attains robust extrapolation capabilities in simulation and various real-world settings.
arXiv Detail & Related papers (2025-08-11T09:03:10Z) - From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning [59.88543114325153]
We introduce the Seeing-to-Experiencing framework to scale the capability of navigation foundation models with reinforcement learning.<n>S2E combines the strengths of pre-training on videos and post-training through RL.<n>We establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3DGS reconstructions of real-world scenes.
arXiv Detail & Related papers (2025-07-29T17:26:10Z) - Situationally-Aware Dynamics Learning [57.698553219660376]
We propose a novel framework for online learning of hidden state representations.<n>Our approach explicitly models the influence of unobserved parameters on both transition dynamics and reward structures.<n>Experiments in both simulation and real world reveal significant improvements in data efficiency, policy performance, and the emergence of safer, adaptive navigation strategies.
arXiv Detail & Related papers (2025-05-26T06:40:11Z) - Depth-Constrained ASV Navigation with Deep RL and Limited Sensing [45.77464360746532]
We propose a reinforcement learning framework for ASV navigation under depth constraints.
To enhance environmental awareness, we integrate GP regression into the RL framework.
We demonstrate effective sim-to-real transfer, ensuring that trained policies generalize well to real-world aquatic conditions.
arXiv Detail & Related papers (2025-04-25T10:56:56Z) - Learning Underwater Active Perception in Simulation [51.205673783866146]
Turbidity can jeopardise the whole mission as it may prevent correct visual documentation of the inspected structures.<n>Previous works have introduced methods to adapt to turbidity and backscattering.<n>We propose a simple yet efficient approach to enable high-quality image acquisition of assets in a broad range of water conditions.
arXiv Detail & Related papers (2025-04-23T06:48:38Z) - Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments [57.59857784298534]
We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images.
This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes.
arXiv Detail & Related papers (2025-03-06T05:13:19Z) - Monte Carlo Tree Search with Velocity Obstacles for safe and efficient motion planning in dynamic environments [49.30744329170107]
We propose a novel approach for optimal online motion planning with minimal information about dynamic obstacles.
The proposed methodology combines Monte Carlo Tree Search (MCTS), for online optimal planning via model simulations, with Velocity Obstacles (VO), for obstacle avoidance.
We show the superiority of our methodology with respect to state-of-the-art planners, including Non-linear Model Predictive Control (NMPC), in terms of improved collision rate, computational and task performance.
arXiv Detail & Related papers (2025-01-16T16:45:08Z) - Navigation World Models [68.58459393846461]
We introduce a controllable video generation model that predicts future visual observations based on past observations and navigation actions.
In familiar environments, NWM can plan navigation trajectories by simulating them and evaluating whether they achieve the desired goal.
Experiments demonstrate its effectiveness in planning trajectories from scratch or by ranking trajectories sampled from an external policy.
arXiv Detail & Related papers (2024-12-04T18:59:45Z) - Evaluating Robustness of Reinforcement Learning Algorithms for Autonomous Shipping [2.9109581496560044]
This paper examines the robustness of benchmark deep reinforcement learning (RL) algorithms, implemented for inland waterway transport (IWT) within an autonomous shipping simulator.
We show that a model-free approach can achieve an adequate policy in the simulator, successfully navigating port environments never encountered during training.
arXiv Detail & Related papers (2024-11-07T17:55:07Z) - Deep-Sea A*+: An Advanced Path Planning Method Integrating Enhanced A* and Dynamic Window Approach for Autonomous Underwater Vehicles [1.3807821497779342]
Extreme conditions in the deep-sea environment pose significant challenges for underwater operations.
We propose an advanced path planning methodology that integrates an improved A* algorithm with the Dynamic Window Approach (DWA)
Our proposed method surpasses the traditional A* algorithm in terms of path smoothness, obstacle avoidance, and real-time performance.
arXiv Detail & Related papers (2024-10-22T07:29:05Z) - DiffuTraj: A Stochastic Vessel Trajectory Prediction Approach via Guided Diffusion Process [23.42712306116432]
Vessel maneuvers, characterized by their inherent complexity and indeterminacy, require vessel trajectory prediction system.
Conventional trajectory prediction methods utilize latent variables to represent the multi-modality of vessel motion.
We explicitly simulate the transition of vessel motion from uncertainty towards a state of certainty.
arXiv Detail & Related papers (2024-10-12T14:50:18Z) - FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation [65.01601309903971]
We introduce FAFA, a Frequency-Aware Flow-Aided self-supervised framework for 6D pose estimation of unmanned underwater vehicles (UUVs)
Our framework relies solely on the 3D model and RGB images, alleviating the need for any real pose annotations or other-modality data like depths.
We evaluate the effectiveness of FAFA on common underwater object pose benchmarks and showcase significant performance improvements compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-09-25T03:54:01Z) - Model-Based Reinforcement Learning for Control of Strongly-Disturbed Unsteady Aerodynamic Flows [0.0]
We propose a model-based reinforcement learning (MBRL) approach by incorporating a novel reduced-order model as a surrogate for the full environment.
The robustness and generalizability of the model is demonstrated in two distinct flow environments.
We demonstrate that the policy learned in the reduced-order environment translates to an effective control strategy in the full CFD environment.
arXiv Detail & Related papers (2024-08-26T23:21:44Z) - TransFlower: An Explainable Transformer-Based Model with Flow-to-Flow
Attention for Commuting Flow Prediction [18.232085070775835]
We introduce TransFlower, an explainable, transformer-based model employing flow-to-flow attention to predict commuting patterns.
Our model outperforms existing methods by up to 30.8% Common Part of Commuters.
arXiv Detail & Related papers (2024-02-23T16:00:04Z) - SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z) - Two-step dynamic obstacle avoidance [0.0]
This paper proposes a two-step architecture for handling dynamic obstacle avoidance (DOA) tasks by combining supervised and reinforcement learning (RL)
In the first step, we introduce a data-driven approach to estimate the collision risk (CR) of an obstacle using a recurrent neural network.
In the second step, we include these CR estimates into the observation space of an RL agent to increase its situational awareness.
arXiv Detail & Related papers (2023-11-28T14:55:50Z) - Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
High dynamic range imaging aims to retrieve information from multiple low-dynamic range inputs to generate realistic output.
Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion.
We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet) with both spatial and channel attention modules.
arXiv Detail & Related papers (2023-05-29T15:03:23Z) - Learned Risk Metric Maps for Kinodynamic Systems [54.49871675894546]
We present Learned Risk Metric Maps for real-time estimation of coherent risk metrics of high dimensional dynamical systems.
LRMM models are simple to design and train, requiring only procedural generation of obstacle sets, state and control sampling, and supervised training of a function approximator.
arXiv Detail & Related papers (2023-02-28T17:51:43Z) - PDFormer: Propagation Delay-Aware Dynamic Long-Range Transformer for
Traffic Flow Prediction [78.05103666987655]
spatial-temporal Graph Neural Network (GNN) models have emerged as one of the most promising methods to solve this problem.
We propose a novel propagation delay-aware dynamic long-range transFormer, namely PDFormer, for accurate traffic flow prediction.
Our method can not only achieve state-of-the-art performance but also exhibit competitive computational efficiency.
arXiv Detail & Related papers (2023-01-19T08:42:40Z) - STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic
Cross-Modal Understanding [68.96574451918458]
We propose a framework named STVG, which models visual-linguistic dependencies with a static branch and a dynamic branch.
Both the static and dynamic branches are designed as cross-modal transformers.
Our proposed method achieved 39.6% vIoU and won the first place in the HC-STVG of the Person in Context Challenge.
arXiv Detail & Related papers (2022-07-06T15:48:58Z) - Manifold Interpolating Optimal-Transport Flows for Trajectory Inference [64.94020639760026]
We present a method called Manifold Interpolating Optimal-Transport Flow (MIOFlow)
MIOFlow learns, continuous population dynamics from static snapshot samples taken at sporadic timepoints.
We evaluate our method on simulated data with bifurcations and merges, as well as scRNA-seq data from embryoid body differentiation, and acute myeloid leukemia treatment.
arXiv Detail & Related papers (2022-06-29T22:19:03Z) - Learning Robust Policy against Disturbance in Transition Dynamics via
State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z) - DPMPC-Planner: A real-time UAV trajectory planning framework for complex
static environments with dynamic obstacles [0.9462808515258462]
Safe UAV navigation is challenging due to the complex environment structures, dynamic obstacles, and uncertainties from measurement noises and unpredictable moving obstacle behaviors.
This paper proposes a trajectory planning framework to achieve safe navigation considering complex static environments with dynamic obstacles.
arXiv Detail & Related papers (2021-09-14T23:51:02Z) - Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm.
The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z) - Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling [65.99956848461915]
Vision-and-Language Navigation (VLN) is a task where agents must decide how to move through a 3D environment to reach a goal.
One of the problems of the VLN task is data scarcity since it is difficult to collect enough navigation paths with human-annotated instructions for interactive environments.
We propose an adversarial-driven counterfactual reasoning model that can consider effective conditions instead of low-quality augmented data.
arXiv Detail & Related papers (2019-11-17T18:02:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.