MANTA: Physics-Informed Generalized Underwater Object Tracking
- URL: http://arxiv.org/abs/2511.23405v1
- Date: Fri, 28 Nov 2025 17:59:06 GMT
- Title: MANTA: Physics-Informed Generalized Underwater Object Tracking
- Authors: Suhas Srinath, Hemang Jamadagni, Aditya Chadrasekar, Prathosh AP
- Abstract summary: We present MANTA, a physics-informed framework integrating representation learning with tracking design for underwater scenarios. MANTA achieves state-of-the-art performance, improving Success AUC by up to 6 percent, while ensuring stable long-term generalized underwater tracking and efficient runtime.
- Score: 7.246898300861601
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Underwater object tracking is challenging due to wavelength-dependent attenuation and scattering, which severely distort appearance across depths and water conditions. Existing trackers trained on terrestrial data fail to generalize to these physics-driven degradations. We present MANTA, a physics-informed framework integrating representation learning with tracking design for underwater scenarios. We propose a dual-positive contrastive learning strategy coupling temporal consistency with Beer-Lambert augmentations to yield features robust to both temporal and underwater distortions. We further introduce a multi-stage pipeline augmenting motion-based tracking with a physics-informed secondary association algorithm that integrates geometric consistency and appearance similarity for re-identification under occlusion and drift. To complement standard IoU metrics, we propose Center-Scale Consistency (CSC) and Geometric Alignment Score (GAS) to assess geometric fidelity. Experiments on four underwater benchmarks (WebUOT-1M, UOT32, UTB180, UWCOT220) show that MANTA achieves state-of-the-art performance, improving Success AUC by up to 6 percent, while ensuring stable long-term generalized underwater tracking and efficient runtime.
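The Beer-Lambert augmentation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest form of the law, in which each colour channel decays as exp(-beta * d) over a water path of d metres. The coefficients in `BETA` and the function names are hypothetical choices for illustration, not values or APIs from the paper, and backscatter is ignored.

```python
import math

# Hypothetical per-channel attenuation coefficients (1/m) for R, G, B.
# Red light is absorbed fastest in water; these values are illustrative only.
BETA = (0.60, 0.12, 0.08)


def beer_lambert_attenuate(pixel, depth_m, beta=BETA):
    """Attenuate one RGB pixel (floats in [0, 1]) over a path of depth_m metres."""
    return tuple(c * math.exp(-b * depth_m) for c, b in zip(pixel, beta))


def augment_image(image, depth_m, beta=BETA):
    """Apply the same attenuation to every pixel of an image (rows of RGB tuples)."""
    return [[beer_lambert_attenuate(p, depth_m, beta) for p in row] for row in image]


if __name__ == "__main__":
    image = [[(1.0, 1.0, 1.0), (0.5, 0.2, 0.9)]]
    for depth in (0.0, 2.0, 10.0):
        # The white pixel drifts toward blue-green as the path length grows.
        print(depth, augment_image(image, depth)[0][0])
```

In a contrastive setup such as the one the abstract describes, an attenuated copy like this could plausibly serve as one positive view of a frame, but how MANTA actually builds its positives is not specified here.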
Related papers
- Stereo-Inertial Poser: Towards Metric-Accurate Shape-Aware Motion Capture Using Sparse IMUs and a Single Stereo Camera [54.967647497048205]
We present Stereo-Inertial Poser, a real-time motion capture system that estimates metric-accurate and shape-aware 3D human motion. We replace the monocular RGB with stereo vision, enabling direct 3D keypoint extraction and body shape parameter estimation. Our method produces drift-free global translation under a long recording time and reduces foot-skating effects.
arXiv Detail & Related papers (2026-03-02T17:46:38Z)
- High-Resolution Underwater Camouflaged Object Detection: GBU-UCOD Dataset and Topology-Aware and Frequency-Decoupled Networks [32.76569239634241]
We propose a novel framework that integrates topology-aware modeling with frequency-decoupled perception. DeepTopo-Net achieves state-of-the-art performance, particularly in preserving morphological integrity of complex underwater patterns.
arXiv Detail & Related papers (2026-02-03T14:41:27Z)
- From Frames to Sequences: Temporally Consistent Human-Centric Dense Prediction [22.291273919939957]
We develop a scalable synthetic data pipeline that generates human frames and motion-aligned sequences with pixel-accurate depth, normals, and masks. We train a unified ViT-based dense predictor that injects an explicit geometric human prior via CSE embeddings. Our two-stage training strategy, combining static pretraining with dynamic sequence supervision, enables the model first to acquire robust spatial representations and then refine temporal consistency across motion-aligned sequences.
arXiv Detail & Related papers (2026-02-02T05:28:58Z)
- MARVO: Marine-Adaptive Radiance-aware Visual Odometry [0.5336398444466023]
We introduce MARVO, a physics-aware, learning-integrated odometry framework that fuses underwater image formation modeling, differentiable matching, and reinforcement learning. A Reinforcement-based PoseGraph refines global trajectories beyond local minima of classical least-squares by learning optimal retraction actions on SE(2).
arXiv Detail & Related papers (2025-11-28T03:31:40Z)
- HAD: Hierarchical Asymmetric Distillation to Bridge Spatio-Temporal Gaps in Event-Based Object Tracking [80.07224739976911]
RGB cameras excel at capturing rich texture with high resolution, whereas event cameras offer exceptional temporal resolution and high dynamic range.
arXiv Detail & Related papers (2025-10-22T13:15:13Z)
- Underwater Monocular Metric Depth Estimation: Real-World Benchmarks and Synthetic Fine-Tuning with Vision Foundation Models [0.0]
We present a benchmark of zero-shot and fine-tuned monocular metric depth estimation models on real-world underwater datasets. Our results show that large-scale models trained on terrestrial data (real or synthetic) are effective in in-air settings, but perform poorly underwater. This study presents a detailed evaluation and visualization of monocular metric depth estimation in underwater scenes.
arXiv Detail & Related papers (2025-07-02T21:06:39Z)
- Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement [8.16306466526838]
We present a novel dual-stream architecture that achieves state-of-the-art underwater image enhancement. Our method simultaneously estimates transmission maps and spatially-varying background light through a dedicated physics estimator. Our approach also features a novel optimization objective ensuring both physical adherence and perceptual quality across multiple spatial frequencies.
arXiv Detail & Related papers (2025-06-05T08:39:17Z)
- Learning Underwater Active Perception in Simulation [51.205673783866146]
Turbidity can jeopardise the whole mission as it may prevent correct visual documentation of the inspected structures. Previous works have introduced methods to adapt to turbidity and backscattering. We propose a simple yet efficient approach to enable high-quality image acquisition of assets in a broad range of water conditions.
arXiv Detail & Related papers (2025-04-23T06:48:38Z)
- Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments [57.59857784298534]
We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images. This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes.
arXiv Detail & Related papers (2025-03-06T05:13:19Z)
- An Efficient Detection and Control System for Underwater Docking using Machine Learning and Realistic Simulation: A Comprehensive Approach [5.039813366558306]
This work compares different deep-learning architectures to perform underwater docking detection and classification.
A Generative Adversarial Network (GAN) is used to do image-to-image translation, converting the Gazebo simulation image into an underwater-looking image.
Results show an improvement of 20% in the high turbidity scenarios regardless of the underwater currents.
arXiv Detail & Related papers (2023-11-02T18:10:20Z)
- SM/VIO: Robust Underwater State Estimation Switching Between Model-based and Visual Inertial Odometry [1.9785872350085876]
This paper addresses the robustness problem of visual-inertial state estimation for underwater operations.
The proposed approach utilizes a model of the robot's kinematics together with proprioceptive sensors to maintain the pose estimate.
Health-monitoring tracks the VIO process ensuring timely switches between the two estimators.
arXiv Detail & Related papers (2023-04-04T17:46:20Z)
- On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper.
We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation.
Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
- Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics [74.1720528573331]
Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
We propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics.
We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2022-07-11T07:50:22Z)