StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes
- URL: http://arxiv.org/abs/2509.16415v1
- Date: Fri, 19 Sep 2025 20:57:03 GMT
- Title: StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes
- Authors: Zhengri Wu, Yiran Wang, Yu Wen, Zeyu Zhang, Biao Wu, Hao Tang
- Abstract summary: Underwater stereo depth estimation provides accurate 3D geometry for robotics tasks such as navigation, inspection, and mapping. Existing approaches face two critical challenges: (i) parameter-efficiently adapting large vision foundation encoders to the underwater domain without extensive labeled data, and (ii) tightly fusing globally coherent but scale-ambiguous monocular priors with locally metric yet photometrically fragile stereo correspondences. We propose StereoAdapter, a parameter-efficient self-supervised framework that integrates a LoRA-adapted monocular foundation encoder with a recurrent stereo refinement module.
- Score: 14.61785829674974
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Underwater stereo depth estimation provides accurate 3D geometry for robotics tasks such as navigation, inspection, and mapping, offering metric depth from low-cost passive cameras while avoiding the scale ambiguity of monocular methods. However, existing approaches face two critical challenges: (i) parameter-efficiently adapting large vision foundation encoders to the underwater domain without extensive labeled data, and (ii) tightly fusing globally coherent but scale-ambiguous monocular priors with locally metric yet photometrically fragile stereo correspondences. To address these challenges, we propose StereoAdapter, a parameter-efficient self-supervised framework that integrates a LoRA-adapted monocular foundation encoder with a recurrent stereo refinement module. We further introduce dynamic LoRA adaptation for efficient rank selection and pre-training on the synthetic UW-StereoDepth-40K dataset to enhance robustness under diverse underwater conditions. Comprehensive evaluations on both simulated and real-world benchmarks show improvements of 6.11% on TartanAir and 5.12% on SQUID compared to state-of-the-art methods, while real-world deployment with the BlueROV2 robot further demonstrates the consistent robustness of our approach. Code: https://github.com/AIGeeksGroup/StereoAdapter. Website: https://aigeeksgroup.github.io/StereoAdapter.
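The adaptation mechanism named in the abstract, LoRA, injects trainable low-rank factors alongside the frozen weights of a foundation encoder. Below is a minimal sketch in PyTorch; the fixed rank, scaling, and initialization are illustrative placeholders, since the paper's dynamic LoRA adaptation selects ranks adaptively and its exact implementation is not reproduced here.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-rank adapter around a frozen linear layer: y = Wx + (alpha/r) * B A x.

    Illustrative sketch only -- StereoAdapter's dynamic LoRA adaptation would
    select the rank per layer rather than fixing it as done here.
    """
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the foundation weights stay frozen
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))        # up-projection, zero-init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

# Usage: wrap a projection inside a frozen ViT-style encoder block.
layer = LoRALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(2, 197, 768))  # only A and B receive gradients
```

Because only the rank-r factors A and B are trained, the tunable parameter count is a small fraction of the frozen encoder's, which is what makes the adaptation parameter-efficient.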
Related papers
- StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation [18.410248448681514]
We propose StereoAdapter-2, which replaces the conventional ConvGRU updater with a novel ConvSS2D operator. We construct UW-StereoDepth-80K, a large-scale synthetic underwater stereo dataset. Our framework achieves state-of-the-art zero-shot performance on underwater benchmarks.
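For context, the conventional updater being replaced is a convolutional GRU of the kind popularized by RAFT-style recurrent refinement. A minimal sketch follows (channel sizes are illustrative; ConvSS2D itself is the paper's novel operator and is not reproduced here):

```python
import torch
import torch.nn as nn

class ConvGRU(nn.Module):
    """Conventional convolutional GRU cell used as a recurrent disparity updater."""
    def __init__(self, hidden: int = 128, inp: int = 128, k: int = 3):
        super().__init__()
        pad = k // 2
        self.convz = nn.Conv2d(hidden + inp, hidden, k, padding=pad)  # update gate
        self.convr = nn.Conv2d(hidden + inp, hidden, k, padding=pad)  # reset gate
        self.convq = nn.Conv2d(hidden + inp, hidden, k, padding=pad)  # candidate state

    def forward(self, h: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))                        # how much to update
        r = torch.sigmoid(self.convr(hx))                        # how much history to keep
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q                               # new hidden state
```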
arXiv Detail & Related papers (2026-02-18T22:12:08Z)
- SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping [6.826863809223021]
Single-modality approaches to 3D reconstruction fail under poor visibility and geometric constraints. Prior fusion techniques rely on flawed geometric assumptions, leading to significant artifacts and an inability to model complex scenes. In this paper, we introduce SonarSweep, a novel, end-to-end deep learning framework that overcomes these limitations.
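Plane sweeping itself is a classical multi-view technique: hypothesize a set of depth planes, warp the source view onto the reference view through the homography induced by each plane, and score photometric agreement. A simplified fronto-parallel sketch follows; the intrinsics, pose convention, and L1 cost are assumptions for illustration, and SonarSweep's sonar-specific formulation is not reproduced:

```python
import torch
import torch.nn.functional as F

def plane_sweep_volume(ref, src, K, R, t, depths):
    """Photometric cost volume over fronto-parallel depth planes.

    ref, src: (1, C, H, W) images; K: (3, 3) intrinsics;
    R, t: src<-ref rotation (3, 3) and translation (3,); depths: iterable of floats.
    Simplified: single image pair, L1 cost, no occlusion handling.
    """
    _, _, H, W = ref.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(3, -1)  # homogeneous pixels
    n, Kinv = torch.tensor([0.0, 0.0, 1.0]), torch.inverse(K)
    costs = []
    for d in depths:
        Hmg = K @ (R + t.reshape(3, 1) @ n.reshape(1, 3) / d) @ Kinv  # plane homography
        warped = Hmg @ pix
        uv = warped[:2] / warped[2:].clamp(min=1e-6)                  # perspective divide
        grid = torch.stack([uv[0] / (W - 1) * 2 - 1,                  # normalize to [-1, 1]
                            uv[1] / (H - 1) * 2 - 1], dim=-1).reshape(1, H, W, 2)
        src_warped = F.grid_sample(src, grid, align_corners=True)
        costs.append((ref - src_warped).abs().mean(dim=1))            # L1 photometric cost
    return torch.stack(costs, dim=1)  # (1, len(depths), H, W); argmin over dim 1 -> depth

# Dummy usage with illustrative intrinsics and a small lateral baseline.
K = torch.tensor([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
vol = plane_sweep_volume(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
                         K, torch.eye(3), torch.tensor([0.1, 0.0, 0.0]),
                         depths=[1.0, 2.0, 4.0, 8.0])
```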
arXiv Detail & Related papers (2025-11-01T04:12:27Z)
- Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model [62.37493746544967]
Camera-based setups offer a cost-effective option by using stereo depth estimation to generate dense, high-resolution depth maps. Existing omnidirectional stereo matching approaches achieve only limited depth accuracy across diverse environments. We present DFI-OmniStereo, a novel omnidirectional stereo matching method that leverages a large-scale pre-trained foundation model for relative monocular depth estimation.
arXiv Detail & Related papers (2025-03-30T16:24:22Z)
- FoundationStereo: Zero-Shot Stereo Matching [50.79202911274819]
FoundationStereo is a foundation model for stereo depth estimation. We first construct a large-scale (1M stereo pairs) synthetic training dataset. We then design a number of network architecture components to enhance scalability.
arXiv Detail & Related papers (2025-01-17T01:01:44Z)
- DEFOM-Stereo: Depth Foundation Model Based Stereo Matching [12.22373236061929]
DEFOM-Stereo is built to facilitate robust stereo matching with monocular depth cues. It shows much stronger zero-shot generalization than SOTA methods. Our model simultaneously outperforms previous models on the individual benchmarks.
arXiv Detail & Related papers (2025-01-16T10:59:29Z)
- Mono2Stereo: Monocular Knowledge Transfer for Enhanced Stereo Matching [7.840781070208874]
We propose Mono2Stereo, which leverages monocular knowledge transfer to enhance stereo matching.
We introduce knowledge transfer with a two-stage training process, comprising synthetic data pre-training and real-world data fine-tuning.
Experimental results demonstrate that our pre-trained model exhibits strong zero-shot capabilities.
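The two-stage recipe (synthetic pre-training, then real-world fine-tuning) is easy to express as code. Everything below, the tiny network, the loss, the data, and the learning rates, is a placeholder for illustration, not Mono2Stereo's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyStereoNet(nn.Module):
    """Placeholder stereo network, not Mono2Stereo's architecture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(6, 1, 3, padding=1)

    def forward(self, left, right):
        return self.net(torch.cat([left, right], dim=1))

def train_stage(model, batches, lr):
    """One stage of training; both stages share the same loop."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for left, right, gt in batches:
        loss = F.l1_loss(model(left, right), gt)
        opt.zero_grad()
        loss.backward()
        opt.step()

model = TinyStereoNet()
dummy = [(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64))]
train_stage(model, dummy, lr=1e-4)  # stage 1: dense synthetic labels, higher rate
train_stage(model, dummy, lr=1e-5)  # stage 2: real-world labels, lower rate
```

The usual rationale for this split: synthetic data supplies dense, noise-free disparity for representation learning, while a short fine-tuning stage at a lower learning rate closes the domain gap without forgetting.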
arXiv Detail & Related papers (2024-11-14T03:01:36Z)
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the lowest required image resolution and the most lightweight image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
- AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z)
- SGM3D: Stereo Guided Monocular 3D Object Detection [62.11858392862551]
We propose a stereo-guided monocular 3D object detection network, termed SGM3D.
We exploit robust 3D features extracted from stereo images to enhance the features learned from the monocular image.
Our method can be integrated into many other monocular approaches to boost performance without introducing any extra computational cost.
arXiv Detail & Related papers (2021-12-03T13:57:14Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
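The bimodal output representation can be sketched as a small head predicting two Laplacian modes per pixel and reading out the dominant one; the channel counts and winner-take-all readout below are assumptions based on the abstract, not the authors' exact head:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BimodalDisparityHead(nn.Module):
    """Predict a two-mode Laplacian mixture per pixel: (pi, mu1, b1, mu2, b2)."""
    def __init__(self, feat_ch: int = 64):
        super().__init__()
        self.head = nn.Conv2d(feat_ch, 5, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        pi, mu1, b1, mu2, b2 = self.head(feats).split(1, dim=1)
        pi = torch.sigmoid(pi)                    # mixture weight of the first mode
        b1, b2 = F.softplus(b1), F.softplus(b2)   # Laplacian scales must be positive
        # Winner-take-all readout: pick the dominant mode's mean rather than
        # blending the two, which keeps disparities sharp at discontinuities.
        disp = torch.where(pi > 0.5, mu1, mu2)
        return disp, (pi, mu1, b1, mu2, b2)

head = BimodalDisparityHead()
disp, params = head(torch.randn(1, 64, 32, 32))  # disp: (1, 1, 32, 32)
```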
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved, lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)