SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping
- URL: http://arxiv.org/abs/2511.00392v1
- Date: Sat, 01 Nov 2025 04:12:27 GMT
- Title: SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping
- Authors: Lingpeng Chen, Jiakun Tang, Apple Pui-Yi Chui, Ziyang Hong, Junfeng Wu
- Abstract summary: Single-modality approaches to 3D reconstruction fail due to poor visibility and geometric constraints. Prior fusion techniques rely on flawed geometric assumptions, leading to significant artifacts and an inability to model complex scenes. In this paper, we introduce SonarSweep, a novel, end-to-end deep learning framework that overcomes these limitations.
- Score: 6.826863809223021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate 3D reconstruction in visually degraded underwater environments remains a formidable challenge. Single-modality approaches are insufficient: vision-based methods fail due to poor visibility and geometric constraints, while sonar is crippled by inherent elevation ambiguity and low resolution. Consequently, prior fusion techniques rely on heuristics and flawed geometric assumptions, leading to significant artifacts and an inability to model complex scenes. In this paper, we introduce SonarSweep, a novel, end-to-end deep learning framework that overcomes these limitations by adapting the principled plane sweep algorithm for cross-modal fusion between sonar and visual data. Extensive experiments in both high-fidelity simulation and real-world environments demonstrate that SonarSweep consistently generates dense and accurate depth maps, significantly outperforming state-of-the-art methods across challenging conditions, particularly in high turbidity. To foster further research, we will publicly release our code and a novel dataset featuring synchronized stereo-camera and sonar data, the first of its kind.
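The two ideas the abstract leans on, sonar's elevation ambiguity and the plane sweep algorithm, are easiest to see in code. Below is a minimal PyTorch sketch of (a) the arc of 3D points consistent with a single sonar range-azimuth return, and (b) a classical camera-only plane sweep that builds a matching-cost volume over hypothesized depth planes, the algorithm SonarSweep adapts for cross-modal fusion. Everything here (function names, the correlation cost, the sonar parametrization) is an illustrative assumption, not the authors' released code.

```python
# Illustrative sketch only, not the SonarSweep implementation. It shows why a
# 2D imaging sonar return is elevation-ambiguous and how a classical plane
# sweep builds a cost volume; the cost formulation is a common convention.
import torch
import torch.nn.functional as F

def sonar_arc(r: float, theta: float, phis: torch.Tensor) -> torch.Tensor:
    """A forward-looking sonar measures range r and azimuth theta but not
    elevation phi: every point on this arc explains the same return."""
    phis = phis.float()
    x = r * torch.cos(phis) * torch.sin(torch.tensor(theta))
    y = r * torch.sin(phis)                       # unobserved elevation axis
    z = r * torch.cos(phis) * torch.cos(torch.tensor(theta))
    return torch.stack([x, y, z], dim=-1)         # (N, 3) candidate points

def homography(K, R, t, d):
    """Homography induced by the fronto-parallel plane at depth d, mapping
    reference pixels to source pixels (one common sign convention)."""
    n = torch.tensor([0.0, 0.0, 1.0])             # plane normal, ref frame
    return K @ (R + torch.outer(t, n) / d) @ torch.linalg.inv(K)

def plane_sweep_volume(ref_feat, src_feat, K, R, t, depths):
    """Build a (D, H, W) matching-cost volume from (C, H, W) feature maps by
    warping source features onto each depth plane and correlating them with
    the reference; per-pixel depth is then a (soft-)argmin over D."""
    _, H, W = ref_feat.shape
    ys, xs = torch.meshgrid(torch.arange(H).float(),
                            torch.arange(W).float(), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)   # (H, W, 3)
    costs = []
    for d in depths:
        warped = pix @ homography(K, R, t, d).T                # H(d) @ pix
        uv = warped[..., :2] / warped[..., 2:3].clamp(min=1e-6)
        grid = torch.stack([2 * uv[..., 0] / (W - 1) - 1,      # to [-1, 1]
                            2 * uv[..., 1] / (H - 1) - 1], dim=-1)
        samp = F.grid_sample(src_feat[None], grid[None],
                             align_corners=True)[0]
        costs.append(-(ref_feat * samp).sum(0))                # correlation
    return torch.stack(costs)
```

One natural way to couple the modalities, consistent with the abstract's framing, would be an extra cost-volume term that favors depth hypotheses whose back-projected 3D points fall near a sonar return's arc, letting the sonar's metric range disambiguate pixels where photometric matching fails in turbid water; the paper's actual cross-modal cost construction should be taken from the authors' code once released.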
Related papers
- Robust Mesh Saliency GT Acquisition in VR via View Cone Sampling and Geometric Smoothing [59.12032628787018]
3D mesh saliency ground truth is essential for human-centric visual modeling in virtual reality (VR). Current VR eye-tracking pipelines rely on single-ray sampling and Euclidean smoothing, triggering texture attention and signal leakage across gaps. This paper proposes a robust framework to address these limitations.
arXiv Detail & Related papers (2026-01-06T05:20:12Z) - Breaking the Vicious Cycle: Coherent 3D Gaussian Splatting from Sparse and Motion-Blurred Views [40.70901994944635]
We introduce CoherentGS, a framework for high-fidelity 3D reconstruction from sparse and blurry images. Our key insight is to address these compound degradations using a dual-prior strategy. CoherentGS significantly outperforms existing methods, setting a new state-of-the-art for this challenging task.
arXiv Detail & Related papers (2025-12-11T07:36:35Z) - OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting [78.70702961852119]
OracleGS reconciles generative completeness with regressive fidelity for sparse-view Gaussian Splatting. Our approach conditions the powerful generative prior on multi-view geometric evidence, filtering hallucinatory artifacts while preserving plausible completions in under-constrained regions.
arXiv Detail & Related papers (2025-09-27T11:19:32Z) - JointSplat: Probabilistic Joint Flow-Depth Optimization for Sparse-View Gaussian Splatting [10.690965024885358]
Reconstructing 3D scenes from sparse viewpoints is a long-standing challenge with wide applications. Recent advances in feed-forward 3D Gaussian sparse-view reconstruction methods provide an efficient solution for real-time novel view synthesis. We propose JointSplat, a unified framework that leverages the complementarity between optical flow and depth.
arXiv Detail & Related papers (2025-06-04T12:04:40Z) - Plenodium: UnderWater 3D Scene Reconstruction with Plenoptic Medium Representation [31.47797579690604]
We present Plenodium, a 3D representation framework capable of jointly modeling both objects and participating media. In contrast to existing medium representations that rely solely on view-dependent modeling, our novel plenoptic medium representation incorporates both directional and positional information. Experiments on real-world underwater datasets demonstrate that our method achieves significant improvements in 3D reconstruction.
arXiv Detail & Related papers (2025-05-27T14:37:58Z) - 3D-UIR: 3D Gaussian for Underwater 3D Scene Reconstruction via Physics Based Appearance-Medium Decoupling [30.985414238960466]
3D Gaussian Splatting (3DGS) offers real-time rendering capabilities but struggles with inhomogeneous underwater environments. We propose a physics-based framework that disentangles object appearance from water-medium effects. Our approach achieves both high-quality novel view synthesis and physically accurate scene restoration.
arXiv Detail & Related papers (2025-05-27T14:19:30Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a geometrically enhanced occupancy network tailored for vision-only surround-view perception. Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the lowest input image resolution and the lightest image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - Digging into contrastive learning for robust depth estimation with diffusion models [55.62276027922499]
We propose a novel robust depth estimation method called D4RD.
It features a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments.
In experiments, D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions.
arXiv Detail & Related papers (2024-04-15T14:29:47Z) - Depth-aware Volume Attention for Texture-less Stereo Matching [67.46404479356896]
We propose a lightweight volume refinement scheme to tackle the texture deterioration in practical outdoor scenarios.
We introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture.
Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation.
arXiv Detail & Related papers (2024-02-14T04:07:44Z) - On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper.
We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation.
Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.