Related papers: S2ML: Spatio-Spectral Mutual Learning for Depth Completion

URL: http://arxiv.org/abs/2511.06033v1
Date: Sat, 08 Nov 2025 15:01:55 GMT
Title: S2ML: Spatio-Spectral Mutual Learning for Depth Completion
Authors: Zihui Zhao, Yifei Zhang, Zheng Wang, Yang Li, Kui Jiang, Zihan Geng, Chia-Wen Lin
Abstract summary: Raw depth images captured by RGB-D cameras often suffer from incomplete depth values due to weak reflections, boundary shadows, and artifacts. Existing methods address this problem through depth completion in the image domain, but they overlook the physical characteristics of raw depth images. We propose a Spatio-Spectral Mutual Learning framework (S2ML) to harmonize the advantages of both spatial and frequency domains for depth completion.
Score: 56.26679539288063
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The raw depth images captured by RGB-D cameras using Time-of-Flight (TOF) or structured light often suffer from incomplete depth values due to weak reflections, boundary shadows, and artifacts, which limit their applications in downstream vision tasks. Existing methods address this problem through depth completion in the image domain, but they overlook the physical characteristics of raw depth images. It has been observed that the presence of invalid depth areas alters the frequency distribution pattern. In this work, we propose a Spatio-Spectral Mutual Learning framework (S2ML) to harmonize the advantages of both spatial and frequency domains for depth completion. Specifically, we consider the distinct properties of amplitude and phase spectra and devise a dedicated spectral fusion module. Meanwhile, the local and global correlations between spatial-domain and frequency-domain features are calculated in a unified embedding space. The gradual mutual representation and refinement encourage the network to fully explore complementary physical characteristics and priors for more accurate depth completion. Extensive experiments demonstrate the effectiveness of our proposed S2ML method, outperforming the state-of-the-art method CFormer by 0.828 dB and 0.834 dB on the NYU-Depth V2 and SUN RGB-D datasets, respectively.
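The abstract's observation that invalid depth areas alter the frequency distribution, and its reference to amplitude and phase spectra, can be illustrated with a small NumPy sketch. This is not the authors' S2ML implementation: the synthetic depth ramp, the zero-filled hole, and the helper names `amplitude_phase` and `from_amplitude_phase` are assumptions made purely for illustration.

```python
import numpy as np


def amplitude_phase(depth):
    """Split a 2D depth map into its amplitude and phase spectra."""
    spectrum = np.fft.fft2(depth)
    return np.abs(spectrum), np.angle(spectrum)


def from_amplitude_phase(amplitude, phase):
    """Rebuild a spatial-domain depth map from amplitude and phase spectra."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(spectrum))


# Hypothetical complete depth map: a ramp that varies only along the width.
h, w = 64, 64
depth = np.tile(np.linspace(1.0, 3.0, w), (h, 1))

# Hypothetical raw depth: the same scene with an invalid (zero) region.
raw = depth.copy()
raw[20:40, 20:40] = 0.0

amp_full, pha_full = amplitude_phase(depth)
amp_raw, pha_raw = amplitude_phase(raw)

# The complete ramp is constant along the vertical axis, so nearly all of its
# energy sits in the first row of the spectrum. The hole spreads energy into
# the remaining rows, which is one way to see the altered distribution.
off_axis_full = amp_full[1:, :].sum()
off_axis_raw = amp_raw[1:, :].sum()

# Amplitude and phase together are lossless: the raw map is recovered.
reconstructed = from_amplitude_phase(amp_raw, pha_raw)

print(f"off-axis amplitude, complete depth: {off_axis_full:.3e}")
print(f"off-axis amplitude, raw depth:      {off_axis_raw:.3e}")
```

In this toy setting the hole changes both spectra globally even though it is spatially local, which may be part of the motivation for handling amplitude and phase separately; how S2ML actually fuses them is described in the paper, not here.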

Related papers

DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization [43.974708665104565]
We introduce DepthMatch, a semi-supervised learning framework that is specifically designed for RGB-D scene parsing. We propose complementary patch mix-up augmentation to explore the latent relationships between texture and spatial features in RGB-D image pairs. We also design a lightweight spatial prior injector to replace traditional complex fusion modules, improving the efficiency of heterogeneous feature fusion.
arXiv Detail & Related papers (2025-05-26T14:26:31Z)
FUSION: Frequency-guided Underwater Spatial Image recOnstructioN [0.0]
Underwater images suffer from severe degradations, including color distortions, reduced visibility, and loss of structural details due to wavelength-dependent attenuation and scattering. Existing enhancement methods primarily focus on spatial-domain processing, neglecting the frequency domain's potential to capture global color distributions and long-range dependencies. We propose FUSION, a dual-domain deep learning framework that jointly leverages spatial and frequency domain information.
arXiv Detail & Related papers (2025-04-01T23:16:19Z)
Adaptive Stereo Depth Estimation with Multi-Spectral Images Across All Lighting Conditions [58.88917836512819]
We propose a novel framework incorporating stereo depth estimation to enforce accurate geometric constraints. To mitigate the effects of poor lighting on stereo matching, we introduce Degradation Masking. Our method achieves state-of-the-art (SOTA) performance on the Multi-Spectral Stereo (MS2) dataset.
arXiv Detail & Related papers (2024-11-06T03:30:46Z)
SSIF: Learning Continuous Image Representation for Spatial-Spectral Super-Resolution [73.46167948298041]
We propose a neural implicit model that represents an image as a function of both continuous pixel coordinates in the spatial domain and continuous wavelengths in the spectral domain. We show that SSIF generalizes well to both unseen spatial resolutions and spectral resolutions. It can generate high-resolution images that improve the performance of downstream tasks by 1.7%-7%.
arXiv Detail & Related papers (2023-09-30T15:23:30Z)
Toward Sufficient Spatial-Frequency Interaction for Gradient-aware Underwater Image Enhancement [5.553172974022233]
We develop a novel underwater image enhancement (UIE) framework based on spatial-frequency interaction and gradient maps. Experimental results on two real-world underwater image datasets show that our approach can successfully enhance underwater images.
arXiv Detail & Related papers (2023-09-08T02:58:17Z)
Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR. Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z)
Learning an Efficient Multimodal Depth Completion Model [11.740546882538142]
RGB image-guided sparse depth completion has attracted extensive attention recently, but still faces some problems. The proposed method can outperform some state-of-the-art methods with a lightweight architecture. The method also wins the championship in the MIPI2022 RGB+TOF depth completion challenge.
arXiv Detail & Related papers (2022-08-23T07:03:14Z)
Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD). Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks. Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM. We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline. Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences, and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
arXiv Detail & Related papers (2021-04-01T15:31:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.