DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization
- URL: http://arxiv.org/abs/2505.20041v1
- Date: Mon, 26 May 2025 14:26:31 GMT
- Title: DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization
- Authors: Jianxin Huang, Jiahang Li, Sergey Vityazev, Alexander Dvorkovich, Rui Fan,
- Abstract summary: We introduce DepthMatch, a semi-supervised learning framework that is specifically designed for RGB-D scene parsing.<n>We propose complementary patch mix-up augmentation to explore the latent relationships between texture and spatial features in RGB-D image pairs.<n>We also design a lightweight spatial prior injector to replace traditional complex fusion modules, improving the efficiency of heterogeneous feature fusion.
- Score: 43.974708665104565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGB-D scene parsing methods effectively capture both semantic and geometric features of the environment, demonstrating great potential under challenging conditions such as extreme weather and low lighting. However, existing RGB-D scene parsing methods predominantly rely on supervised training strategies, which require a large amount of manually annotated pixel-level labels that are both time-consuming and costly. To overcome these limitations, we introduce DepthMatch, a semi-supervised learning framework that is specifically designed for RGB-D scene parsing. To make full use of unlabeled data, we propose complementary patch mix-up augmentation to explore the latent relationships between texture and spatial features in RGB-D image pairs. We also design a lightweight spatial prior injector to replace traditional complex fusion modules, improving the efficiency of heterogeneous feature fusion. Furthermore, we introduce depth-guided boundary loss to enhance the model's boundary prediction capabilities. Experimental results demonstrate that DepthMatch exhibits high applicability in both indoor and outdoor scenes, achieving state-of-the-art results on the NYUv2 dataset and ranking first on the KITTI Semantics benchmark.
Related papers
- PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes [6.698379291727345]
Pseudo Depth (PD) from high-precision depth estimation algorithms can eliminate the dependence on RGB-D sensors and alignment processes.<n>PD shows significant potential in semantic segmentation.<n>PD aggregates multiple pseudo depth maps into a single modality.<n>PD achieves state-of-the-art performance, outperforming other methods by +6.98 mIoU on NYUv2 and +2.11 mIoU on SUNRGB-D.
arXiv Detail & Related papers (2025-03-24T07:05:31Z) - Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281]
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
arXiv Detail & Related papers (2024-09-23T15:23:01Z) - Depth-Guided Semi-Supervised Instance Segmentation [62.80063539262021]
Semi-Supervised Instance (SSIS) aims to leverage an amount of unlabeled data during training.
Previous frameworks primarily utilized the RGB information of unlabeled images to generate pseudo-labels.
We introduce a Depth-Guided (DG) framework to overcome this limitation.
arXiv Detail & Related papers (2024-06-25T09:36:50Z) - Symmetric Uncertainty-Aware Feature Transmission for Depth
Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z) - Spherical Space Feature Decomposition for Guided Depth Map
Super-Resolution [123.04455334124188]
Guided depth map super-resolution (GDSR) aims to upsample low-resolution (LR) depth maps with additional information involved in high-resolution (HR) RGB images from the same scene.
In this paper, we propose the Spherical Space feature Decomposition Network (SSDNet) to solve the above issues.
Our method can achieve state-of-the-art results on four test datasets, as well as successfully generalize to real-world scenes.
arXiv Detail & Related papers (2023-03-15T21:22:21Z) - SpiderMesh: Spatial-aware Demand-guided Recursive Meshing for RGB-T
Semantic Segmentation [13.125707028339292]
We propose a Spatial-aware Demand-guided Recursive Meshing (SpiderMesh) framework for practical RGB-T (thermal) segmentation.
SpiderMesh proactively compensates inadequate contextual semantics in optically-impaired regions.
Experiments on MFNet and PST900 datasets demonstrate that SpiderMesh achieves state-of-the-art performance on standard RGB-T segmentation benchmarks.
arXiv Detail & Related papers (2023-03-15T15:24:01Z) - Pyramidal Attention for Saliency Detection [30.554118525502115]
This paper exploits only RGB images, estimates depth from RGB, and leverages the intermediate depth features.
We employ a pyramidal attention structure to extract multi-level convolutional-transformer features to process initial stage representations.
We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets.
arXiv Detail & Related papers (2022-04-14T06:57:46Z) - Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms $15$ state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z) - Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive ability on some challenge scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
arXiv Detail & Related papers (2020-07-23T04:33:36Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and models the problem as a cross-modal feature fusion.
In this paper, we propose a unified and efficient Crossmodality Guided to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.