DFormer: Rethinking RGBD Representation Learning for Semantic
Segmentation
- URL: http://arxiv.org/abs/2309.09668v2
- Date: Wed, 7 Feb 2024 11:07:29 GMT
- Title: DFormer: Rethinking RGBD Representation Learning for Semantic
Segmentation
- Authors: Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin
Hou
- Abstract summary: DFormer is a novel framework to learn transferable representations for RGB-D segmentation tasks.
It pretrains the backbone using image-depth pairs from ImageNet-1K.
DFormer achieves new state-of-the-art performance on two popular RGB-D tasks.
- Score: 76.81628995237058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present DFormer, a novel RGB-D pretraining framework to learn transferable
representations for RGB-D segmentation tasks. DFormer has two new key
innovations: 1) Unlike previous works that encode RGB-D information with an RGB-pretrained
backbone, we pretrain the backbone using image-depth pairs from ImageNet-1K, and hence
DFormer is endowed with the capacity to encode RGB-D representations; 2) DFormer comprises
a sequence of RGB-D blocks, which are tailored for encoding both RGB and depth information
through a novel building block design. DFormer avoids the mismatched encoding of 3D
geometric relationships in depth maps that arises when RGB-pretrained backbones are used,
a problem that is widespread in existing methods but remains unresolved. We finetune the pretrained DFormer
on two popular RGB-D tasks, i.e., RGB-D semantic segmentation and RGB-D salient
object detection, with a lightweight decoder head. Experimental results show
that our DFormer achieves new state-of-the-art performance on these two tasks
with less than half of the computational cost of the current best methods on
two RGB-D semantic segmentation datasets and five RGB-D salient object
detection datasets. Our code is available at:
https://github.com/VCIP-RGBD/DFormer.
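The released code defines the actual block design; as a rough illustration of the idea of an RGB-D building block in which a lightweight depth branch modulates the RGB branch, a minimal PyTorch sketch (module and parameter names are hypothetical and not taken from the repository) could look like this:

```python
# Illustrative dual-branch RGB-D block (not the actual DFormer block;
# see https://github.com/VCIP-RGBD/DFormer for the real implementation).
import torch
import torch.nn as nn

class RGBDBlock(nn.Module):
    """A wide RGB branch and a lighter depth branch; the depth branch
    produces a spatial gate that modulates the RGB features before fusion."""

    def __init__(self, rgb_channels: int, depth_channels: int):
        super().__init__()
        self.rgb_conv = nn.Sequential(
            nn.Conv2d(rgb_channels, rgb_channels, 3, padding=1, groups=rgb_channels),
            nn.Conv2d(rgb_channels, rgb_channels, 1),
            nn.BatchNorm2d(rgb_channels),
            nn.GELU(),
        )
        self.depth_conv = nn.Sequential(
            nn.Conv2d(depth_channels, depth_channels, 3, padding=1),
            nn.BatchNorm2d(depth_channels),
            nn.GELU(),
        )
        # Depth features are projected to a single-channel spatial gate.
        self.depth_gate = nn.Sequential(nn.Conv2d(depth_channels, 1, 1), nn.Sigmoid())

    def forward(self, rgb, depth):
        rgb = self.rgb_conv(rgb)
        depth = self.depth_conv(depth)
        # Geometry-aware modulation: depth decides where RGB features are emphasised.
        rgb = rgb * self.depth_gate(depth) + rgb
        return rgb, depth

if __name__ == "__main__":
    block = RGBDBlock(rgb_channels=64, depth_channels=16)
    rgb_feat = torch.randn(2, 64, 120, 160)
    depth_feat = torch.randn(2, 16, 120, 160)
    out_rgb, out_depth = block(rgb_feat, depth_feat)
    print(out_rgb.shape, out_depth.shape)
```

In the pretraining setup described above, a backbone stacked from such dual-branch blocks would consume image-depth pairs, and at finetuning time a lightweight decoder head would be attached for segmentation or saliency prediction.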
Related papers
- PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised
RGB-D Point Cloud Registration [6.030097207369754]
We propose a network implementing multi-scale bidirectional fusion between RGB images and point clouds generated from depth images.
Our method achieves new state-of-the-art performance.
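As a rough sketch of what bidirectional fusion between an image feature stream and a point feature stream can look like, assuming the point features have already been projected into the image plane (module names are illustrative, not PointMBF's):

```python
# Simplified bidirectional fusion between two feature streams; not PointMBF's
# actual modules, which operate at multiple scales with explicit projection.
import torch
import torch.nn as nn

class BidirectionalFusion(nn.Module):
    def __init__(self, img_channels: int, pts_channels: int):
        super().__init__()
        self.pts_to_img = nn.Conv2d(pts_channels, img_channels, 1)
        self.img_to_pts = nn.Conv2d(img_channels, pts_channels, 1)

    def forward(self, img_feat, pts_feat):
        # Each stream receives a residual update from the other stream.
        img_out = img_feat + self.pts_to_img(pts_feat)
        pts_out = pts_feat + self.img_to_pts(img_feat)
        return img_out, pts_out

if __name__ == "__main__":
    fuse = BidirectionalFusion(img_channels=64, pts_channels=32)
    img = torch.randn(2, 64, 60, 80)
    pts = torch.randn(2, 32, 60, 80)
    a, b = fuse(img, pts)
    print(a.shape, b.shape)
```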
arXiv Detail & Related papers (2023-08-09T08:13:46Z)
- Pyramidal Attention for Saliency Detection [30.554118525502115]
This paper exploits only RGB images, estimates depth from RGB, and leverages the intermediate depth features.
We employ a pyramidal attention structure to extract multi-level convolutional-transformer features to process initial stage representations.
We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets.
arXiv Detail & Related papers (2022-04-14T06:57:46Z)
- Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images [89.81919625224103]
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images.
We present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection.
arXiv Detail & Related papers (2022-01-01T03:02:27Z)
- Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly designed lightweight triple-stream network is then applied to these recombined inputs to achieve channel-wise complementary fusion between RGB and D.
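A minimal sketch of one possible data-level recombination, substituting the depth map for one colour channel to form new three-channel inputs for a multi-stream network (the exact recombination rule used in the paper may differ; this only shows the idea of fusing before feature extraction):

```python
# Illustrative data-level recombination: mix RGB and depth *before* any deep
# feature extraction by swapping the depth map into one colour channel.
import torch

def recombine_rgbd(rgb: torch.Tensor, depth: torch.Tensor):
    """rgb: (B, 3, H, W); depth: (B, 1, H, W), assumed normalised to [0, 1].
    Returns three recombined 3-channel images: DGB, RDB, RGD."""
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    dgb = torch.cat([depth, g, b], dim=1)
    rdb = torch.cat([r, depth, b], dim=1)
    rgd = torch.cat([r, g, depth], dim=1)
    return dgb, rdb, rgd

if __name__ == "__main__":
    rgb = torch.rand(1, 3, 224, 224)
    depth = torch.rand(1, 1, 224, 224)
    for x in recombine_rgbd(rgb, depth):
        print(x.shape)  # each (1, 3, 224, 224), ready for a standard RGB backbone
```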
arXiv Detail & Related papers (2020-08-07T10:13:05Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Crossmodality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information over multiple stages and aggregates the two recalibrated representations alternately.
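A simplified sketch of depth-guided recalibration of RGB features, using an SE-style channel gate driven by depth features (this illustrates the recalibration idea only, not the actual Separation-and-Aggregation Gate implementation):

```python
# Depth-guided channel recalibration of RGB features; an illustrative module,
# not the SA-Gate code.
import torch
import torch.nn as nn

class DepthGuidedRecalibration(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        # Per-channel weights computed from depth re-scale noisy RGB responses.
        b, c, _, _ = depth_feat.shape
        weights = self.fc(self.pool(depth_feat).view(b, c)).view(b, c, 1, 1)
        return rgb_feat * weights + rgb_feat

if __name__ == "__main__":
    m = DepthGuidedRecalibration(channels=64)
    rgb = torch.randn(2, 64, 56, 56)
    depth = torch.randn(2, 64, 56, 56)
    print(m(rgb, depth).shape)  # torch.Size([2, 64, 56, 56])
```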
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Synergistic saliency and depth prediction for RGB-D saliency detection [76.27406945671379]
Existing RGB-D saliency datasets are small, which may lead to overfitting and limited generalization for diverse scenarios.
We propose a semi-supervised system for RGB-D saliency detection that can be trained on smaller RGB-D saliency datasets without saliency ground truth.
arXiv Detail & Related papers (2020-07-03T14:24:41Z)
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt at realizing a unified depth-aware framework that requires only RGB information as input at inference time.
It not only surpasses state-of-the-art performance on five public RGB SOD benchmarks, but also surpasses RGB-D-based methods on five benchmarks by a large margin.
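A minimal sketch of the underlying pattern, where an auxiliary depth head is supervised during training but ignored at inference, so the model stays RGB-only at test time (architecture and losses here are illustrative, not the paper's design):

```python
# Depth-aware training, RGB-only inference: the auxiliary depth head is only
# used to compute a training loss and is skipped at test time.
import torch
import torch.nn as nn

class DepthAwareSOD(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.saliency_head = nn.Conv2d(32, 1, 1)
        self.depth_head = nn.Conv2d(32, 1, 1)  # auxiliary, training only

    def forward(self, rgb, return_depth: bool = False):
        feat = self.encoder(rgb)
        sal = self.saliency_head(feat)
        if return_depth:
            return sal, self.depth_head(feat)
        return sal  # inference path: RGB in, saliency out, no depth needed

if __name__ == "__main__":
    model = DepthAwareSOD()
    rgb = torch.rand(2, 3, 128, 128)
    gt_sal, gt_depth = torch.rand(2, 1, 128, 128), torch.rand(2, 1, 128, 128)
    sal, depth = model(rgb, return_depth=True)  # training forward pass
    loss = nn.functional.binary_cross_entropy_with_logits(sal, gt_sal) \
         + nn.functional.mse_loss(depth, gt_depth)
    print(float(loss), model(rgb).shape)  # inference uses RGB only
```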
arXiv Detail & Related papers (2020-05-30T13:40:03Z)