DCANet: Differential Convolution Attention Network for RGB-D Semantic
Segmentation
- URL: http://arxiv.org/abs/2210.06747v1
- Date: Thu, 13 Oct 2022 05:17:34 GMT
- Title: DCANet: Differential Convolution Attention Network for RGB-D Semantic
Segmentation
- Authors: Lizhi Bai and Jun Yang and Chunqi Tian and Yaoru Sun and Maoyu Mao and
Yanjun Xu and Weirong Xu
- Abstract summary: We propose a pixel differential convolution attention (DCA) module to consider geometric information and local-range correlations for depth data.
We extend DCA to ensemble differential convolution attention (EDCA) which propagates long-range contextual dependencies.
A two-branch network built with DCA and EDCA, called Differential Convolutional Network (DCANet), is proposed to fuse local and global information of two-modal data.
- Score: 2.2032272277334375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Combining RGB images with their corresponding depth maps has proven
effective for semantic segmentation in the past few years. Existing RGB-D
modal fusion methods either lack the ability to fuse features non-linearly or treat
both modal images equally, regardless of the intrinsic distribution gap or
information loss. Here we find that depth maps are well suited to providing
intrinsic fine-grained patterns of objects due to their local depth continuity,
while RGB images effectively provide a global view. Based on this, we propose a
pixel differential convolution attention (DCA) module to consider geometric
information and local-range correlations for depth data. Furthermore, we extend
DCA to ensemble differential convolution attention (EDCA) which propagates
long-range contextual dependencies and seamlessly incorporates spatial
distribution for RGB data. DCA and EDCA dynamically adjust convolutional
weights by pixel differences, making them self-adaptive at local and long range,
respectively. A two-branch network built with DCA and EDCA, called Differential
Convolutional Network (DCANet), is proposed to fuse local and global
information of the two modalities. Consequently, the individual advantages of RGB
and depth data are emphasized. Our DCANet is shown to set a new
state-of-the-art performance for RGB-D semantic segmentation on two challenging
benchmark datasets, i.e., NYUDv2 and SUN-RGBD.
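The central mechanism described in the abstract, convolution weights that adapt to local pixel differences so that depth discontinuities and flat regions are treated differently, can be illustrated with a short sketch. The PyTorch module below is a hypothetical simplification of the DCA idea; the layer names, kernel size, and the sigmoid gating of a depthwise convolution are assumptions of this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffConvAttention(nn.Module):
    """Simplified differential convolution attention (illustrative only).

    For every pixel, differences between the centre value and its k*k
    neighbours gate a depthwise convolution, so regions with strong depth
    discontinuities receive different effective responses than flat regions.
    This mirrors the abstract's description of DCA, not the released code.
    """

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        self.pad = kernel_size // 2
        # Depthwise convolution whose output is modulated by pixel differences.
        self.dw_conv = nn.Conv2d(channels, channels, kernel_size,
                                 padding=self.pad, groups=channels)
        # 1x1 projection turning aggregated differences into per-pixel attention.
        self.attn_proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Unfold k*k neighbourhoods: (B, C*k*k, H*W) -> (B, C, k*k, H, W).
        patches = F.unfold(x, self.k, padding=self.pad)
        patches = patches.view(b, c, self.k * self.k, h, w)
        centre = x.unsqueeze(2)                      # (B, C, 1, H, W)
        # Pixel differences between each neighbour and the centre pixel.
        diff = (patches - centre).abs().mean(dim=2)  # (B, C, H, W)
        # Differences gate the convolutional response (self-adaptive weights).
        attn = torch.sigmoid(self.attn_proj(diff))
        return self.dw_conv(x) * attn


# Usage on a depth feature map (hypothetical shapes).
if __name__ == "__main__":
    depth_feat = torch.randn(2, 64, 60, 80)
    out = DiffConvAttention(64)(depth_feat)
    print(out.shape)  # torch.Size([2, 64, 60, 80])
```

Under the same reading, an EDCA-style variant would compute comparable differences over a much larger (e.g. dilated or global) neighbourhood of RGB features to propagate long-range contextual dependencies.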
Related papers
- Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281]
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
arXiv Detail & Related papers (2024-09-23T15:23:01Z) - The Devil is in the Details: Boosting Guided Depth Super-Resolution via
Rethinking Cross-Modal Alignment and Aggregation [41.12790340577986]
Guided depth super-resolution (GDSR) involves restoring missing depth details using the high-resolution RGB image of the same scene.
Previous approaches have struggled with the heterogeneity and complementarity of the multi-modal inputs, and neglected the issues of modal misalignment, geometrical misalignment, and feature selection.
arXiv Detail & Related papers (2024-01-16T05:37:08Z) - Pixel Difference Convolutional Network for RGB-D Semantic Segmentation [2.334574428469772]
RGB-D semantic segmentation can be advanced with convolutional neural networks (CNNs) due to the availability of depth data.
Owing to their fixed-grid kernel structure, however, CNNs are limited in their ability to capture detailed, fine-grained information.
We propose a Pixel Difference Convolutional Network (PDCNet) to capture detailed intrinsic patterns by aggregating both intensity and gradient information.
arXiv Detail & Related papers (2023-02-23T12:01:22Z) - CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z) - Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z) - Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images [89.81919625224103]
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images.
We present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection.
arXiv Detail & Related papers (2022-01-01T03:02:27Z) - Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z) - Self-Supervised Representation Learning for RGB-D Salient Object
Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which enables the network to capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)