Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation
- URL: http://arxiv.org/abs/2007.09183v1
- Date: Fri, 17 Jul 2020 18:35:24 GMT
- Title: Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation
- Authors: Xiaokang Chen, Kwan-Yee Lin, Jingbo Wang, Wayne Wu, Chen Qian,
Hongsheng Li, Gang Zeng
- Abstract summary: Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
- Score: 59.94819184452694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth information has proven to be a useful cue in the semantic segmentation
of RGB-D images for providing a geometric counterpart to the RGB
representation. Most existing works simply assume that depth measurements are
accurate and well-aligned with the RGB pixels and model the problem as a
cross-modal feature fusion to obtain better feature representations to achieve
more accurate segmentation. This, however, may not lead to satisfactory results
as actual depth data are generally noisy, which might worsen the accuracy as
the networks go deeper.
In this paper, we propose a unified and efficient Cross-modality Guided
Encoder to not only effectively recalibrate RGB feature responses, but also to
distill accurate depth information via multiple stages and aggregate the two
recalibrated representations alternately. The key to the proposed
architecture is a novel Separation-and-Aggregation Gating operation that
jointly filters and recalibrates both representations before cross-modality
aggregation. Meanwhile, a Bi-direction Multi-step Propagation strategy is
introduced, on the one hand, to help propagate and fuse information between
the two modalities, and on the other hand, to preserve their specificity along
the long-term propagation process. Besides, our proposed encoder can be easily
injected into the previous encoder-decoder structures to boost their
performance on RGB-D semantic segmentation. Our model consistently outperforms
state-of-the-art methods on challenging indoor and outdoor datasets. Code for
this work is available at https://charlescxk.github.io/
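To make the gating described above concrete, below is a minimal PyTorch sketch of a separation-and-aggregation style gate: channel attention pooled from both modalities recalibrates each stream (separation), and per-pixel softmax gates mix the two recalibrated streams (aggregation). The module name `SAGateSketch`, the layer sizes, the residual recalibration, and the exact ordering are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SAGateSketch(nn.Module):
    """Separation-and-aggregation style gate (illustrative sketch only)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Bottleneck MLP: pooled joint statistics -> one channel-attention
        # vector per modality (the "separation" step).
        hidden = max(2 * channels // reduction, 8)
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 2 * channels),
        )
        # 1x1 conv producing two spatial gate maps (the "aggregation" step).
        self.gate = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        b, c, _, _ = rgb.shape
        # Separation: global statistics from both modalities drive
        # cross-modal channel recalibration of each stream.
        stats = F.adaptive_avg_pool2d(torch.cat([rgb, depth], dim=1), 1)
        attn = torch.sigmoid(self.mlp(stats.flatten(1)))
        rgb_rec = rgb + depth * attn[:, :c].view(b, c, 1, 1)
        depth_rec = depth + rgb * attn[:, c:].view(b, c, 1, 1)
        # Aggregation: per-pixel soft selection between the two streams.
        gates = torch.softmax(
            self.gate(torch.cat([rgb_rec, depth_rec], dim=1)), dim=1)
        fused = gates[:, :1] * rgb_rec + gates[:, 1:] * depth_rec
        return fused, rgb_rec, depth_rec
```

Returning the two recalibrated streams alongside the fused map lets each one feed back into its own encoder branch at every stage, which matches the intuition behind the bi-directional multi-step propagation described in the abstract.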
Related papers
- HODINet: High-Order Discrepant Interaction Network for RGB-D Salient
Object Detection [4.007827908611563]
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same type of backbones and fusion modules to learn multi-modality and multi-stage features in an identical way.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
arXiv Detail & Related papers (2023-07-03T11:56:21Z)
- DCANet: Differential Convolution Attention Network for RGB-D Semantic
Segmentation [2.2032272277334375]
We propose a pixel differential convolution attention (DCA) module to consider geometric information and local-range correlations for depth data.
We extend DCA to ensemble differential convolution attention (EDCA) which propagates long-range contextual dependencies.
A two-branch network built with DCA and EDCA, called Differential Convolutional Network (DCANet), is proposed to fuse local and global information of two-modal data.
arXiv Detail & Related papers (2022-10-13T05:17:34Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient
Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- TANet: Transformer-based Asymmetric Network for RGB-D Salient Object
Detection [13.126051625000605]
RGB-D SOD methods mainly rely on a symmetric two-stream CNN-based network to extract RGB and depth channel features separately.
We propose a Transformer-based asymmetric network (TANet) to tackle the issues mentioned above.
Our method achieves superior performance over 14 state-of-the-art RGB-D methods on six public datasets.
arXiv Detail & Related papers (2022-07-04T03:06:59Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Self-Supervised Representation Learning for RGB-D Salient Object
Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, enabling the network to capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient
Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly designed lightweight triple-stream network is applied to these recombined inputs to achieve an optimal channel-wise complementary fusion between RGB and D (a generic early-fusion sketch follows this entry).
arXiv Detail & Related papers (2020-08-07T10:13:05Z)
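As a rough illustration of what fusing RGB with depth before deep feature extraction can look like, the sketch below composes three input tensors for a hypothetical triple-stream network from raw RGB and depth. The function name and the specific recombination are assumptions for illustration; the paper's actual recombination strategy is more elaborate.

```python
import torch


def recombine_rgbd(rgb: torch.Tensor, depth: torch.Tensor):
    """Data-level recombination sketch (hypothetical, not the paper's scheme).

    rgb:   (B, 3, H, W) image tensor.
    depth: (B, 1, H, W) depth map, assumed normalized to [0, 1].
    Returns three input streams: plain RGB, depth broadcast to three
    channels, and a 4-channel early-fusion tensor.
    """
    depth3 = depth.expand(-1, 3, -1, -1)    # replicate depth per channel
    mixed = torch.cat([rgb, depth], dim=1)  # channel-wise early fusion
    return rgb, depth3, mixed
```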