Depth-Adapted CNNs for RGB-D Semantic Segmentation
- URL: http://arxiv.org/abs/2206.03939v1
- Date: Wed, 8 Jun 2022 14:59:40 GMT
- Title: Depth-Adapted CNNs for RGB-D Semantic Segmentation
- Authors: Zongwei Wu, Guillaume Allibert, Christophe Stolz, Chao Ma, and
  Cédric Demonceaux
- Abstract summary: We propose a novel framework to incorporate the depth information in the RGB convolutional neural network (CNN).
Specifically, our Z-ACN generates a 2D depth-adapted offset which is fully constrained by low-level features to guide the feature extraction on RGB images.
With the generated offset, we introduce two intuitive and effective operations to replace basic CNN operators.
- Score: 2.341385717236931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: RGB-D semantic segmentation has recently attracted research interest
thanks to the accessibility of complementary modalities on the input side. Existing
works often adopt a two-stream architecture that processes photometric and
geometric information in parallel, with few methods explicitly leveraging the
contribution of depth cues to adjust the sampling position on RGB images. In
this paper, we propose a novel framework to incorporate the depth information
in the RGB convolutional neural network (CNN), termed Z-ACN (Depth-Adapted
CNN). Specifically, our Z-ACN generates a 2D depth-adapted offset which is
fully constrained by low-level features to guide the feature extraction on RGB
images. With the generated offset, we introduce two intuitive and effective
operations to replace basic CNN operators: depth-adapted convolution and
depth-adapted average pooling. Extensive experiments on both indoor and outdoor
semantic segmentation tasks demonstrate the effectiveness of our approach.
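To make the core operations concrete, here is a minimal PyTorch sketch of a depth-adapted convolution. It is an illustration under stated assumptions, not the authors' implementation: Z-ACN constrains its offset with low-level geometric features computed from depth, whereas the placeholder depth_to_offset below simply bends the sampling grid along local depth gradients and hands the result to torchvision's deform_conv2d.

```python
# Minimal sketch of a depth-adapted convolution in the spirit of Z-ACN.
# Assumption: the offset below is a crude depth-gradient heuristic standing
# in for the paper's geometrically constrained offset; like the paper's
# offset, it introduces no extra learnable parameters.
import torch
import torch.nn.functional as F
from torchvision.ops import deform_conv2d

def depth_to_offset(depth: torch.Tensor, k: int = 3, scale: float = 1.0) -> torch.Tensor:
    """Turn a depth map (N, 1, H, W) into a (N, 2*k*k, H, W) sampling offset."""
    # Cheap finite-difference depth gradients.
    gx = F.pad(depth[..., :, 1:] - depth[..., :, :-1], (0, 1))
    gy = F.pad(depth[..., 1:, :] - depth[..., :-1, :], (0, 0, 0, 1))
    # Shift every kernel tap along the local depth gradient; the paper
    # instead constrains each tap from the depth-induced local geometry.
    offset = torch.cat([gy, gx], dim=1) * scale   # (N, 2, H, W), (dy, dx) order
    return offset.repeat(1, k * k, 1, 1)          # same shift for all k*k taps

rgb_feat = torch.randn(1, 16, 32, 32)             # RGB feature map
depth = torch.rand(1, 1, 32, 32)                  # aligned depth map
weight = torch.randn(8, 16, 3, 3)                 # ordinary 3x3 conv weights

offset = depth_to_offset(depth, k=3)
out = deform_conv2d(rgb_feat, offset, weight, padding=1)  # depth-adapted conv
print(out.shape)                                  # torch.Size([1, 8, 32, 32])
```

Depth-adapted average pooling follows the same pattern: replace the learned weights with a fixed uniform kernel, so the average is taken over the depth-adapted sampling positions.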
Related papers
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient
Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- TANet: Transformer-based Asymmetric Network for RGB-D Salient Object Detection [13.126051625000605]
RGB-D SOD methods mainly rely on a symmetric two-stream CNN-based network to extract RGB and depth channel features separately.
We propose a Transformer-based asymmetric network (TANet) to tackle the issues mentioned above.
Our method achieves superior performance over 14 state-of-the-art RGB-D methods on six public datasets.
arXiv Detail & Related papers (2022-07-04T03:06:59Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Pyramidal Attention for Saliency Detection [30.554118525502115]
This paper exploits only RGB images, estimates depth from RGB, and leverages the intermediate depth features.
We employ a pyramidal attention structure to extract multi-level convolutional transformer features and process the initial-stage representations.
We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets.
arXiv Detail & Related papers (2022-04-14T06:57:46Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly designed lightweight triple-stream network is then applied to the recombined data to achieve channel-wise complementary fusion between RGB and depth.
arXiv Detail & Related papers (2020-08-07T10:13:05Z)
- Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive results in some challenging scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
arXiv Detail & Related papers (2020-07-23T04:33:36Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately; a rough sketch of this gating pattern appears below.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
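As referenced in the last entry above, here is a rough, self-contained sketch of the recalibrate-then-aggregate pattern behind gated cross-modality fusion. All module names, shapes, and gating details are assumptions made for illustration; the actual Separation-and-Aggregation Gate differs in its internals.

```python
# Rough sketch of gated cross-modality recalibration and aggregation.
# Assumption: this is an illustrative module, not the SA-Gate from the
# paper above; names, shapes, and gating details are placeholders.
import torch
import torch.nn as nn

class CrossModalGate(nn.Module):
    """Recalibrate each modality with a gate computed from both, then fuse."""

    def __init__(self, channels: int):
        super().__init__()
        # One sigmoid gate per modality, conditioned on both feature maps.
        self.gate_rgb = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid())
        self.gate_depth = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        both = torch.cat([rgb, depth], dim=1)
        rgb_r = rgb * self.gate_rgb(both)        # suppress unreliable RGB responses
        depth_r = depth * self.gate_depth(both)  # filter noisy depth responses
        return self.fuse(torch.cat([rgb_r, depth_r], dim=1))  # aggregate

gate = CrossModalGate(64)
out = gate(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```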