Global-Local Propagation Network for RGB-D Semantic Segmentation
- URL: http://arxiv.org/abs/2101.10801v1
- Date: Tue, 26 Jan 2021 14:26:07 GMT
- Title: Global-Local Propagation Network for RGB-D Semantic Segmentation
- Authors: Sihan Chen, Xinxin Zhu, Wei Liu, Xingjian He, Jing Liu
- Abstract summary: We propose Global-Local propagation network (GLPNet) to solve this problem.
Our GLPNet achieves new state-of-the-art performance on two challenging indoor scene segmentation datasets.
- Score: 12.710923449138434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth information matters in the RGB-D semantic segmentation task,
providing additional geometric cues to complement color images. Most existing
methods exploit a multi-stage fusion strategy to propagate depth features to
the RGB branch. However, at the very deep stage, propagation by simple
element-wise addition cannot fully utilize the depth information. We propose
the Global-Local Propagation Network (GLPNet) to solve this problem.
Specifically, a local context fusion module (L-CFM) is introduced to
dynamically align both modalities before element-wise fusion, and a global
context fusion module (G-CFM) is introduced to propagate the depth information
to the RGB branch by jointly modeling the multi-modal global context features.
Extensive experiments demonstrate the effectiveness and complementarity of the
proposed fusion modules. Embedding the two fusion modules into a two-stream
encoder-decoder structure, our GLPNet achieves new state-of-the-art performance
on two challenging indoor scene segmentation datasets, NYU-Depth v2 and
SUN RGB-D.
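The abstract describes the two fusion modules only at a functional level, so the snippet below is a minimal, illustrative PyTorch sketch of that description rather than the authors' implementation. The gating form of the alignment inside L-CFM, the squeeze-style joint pooling inside G-CFM, and all module names and feature shapes are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class LocalFusionSketch(nn.Module):
    """Hypothetical stand-in for the L-CFM: recalibrate depth features so they
    are better aligned with the RGB features before element-wise fusion."""
    def __init__(self, channels):
        super().__init__()
        self.align = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb, depth):
        gate = self.align(torch.cat([rgb, depth], dim=1))  # per-pixel alignment gate
        return rgb + gate * depth                          # element-wise fusion


class GlobalFusionSketch(nn.Module):
    """Hypothetical stand-in for the G-CFM: a context vector pooled jointly from
    both modalities modulates the RGB branch."""
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.context = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb, depth):
        g = torch.cat([self.pool(rgb), self.pool(depth)], dim=1).flatten(1)
        w = self.context(g).unsqueeze(-1).unsqueeze(-1)  # multi-modal global context
        return rgb + rgb * w                             # propagate context to RGB


# Usage on deep-stage feature maps (batch size, channels, and resolution are illustrative).
rgb_feat, depth_feat = torch.randn(2, 256, 30, 40), torch.randn(2, 256, 30, 40)
locally_fused = LocalFusionSketch(256)(rgb_feat, depth_feat)
out = GlobalFusionSketch(256)(locally_fused, depth_feat)
```

In this reading, the local module decides per pixel how much depth information to add, while the global module injects a jointly pooled context vector into the RGB branch, mirroring the local/global split the abstract describes.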
Related papers
- Optimizing rgb-d semantic segmentation through multi-modal interaction and pooling attention [5.518612382697244]
Multi-modal Interaction and Pooling Attention Network (MIPANet) is designed to harness the interactive synergy between RGB and depth modalities.
We introduce a Pooling Attention Module (PAM) at various stages of the encoder.
This module amplifies the features extracted by the network, and its output is integrated into the decoder.
arXiv Detail & Related papers (2023-11-19T12:25:59Z)
- DCANet: Differential Convolution Attention Network for RGB-D Semantic Segmentation [2.2032272277334375]
We propose a pixel differential convolution attention (DCA) module to consider geometric information and local-range correlations for depth data.
We extend DCA to ensemble differential convolution attention (EDCA), which propagates long-range contextual dependencies.
A two-branch network built with DCA and EDCA, called Differential Convolutional Network (DCANet), is proposed to fuse local and global information of two-modal data.
arXiv Detail & Related papers (2022-10-13T05:17:34Z)
- Learning an Efficient Multimodal Depth Completion Model [11.740546882538142]
RGB image-guided sparse depth completion has attracted extensive attention recently, but still faces some problems.
The proposed method can outperform some state-of-the-art methods with a lightweight architecture.
The method also wins the championship in the MIPI2022 RGB+TOF depth completion challenge.
arXiv Detail & Related papers (2022-08-23T07:03:14Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network.
We adopt the Swin-Transformer as the feature extractor for both the RGB and depth modalities to model long-range dependencies in the visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- DooDLeNet: Double DeepLab Enhanced Feature Fusion for Thermal-color Semantic Segmentation [1.6758573326215689]
We propose DooDLeNet, a double DeepLab architecture with specialized encoder-decoders for thermal and color modalities.
We combine two strategies for feature fusion: confidence weighting and correlation weighting.
We report state-of-the-art mean IoU results on the MF dataset.
arXiv Detail & Related papers (2022-04-21T17:06:57Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly designed lightweight triple-stream network is then applied to these newly formulated data to achieve optimal channel-wise complementary fusion between RGB and depth.
arXiv Detail & Related papers (2020-08-07T10:13:05Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder that not only effectively recalibrates the RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Cross-Modal Weighting Network for RGB-D Salient Object Detection [76.0965123893641]
We propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
Specifically, three RGB-depth interaction modules, named CMW-L, CMW-M and CMW-H, are developed to handle low-, middle- and high-level cross-modal information fusion, respectively (a minimal illustrative sketch of this weighting idea follows the list).
CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
arXiv Detail & Related papers (2020-07-09T16:01:44Z)
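The Cross-Modal Weighting entry above gives only the level-wise weighting idea, so the following is a minimal, illustrative PyTorch sketch of that general pattern rather than the CMWNet implementation; the module name, the use of a depth-derived spatial attention map, and the single-level setting are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CrossModalWeightingSketch(nn.Module):
    """Illustrative single-level cross-modal weighting (not the actual CMW-L/M/H):
    depth features produce a spatial attention map that re-weights the RGB
    features, and the weighted features are added back to the originals."""
    def __init__(self, channels):
        super().__init__()
        self.depth_to_weight = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # spatial weights in [0, 1]
        )

    def forward(self, rgb, depth):
        weight = self.depth_to_weight(depth)  # (B, 1, H, W) attention from depth
        return rgb + rgb * weight             # depth-guided re-weighting of RGB


# Usage on same-resolution feature maps (shapes are illustrative).
rgb_feat, depth_feat = torch.randn(2, 128, 60, 80), torch.randn(2, 128, 60, 80)
out = CrossModalWeightingSketch(128)(rgb_feat, depth_feat)
```

A per-level module of this kind can be instantiated at low, middle and high encoder stages, which is roughly the structure the CMW-L/M/H naming suggests.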
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.