Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation
- URL: http://arxiv.org/abs/2201.01427v1
- Date: Wed, 5 Jan 2022 03:12:27 GMT
- Title: Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation
- Authors: Yang Zhang, Yang Yang, Chenyun Xiong, Guodong Sun, Yanwen Guo
- Abstract summary: We propose a novel attention-based dual supervised decoder for RGBD semantic segmentation.
In the encoder, we design a simple yet effective attention-based multimodal fusion module to extract and deeply fuse multi-level paired complementary information.
Our method achieves superior performance against the state-of-the-art methods.
- Score: 16.721758280029302
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Encoder-decoder models have been widely used in RGBD semantic segmentation,
and most of them are designed as two-stream networks. In general, jointly
reasoning over the color and geometric information in RGBD data is beneficial for
semantic segmentation. However, most existing approaches fail to
comprehensively utilize multimodal information in both the encoder and decoder.
In this paper, we propose a novel attention-based dual supervised decoder for
RGBD semantic segmentation. In the encoder, we design a simple yet effective
attention-based multimodal fusion module to extract and deeply fuse multi-level
paired complementary information. To learn more robust deep representations and
rich multi-modal information, we introduce a dual-branch decoder to effectively
leverage the correlations and complementary cues of different tasks. Extensive
experiments on NYUDv2 and SUN-RGBD datasets demonstrate that our method
achieves superior performance against the state-of-the-art methods.
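No reference implementation accompanies the abstract; the snippet below is only a minimal PyTorch sketch of the two ingredients it names, i.e. an attention-based fusion of paired RGB and depth features in the encoder and a dual-branch decoder with two supervision heads. The SE-style gating, the auxiliary task, and all layer sizes are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Channel-attention fusion of paired RGB and depth features (assumed SE-style gating)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb_feat, depth_feat):
        paired = torch.cat([rgb_feat, depth_feat], dim=1)  # pair the two modalities
        paired = paired * self.gate(paired)                # reweight channels of both streams
        return self.proj(paired)                           # fused multimodal feature

class DualSupervisedDecoder(nn.Module):
    """Two decoder branches with separate supervision heads (semantics plus an assumed auxiliary task)."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.branch_sem = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.branch_aux = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.head_sem = nn.Conv2d(channels, num_classes, 1)  # semantic-segmentation head
        self.head_aux = nn.Conv2d(channels, 1, 1)            # auxiliary head (e.g. depth); an assumption

    def forward(self, fused):
        sem, aux = self.branch_sem(fused), self.branch_aux(fused)
        # a simple cross-branch exchange so each head sees the other task's cues
        return self.head_sem(sem + 0.5 * aux), self.head_aux(aux + 0.5 * sem)

# toy usage on a single encoder stage (shapes are illustrative)
fusion = AttentionFusion(channels=256)
decoder = DualSupervisedDecoder(channels=256, num_classes=40)  # 40 classes as in NYUDv2
rgb_f, depth_f = torch.randn(1, 256, 60, 80), torch.randn(1, 256, 60, 80)
sem_logits, aux_pred = decoder(fusion(rgb_f, depth_f))
```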
Related papers
- SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval [82.51117533271517]
Previous works typically only encode RGB videos to obtain high-level semantic features.
Existing RGB-based sign retrieval works suffer from the huge memory cost of embedding dense visual data during end-to-end training.
We propose a novel sign language representation framework called Semantically Enhanced Dual-Stream Encoder (SEDS).
arXiv Detail & Related papers (2024-07-23T11:31:11Z)
- Optimizing RGB-D Semantic Segmentation through Multi-modal Interaction and Pooling Attention [5.518612382697244]
Multi-modal Interaction and Pooling Attention Network (MIPANet) is designed to harness the interactive synergy between RGB and depth modalities.
We introduce a Pooling Attention Module (PAM) at various stages of the encoder.
This module amplifies the features extracted by the network, and its output is integrated into the decoder.
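MIPANet's source is not given here; the following is a minimal sketch, assuming the Pooling Attention Module behaves like squeeze-and-excitation style channel attention: global pooling followed by a small gating MLP that amplifies informative encoder features before they are routed to the decoder. The class name, reduction ratio, and layer choices are illustrative.

```python
import torch
import torch.nn as nn

class PoolingAttention(nn.Module):
    """Global-pooling channel attention (SE-style); the real PAM may differ in detail."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pooling per channel
        return x * w.view(b, c, 1, 1)     # excite: amplify informative channels

feat = torch.randn(2, 128, 30, 40)        # an encoder-stage feature map
amplified = PoolingAttention(128)(feat)   # output would be passed on to the decoder
```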
arXiv Detail & Related papers (2023-11-19T12:25:59Z)
- Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation [54.23510028456082]
We propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation.
The framework includes the triple-view encoder and the dual-frequency decoder.
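For context only, the snippet below sketches the generic pixel-wise teacher-to-student distillation loss that semi-supervised distillation frameworks of this kind commonly build on; it is not TriKD's triple-view encoder or dual-frequency decoder, and the temperature value and weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def pixelwise_distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student class distributions at every pixel."""
    # logits: (batch, num_classes, H, W)
    s = F.log_softmax(student_logits / temperature, dim=1)
    t = F.softmax(teacher_logits / temperature, dim=1)
    # scale by T^2 so gradient magnitude stays comparable across temperatures
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

student = torch.randn(2, 21, 64, 64, requires_grad=True)
teacher = torch.randn(2, 21, 64, 64)   # e.g. teacher predictions on unlabelled images
loss = pixelwise_distillation_loss(student, teacher)
loss.backward()
```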
arXiv Detail & Related papers (2023-09-22T01:02:21Z)
- HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness [2.341385717236931]
We propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection.
Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies.
Our HiDAnet performs favorably against state-of-the-art methods by large margins.
arXiv Detail & Related papers (2023-01-18T10:00:59Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Robust Double-Encoder Network for RGB-D Panoptic Segmentation [31.807572107839576]
Panoptic segmentation provides an interpretation of the scene by computing a pixelwise semantic label together with instance IDs.
We propose a novel encoder-decoder neural network that processes RGB and depth separately through two encoders.
We show that our approach achieves superior results compared to other common approaches for panoptic segmentation.
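As a small illustration of the task definition above (a pixelwise semantic label paired with an instance ID), and not of the double-encoder network itself, panoptic output is often encoded as `label * offset + instance`; the offset value below is arbitrary and the helper is hypothetical.

```python
import numpy as np

OFFSET = 1000  # arbitrary stride separating the semantic label from the instance ID

def to_panoptic(semantic: np.ndarray, instance: np.ndarray) -> np.ndarray:
    """Fuse a pixelwise semantic map and an instance-ID map into one panoptic ID map."""
    return semantic.astype(np.int64) * OFFSET + instance.astype(np.int64)

semantic = np.array([[1, 1], [2, 2]])   # e.g. 1 = person, 2 = car
instance = np.array([[0, 1], [0, 0]])   # two person instances, one car
panoptic = to_panoptic(semantic, instance)
# panoptic[0, 0] == 1000 and panoptic[0, 1] == 1001: same class, different instances
```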
arXiv Detail & Related papers (2022-10-06T11:46:37Z)
- Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
We propose a specificity-preserving network (SP-Net) for RGB-D saliency detection.
Two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps.
Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.
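A minimal sketch of the layout described above, assuming each modality-specific branch predicts its own saliency map while a shared branch consumes both feature streams; the layer choices and how the three maps would later be combined are illustrative rather than SP-Net's actual design.

```python
import torch
import torch.nn as nn

class SpecificSharedSaliency(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # modality-specific heads keep the individual (specificity-preserving) predictions
        self.rgb_head = nn.Conv2d(channels, 1, 1)
        self.depth_head = nn.Conv2d(channels, 1, 1)
        # shared learning branch consumes both modalities
        self.shared = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 1),
        )

    def forward(self, rgb_feat, depth_feat):
        sal_rgb = self.rgb_head(rgb_feat)
        sal_depth = self.depth_head(depth_feat)
        sal_shared = self.shared(torch.cat([rgb_feat, depth_feat], dim=1))
        return sal_rgb, sal_depth, sal_shared   # each map can be supervised separately

model = SpecificSharedSaliency()
maps = model(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
```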
arXiv Detail & Related papers (2021-08-18T14:14:22Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
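The sketch below shows one simple gated form of the recalibrate-then-aggregate idea summarized above; the gating design is an assumption for illustration and is not the paper's Separation-and-Aggregation Gate.

```python
import torch
import torch.nn as nn

class CrossModalGate(nn.Module):
    """Recalibrate each modality with a gate computed from both streams, then aggregate."""
    def __init__(self, channels):
        super().__init__()
        self.gate_rgb = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.gate_depth = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, rgb_feat, depth_feat):
        both = torch.cat([rgb_feat, depth_feat], dim=1)
        rgb_recal = rgb_feat * self.gate_rgb(both)        # recalibrate RGB responses using depth context
        depth_recal = depth_feat * self.gate_depth(both)  # suppress noisy or misaligned depth responses
        return rgb_recal + depth_recal                    # aggregated cross-modal feature

gate = CrossModalGate(channels=256)
fused = gate(torch.randn(1, 256, 30, 40), torch.randn(1, 256, 30, 40))
```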
arXiv Detail & Related papers (2020-07-17T18:35:24Z)