Optimizing RGB-D Semantic Segmentation through Multi-modal Interaction and Pooling Attention
- URL: http://arxiv.org/abs/2311.11312v2
- Date: Wed, 6 Dec 2023 07:30:40 GMT
- Title: Optimizing RGB-D Semantic Segmentation through Multi-modal Interaction and Pooling Attention
- Authors: Shuai Zhang, Minghong Xie
- Abstract summary: Multi-modal Interaction and Pooling Attention Network (MIPANet) is designed to harness the interactive synergy between RGB and depth modalities.
We introduce a Pooling Attention Module (PAM) at various stages of the encoder.
This module amplifies the features extracted by the network and integrates its output into the decoder.
- Score: 5.518612382697244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation of RGB-D images involves understanding the appearance
and spatial relationships of objects within a scene, which requires careful
consideration of various factors. However, in indoor environments, the simple
input of RGB and depth images often results in a relatively limited acquisition
of semantic and spatial information, leading to suboptimal segmentation
outcomes. To address this, we propose the Multi-modal Interaction and Pooling
Attention Network (MIPANet), a novel approach designed to harness the
interactive synergy between RGB and depth modalities, optimizing the
utilization of complementary information. Specifically, we incorporate a
Multi-modal Interaction Fusion Module (MIM) into the deepest layers of the
network. This module is engineered to facilitate the fusion of RGB and depth
information, allowing for mutual enhancement and correction. Additionally, we
introduce a Pooling Attention Module (PAM) at various stages of the encoder.
This module serves to amplify the features extracted by the network and
integrates the module's output into the decoder in a targeted manner,
significantly improving semantic segmentation performance. Our experimental
results demonstrate that MIPANet outperforms existing methods on two indoor
scene datasets, NYUDv2 and SUN-RGBD, underscoring its effectiveness in
enhancing RGB-D semantic segmentation.
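
The abstract names two architectural components: a Pooling Attention Module (PAM) that re-weights encoder features before they are passed to the decoder, and a Multi-modal Interaction Fusion Module (MIM) that lets RGB and depth features mutually enhance and correct each other at the deepest encoder stage. The paper's exact layer designs are not given here; the following is a minimal PyTorch sketch of what such modules could look like, where the pooling strategy, reduction ratio, and cross-gating scheme are all assumptions made for illustration rather than the authors' implementation.

```python
# Hypothetical sketch of MIPANet-style modules (layer choices are assumptions,
# not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PoolingAttentionModule(nn.Module):
    """Re-weights an encoder feature map using pooled global context (assumed design)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global average pooling -> per-channel attention weights in [0, 1].
        b, c, _, _ = x.shape
        ctx = F.adaptive_avg_pool2d(x, 1).view(b, c)
        weights = torch.sigmoid(self.mlp(ctx)).view(b, c, 1, 1)
        # Amplify the features; the weighted map would be routed to the decoder.
        return x * weights


class MultiModalInteractionFusion(nn.Module):
    """Cross-gates RGB and depth features so each modality corrects the other (assumed design)."""

    def __init__(self, channels: int):
        super().__init__()
        self.rgb_gate = nn.Conv2d(channels, channels, kernel_size=1)
        self.depth_gate = nn.Conv2d(channels, channels, kernel_size=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Each modality is enhanced by a gate computed from the other modality.
        rgb_enh = rgb + rgb * torch.sigmoid(self.depth_gate(depth))
        depth_enh = depth + depth * torch.sigmoid(self.rgb_gate(rgb))
        # Concatenate and project back to a single fused representation.
        return self.fuse(torch.cat([rgb_enh, depth_enh], dim=1))


if __name__ == "__main__":
    rgb_feat = torch.randn(2, 512, 15, 20)    # deepest-layer RGB features
    depth_feat = torch.randn(2, 512, 15, 20)  # deepest-layer depth features
    fused = MultiModalInteractionFusion(512)(rgb_feat, depth_feat)
    attended = PoolingAttentionModule(512)(fused)
    print(fused.shape, attended.shape)
```

The sketch only illustrates the data flow stated in the abstract: the MIM performs mutual RGB-depth enhancement at the deepest layer, while PAM-style attention amplifies features whose output feeds the decoder.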
Related papers
- Context-Aware Interaction Network for RGB-T Semantic Segmentation [12.91377211747192]
RGB-T semantic segmentation is a key technique for autonomous driving scene understanding.
We propose a Context-Aware Interaction Network (CAINet) to exploit auxiliary tasks and global context for guided learning.
The proposed CAINet achieves state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2024-01-03T08:49:29Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Position-Aware Relation Learning for RGB-Thermal Salient Object Detection [3.115635707192086]
We propose a position-aware relation learning network (PRLNet) for RGB-T SOD based on the Swin Transformer.
PRLNet explores the distance and direction relationships between pixels to strengthen intra-class compactness and inter-class separation.
In addition, we constitute a pure transformer encoder-decoder network to enhance multispectral feature representation for RGB-T SOD.
arXiv Detail & Related papers (2022-09-21T07:34:30Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt the Swin-Transformer as the feature extractor for both RGB and depth modalities to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Global-Local Propagation Network for RGB-D Semantic Segmentation [12.710923449138434]
We propose the Global-Local Propagation Network (GLPNet) for RGB-D semantic segmentation.
Our GLPNet achieves new state-of-the-art performance on two challenging indoor scene segmentation datasets.
arXiv Detail & Related papers (2021-01-26T14:26:07Z)
- Siamese Network for RGB-D Salient Object Detection and Beyond [113.30063105890041]
A novel framework is proposed to learn from both RGB and depth inputs through a shared network backbone.
Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector.
We also link the proposed JL-DCF framework to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models.
arXiv Detail & Related papers (2020-08-26T06:01:05Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternatively.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD).
The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)
- Multi-level Cross-modal Interaction Network for RGB-D Salient Object Detection [3.581367375462018]
We propose a novel Multi-level Cross-modal Interaction Network (MCINet) for RGB-D based salient object detection (SOD).
Our MCI-Net includes two key components: 1) a cross-modal feature learning network, which is used to learn the high-level features for the RGB images and depth cues, effectively enabling the correlations between the two sources to be exploited; and 2) a multi-level interactive integration network, which integrates multi-level cross-modal features to boost the SOD performance.
arXiv Detail & Related papers (2020-07-10T02:21:02Z)