Channel and Spatial Relation-Propagation Network for RGB-Thermal
Semantic Segmentation
- URL: http://arxiv.org/abs/2308.12534v1
- Date: Thu, 24 Aug 2023 03:43:47 GMT
- Title: Channel and Spatial Relation-Propagation Network for RGB-Thermal
Semantic Segmentation
- Authors: Zikun Zhou, Shukun Wu, Guoqing Zhu, Hongpeng Wang, Zhenyu He
- Abstract summary: RGB-Thermal (RGB-T) semantic segmentation has shown great potential in handling low-light conditions.
The key to RGB-T semantic segmentation is to effectively leverage the complementary nature of RGB and thermal images.
- Score: 10.344060599932185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGB-Thermal (RGB-T) semantic segmentation has shown great potential in
handling low-light conditions where RGB-based segmentation is hindered by poor
RGB imaging quality. The key to RGB-T semantic segmentation is to effectively
leverage the complementary nature of RGB and thermal images. Most existing
algorithms fuse RGB and thermal information in feature space via concatenation,
element-wise summation, or attention operations in either unidirectional
enhancement or bidirectional aggregation manners. However, they usually
overlook the modality gap between RGB and thermal images during feature fusion,
resulting in modality-specific information from one modality contaminating the
other. In this paper, we propose a Channel and Spatial Relation-Propagation
Network (CSRPNet) for RGB-T semantic segmentation, which propagates only
modality-shared information across different modalities and alleviates the
modality-specific information contamination issue. Our CSRPNet first performs
relation-propagation in channel and spatial dimensions to capture the
modality-shared features from the RGB and thermal features. CSRPNet then
aggregates the modality-shared features captured from one modality with the
input feature from the other modality to enhance the input feature without the
contamination issue. In addition to being fused together, the enhanced RGB and
thermal features are also fed into the subsequent RGB and thermal feature
extraction layers, respectively, for interactive feature fusion. We also introduce a
dual-path cascaded feature refinement module that aggregates multi-layer
features to produce two refined features for semantic and boundary prediction.
Extensive experimental results demonstrate that CSRPNet performs favorably
against state-of-the-art algorithms.
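For context, here is a minimal PyTorch sketch of the three fusion styles the abstract contrasts (concatenation, element-wise summation, and attention). All module and parameter names are illustrative assumptions, not taken from any paper listed on this page.

```python
import torch
import torch.nn as nn


class NaiveFusion(nn.Module):
    """Baseline RGB-T fusion via concat, sum, or channel attention.

    A sketch of the fusion styles the abstract criticizes; it is not the
    implementation of any paper cited here.
    """

    def __init__(self, channels: int, mode: str = "concat"):
        super().__init__()
        assert mode in ("concat", "sum", "attention")
        self.mode = mode
        if mode == "concat":
            # Project the concatenated features back down to `channels`.
            self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)
        elif mode == "attention":
            # Squeeze-and-excitation style channel gate.
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.Sigmoid(),
            )

    def forward(self, f_rgb: torch.Tensor, f_th: torch.Tensor) -> torch.Tensor:
        if self.mode == "concat":
            return self.proj(torch.cat([f_rgb, f_th], dim=1))
        if self.mode == "sum":
            return f_rgb + f_th
        # Attention: reweight thermal channels with a gate computed from RGB.
        # Modality-specific thermal content can still leak through the gate,
        # which is the contamination issue the abstract points out.
        return f_rgb + self.gate(f_rgb) * f_th
```

For two (B, C, H, W) feature maps, `NaiveFusion(256, "attention")(f_rgb, f_th)` returns a fused map of the same shape; none of these variants filters out modality-specific content before mixing.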
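One plausible reading of "relation-propagation" along the channel and spatial dimensions is sketched below. The affinity construction, scaling, and normalization are assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn


class RelationPropagation(nn.Module):
    """Speculative sketch of channel/spatial relation-propagation.

    Reads the abstract as: build cross-modal affinities along the channel
    and spatial dimensions, use them to propagate only the content both
    modalities agree on, then aggregate that modality-shared signal with
    the other modality's input feature.
    """

    def forward(self, f_src: torch.Tensor, f_dst: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_src.shape
        src = f_src.flatten(2)  # (B, C, HW)
        dst = f_dst.flatten(2)  # (B, C, HW)

        # Channel relation: (B, C, C) cross-modal affinity between the
        # destination's channels and the source's channels.
        rel_c = torch.softmax(dst @ src.transpose(1, 2) / (h * w) ** 0.5, dim=-1)
        shared_c = rel_c @ src  # propagate source content channel-wise

        # Spatial relation: (B, HW, HW) cross-modal affinity between positions.
        rel_s = torch.softmax(dst.transpose(1, 2) @ src / c ** 0.5, dim=-1)
        shared_s = src @ rel_s.transpose(1, 2)  # propagate source content spatially

        shared = (shared_c + shared_s).view(b, c, h, w)
        # Aggregation: enhance f_dst with the modality-shared signal only;
        # f_src's modality-specific content is never added directly.
        return f_dst + shared
```

Used symmetrically, `rp(f_th, f_rgb)` enhances the RGB feature with thermal-shared content and `rp(f_rgb, f_th)` does the reverse; the two enhanced features can then be fused and also fed back into the respective backbones, matching the interactive fusion the abstract describes.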
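The dual-path cascaded feature refinement module, read literally, aggregates multi-layer features into one semantic stream and one boundary stream. A hedged sketch follows; the cascade rule, layer count, and head shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualPathRefinement(nn.Module):
    """Hedged sketch of a dual-path cascaded refinement head.

    Cascades fused multi-layer features from deep to shallow and emits two
    refined outputs: one for semantic prediction, one for boundary
    prediction.
    """

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.sem_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.bnd_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.sem_head = nn.Conv2d(channels, num_classes, 1)  # class logits
        self.bnd_head = nn.Conv2d(channels, 1, 1)            # boundary logits

    def forward(self, feats: list[torch.Tensor]):
        # `feats`: fused RGB-T features, deepest first, all with `channels`
        # channels and progressively larger spatial size.
        sem = bnd = feats[0]
        for f in feats[1:]:
            size = f.shape[-2:]
            up = lambda x: F.interpolate(x, size=size, mode="bilinear",
                                         align_corners=False)
            # Each path mixes the running refinement with the next
            # shallower feature (the cascade).
            sem = self.sem_conv(up(sem) + f)
            bnd = self.bnd_conv(up(bnd) + f)
        return self.sem_head(sem), self.bnd_head(bnd)
```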
Related papers
- HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion [15.538174593176166]
In this study, we explore a feasible strategy to fully exploit vision foundation model (VFM) features for RGB-thermal scene parsing.
Specifically, we design a hybrid, asymmetric encoder that incorporates both a VFM and a convolutional neural network.
This design allows for more effective extraction of complementary heterogeneous features, which are subsequently fused in a dual-path, progressive manner.
arXiv Detail & Related papers (2024-04-04T15:31:11Z)
- Multi-Modal Hybrid Learning and Sequential Training for RGB-T Saliency Detection [10.589062261564631]
RGB-T saliency detection has emerged as an important computer vision task, identifying conspicuous objects in challenging scenes such as dark environments.
Existing methods neglect the characteristics of cross-modal features and rely solely on network structures to fuse RGB and thermal features.
We first propose a Multi-Modal Hybrid loss (MMHL) that comprises supervised and self-supervised loss functions.
arXiv Detail & Related papers (2023-09-13T20:47:29Z)
- Residual Spatial Fusion Network for RGB-Thermal Semantic Segmentation [19.41334573257174]
Traditional methods mostly use RGB images, which are heavily affected by lighting conditions, e.g., darkness.
Recent studies show that thermal images are robust to night scenes, making them a compensating modality for segmentation.
This work proposes a Residual Spatial Fusion Network (RSFNet) for RGB-T semantic segmentation.
arXiv Detail & Related papers (2023-06-17T14:28:08Z)
- Does Thermal Really Always Matter for RGB-T Salient Object Detection? [153.17156598262656]
This paper proposes a network named TNet to solve the RGB-T salient object detection (SOD) task.
We introduce a global illumination estimation module to predict the global illuminance score of the image.
We also introduce a two-stage localization and complementation module in the decoding phase to transfer the object localization cue and internal integrity cue in thermal features to the RGB modality.
arXiv Detail & Related papers (2022-10-09T13:50:12Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on a novel cross-modality interaction and refinement scheme.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects of an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on standard benchmarks and the VT723 dataset show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Cross-Modal Weighting Network for RGB-D Salient Object Detection [76.0965123893641]
We propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
Specifically, three RGB-depth interaction modules, named CMW-L, CMW-M and CMW-H, are developed to handle low-, middle- and high-level cross-modal information fusion, respectively (a toy sketch follows this list).
CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
arXiv Detail & Related papers (2020-07-09T16:01:44Z)
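To make the level-wise cross-modal weighting in the CMWNet entry above concrete, here is a toy sketch; the gating rule and channel widths are assumptions, not the CMWNet implementation.

```python
import torch
import torch.nn as nn


class CrossModalWeighting(nn.Module):
    """Toy block in the spirit of CMW-L/M/H: one instance per feature level,
    with each modality gated by a weight map computed from the other
    modality. The gating rule here is an assumption, not the CMWNet design."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate_from_depth = nn.Sequential(nn.Conv2d(channels, channels, 1),
                                             nn.Sigmoid())
        self.gate_from_rgb = nn.Sequential(nn.Conv2d(channels, channels, 1),
                                           nn.Sigmoid())

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor):
        # Cross weighting: RGB reweighted by depth-derived gates, and vice versa.
        return (f_rgb * self.gate_from_depth(f_depth),
                f_depth * self.gate_from_rgb(f_rgb))


# Hypothetical usage: one block per backbone level (low/middle/high).
# cmw_low, cmw_mid, cmw_high = (CrossModalWeighting(c) for c in (64, 256, 512))
```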