Residual Spatial Fusion Network for RGB-Thermal Semantic Segmentation
- URL: http://arxiv.org/abs/2306.10364v1
- Date: Sat, 17 Jun 2023 14:28:08 GMT
- Title: Residual Spatial Fusion Network for RGB-Thermal Semantic Segmentation
- Authors: Ping Li and Junjie Chen and Binbin Lin and Xianghua Xu
- Abstract summary: Traditional methods mostly use RGB images, which are heavily affected by lighting conditions, e.g., darkness.
Recent studies show that thermal images are robust to night scenes and can serve as a compensating modality for segmentation.
This work proposes a Residual Spatial Fusion Network (RSFNet) for RGB-T semantic segmentation.
- Score: 19.41334573257174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation plays an important role in widespread applications such
as autonomous driving and robotic sensing. Traditional methods mostly use RGB
images, which are heavily affected by lighting conditions, e.g., darkness. Recent
studies show thermal images are robust to the night scenario as a compensating
modality for segmentation. However, existing works either simply fuse
RGB-Thermal (RGB-T) images or adopt encoders with the same structure for
both the RGB stream and the thermal stream, which neglects the modality
difference in segmentation under varying lighting conditions. Therefore, this
work proposes a Residual Spatial Fusion Network (RSFNet) for RGB-T semantic
segmentation. Specifically, we employ an asymmetric encoder to learn the
compensating features of the RGB and the thermal images. To effectively fuse
the dual-modality features, we generate the pseudo-labels by saliency detection
to supervise the feature learning, and develop the Residual Spatial Fusion
(RSF) module with structural re-parameterization to learn more promising
features by spatially fusing the cross-modality features. RSF employs
hierarchical feature fusion to aggregate multi-level features, and applies
spatial weights with a residual connection to adaptively control the
multi-spectral feature fusion via a confidence gate. Extensive experiments
were carried out on two benchmarks, i.e., the MFNet and PST900 datasets. The
results show that our method achieves state-of-the-art segmentation
performance while striking a good balance between accuracy and speed.
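For readers who want the mechanism at a glance, below is a minimal PyTorch sketch of the fusion idea the abstract describes: per-pixel spatial weights combined with a residual connection, where a confidence gate controls how much of the fused multi-spectral signal is added back. All module names, the exact form of the gate, and the tensor shapes are illustrative assumptions; this is not the authors' released RSFNet implementation (which additionally includes the asymmetric encoder, saliency-based pseudo-label supervision, and structural re-parameterization).

```python
# Minimal sketch of spatially weighted, confidence-gated residual fusion
# of RGB and thermal feature maps, as described in the abstract. Module
# names, shapes, and the gating form are assumptions for illustration.
import torch
import torch.nn as nn


class ResidualSpatialFusionSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel spatial weight map from the concatenated modalities.
        self.spatial_weight = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # A scalar "confidence gate" deciding how much fused signal to add
        # back onto the base stream (assumed form; the paper's gate may differ).
        self.confidence_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, thermal_feat: torch.Tensor) -> torch.Tensor:
        paired = torch.cat([rgb_feat, thermal_feat], dim=1)
        w = self.spatial_weight(paired)       # per-pixel mixing weights in [0, 1]
        fused = w * rgb_feat + (1.0 - w) * thermal_feat
        g = self.confidence_gate(paired)      # scalar gate in [0, 1]
        # Residual connection: keep the base feature when confidence is low.
        return rgb_feat + g * fused


if __name__ == "__main__":
    rsf = ResidualSpatialFusionSketch(channels=64)
    rgb = torch.randn(1, 64, 60, 80)
    thermal = torch.randn(1, 64, 60, 80)
    print(rsf(rgb, thermal).shape)  # torch.Size([1, 64, 60, 80])
```

The gate is the key design choice in this reading: when the fused signal is unreliable, a low confidence score lets the module pass the base stream through largely unchanged rather than committing to an uncertain fusion.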
Related papers
- Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281]
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
arXiv Detail & Related papers (2024-09-23T15:23:01Z)
- Multi-Modal Hybrid Learning and Sequential Training for RGB-T Saliency Detection [10.589062261564631]
RGB-T saliency detection has emerged as an important computer vision task, identifying conspicuous objects in challenging scenes such as dark environments.
Existing methods neglect the characteristics of cross-modal features and rely solely on network structures to fuse RGB and thermal features.
We first propose a Multi-Modal Hybrid loss (MMHL) that comprises supervised and self-supervised loss functions.
arXiv Detail & Related papers (2023-09-13T20:47:29Z)
- Channel and Spatial Relation-Propagation Network for RGB-Thermal Semantic Segmentation [10.344060599932185]
RGB-Thermal (RGB-T) semantic segmentation has shown great potential in handling low-light conditions.
The key to RGB-T semantic segmentation is to effectively leverage the complementary nature of RGB and thermal images.
arXiv Detail & Related papers (2023-08-24T03:43:47Z)
- Hyperspectral Image Super Resolution with Real Unaligned RGB Guidance [11.711656319221072]
We propose an HSI fusion network with heterogeneous feature extractions, multi-stage feature alignments, and attentive feature fusion.
Our method obtains a clear improvement over existing single-image and fusion-based super-resolution methods on quantitative assessment as well as visual comparison.
arXiv Detail & Related papers (2023-02-13T11:56:45Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- Does Thermal Really Always Matter for RGB-T Salient Object Detection? [153.17156598262656]
This paper proposes a network named TNet to solve the RGB-T salient object detection (SOD) task.
In this paper, we introduce a global illumination estimation module to predict the global illuminance score of the image.
On the other hand, we introduce a two-stage localization and complementation module in the decoding phase to transfer the object localization cue and internal integrity cue from thermal features to the RGB modality.
arXiv Detail & Related papers (2022-10-09T13:50:12Z)
- Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects of an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark datasets and the VT723 dataset show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
The key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Siamese Network for RGB-D Salient Object Detection and Beyond [113.30063105890041]
A novel framework is proposed to learn from both RGB and depth inputs through a shared network backbone.
Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector.
We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models.
arXiv Detail & Related papers (2020-08-26T06:01:05Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
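As a companion to the last entry above, here is a tiny sketch of one plausible reading of bi-directional cross-modality recalibration followed by a gated aggregation. The module structure, names, and gate form are assumptions for illustration only; they are not the paper's Separation-and-Aggregation Gate implementation.

```python
# Minimal sketch of bi-directional cross-modality recalibration with a
# gated aggregation, one plausible reading of the last entry above.
# Structure and names are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class GatedCrossModalFusionSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Each modality predicts a channel-wise recalibration for the other.
        self.rgb_to_depth = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )
        self.depth_to_rgb = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )
        # Per-pixel gate deciding how to mix the recalibrated streams.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1), nn.Sigmoid()
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        rgb_rec = rgb * self.depth_to_rgb(depth)    # depth recalibrates RGB
        depth_rec = depth * self.rgb_to_depth(rgb)  # RGB recalibrates depth
        g = self.gate(torch.cat([rgb_rec, depth_rec], dim=1))
        return g * rgb_rec + (1.0 - g) * depth_rec  # gated aggregation
```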
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.