Does Thermal Really Always Matter for RGB-T Salient Object Detection?
- URL: http://arxiv.org/abs/2210.04266v1
- Date: Sun, 9 Oct 2022 13:50:12 GMT
- Title: Does Thermal Really Always Matter for RGB-T Salient Object Detection?
- Authors: Runmin Cong, Kepu Zhang, Chen Zhang, Feng Zheng, Yao Zhao, Qingming
Huang, and Sam Kwong
- Abstract summary: This paper proposes a network named TNet to solve the RGB-T salient object detection (SOD) task.
In this paper, we introduce a global illumination estimation module to predict the global illuminance score of the image.
On the other hand, we introduce a two-stage localization and complementation module in the decoding phase to transfer object localization cues and internal integrity cues from thermal features to the RGB modality.
- Score: 153.17156598262656
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In recent years, RGB-T salient object detection (SOD) has attracted
continuous attention, as introducing thermal images makes it possible to
identify salient objects in environments such as low light. However, most
existing RGB-T SOD models focus on how to perform cross-modality feature
fusion, ignoring whether the thermal image really always matters in the SOD task.
Starting from the definition and nature of this task, this paper rethinks the
connotation of thermal modality, and proposes a network named TNet to solve the
RGB-T SOD task. In this paper, we introduce a global illumination estimation
module to predict the global illuminance score of the image, so as to regulate
the role played by the two modalities. In addition, considering the role of
thermal modality, we set up different cross-modality interaction mechanisms in
the encoding phase and the decoding phase. On the one hand, we introduce a
semantic constraint provider to enrich the semantics of thermal images in the
encoding phase, which makes thermal modality more suitable for the SOD task. On
the other hand, we introduce a two-stage localization and complementation
module in the decoding phase to transfer object localization cues and internal
integrity cues from thermal features to the RGB modality. Extensive experiments on
three datasets show that the proposed TNet achieves competitive performance
compared with 20 state-of-the-art methods.
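The abstract says a global illuminance score regulates the role played by the two modalities, but it does not give the formula. The sketch below shows one plausible reading only and is not the authors' TNet implementation. It makes two assumptions: the score is approximated by the mean Rec. 601 luma of the RGB image (the paper uses a learned module), and the score blends RGB and thermal features as a convex combination. The names `global_illuminance_score` and `gate_modalities` are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), each channel in [0, 1]


def global_illuminance_score(image: Sequence[Sequence[Pixel]]) -> float:
    """Hypothetical stand-in for a learned illumination estimator.

    Returns the mean Rec. 601 luma of the image, clamped to [0, 1].
    """
    total = 0.0
    count = 0
    for row in image:
        for r, g, b in row:
            total += 0.299 * r + 0.587 * g + 0.114 * b
            count += 1
    if count == 0:
        raise ValueError("image must contain at least one pixel")
    return min(1.0, max(0.0, total / count))


def gate_modalities(
    rgb_feat: Sequence[float],
    thermal_feat: Sequence[float],
    score: float,
) -> List[float]:
    """Assumed gating rule: a convex blend weighted by the illuminance score.

    A score near 1 (bright scene) favors RGB features; a score near 0
    (dark scene) favors thermal features.
    """
    if len(rgb_feat) != len(thermal_feat):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return [score * r + (1.0 - score) * t for r, t in zip(rgb_feat, thermal_feat)]
```

Under these assumptions, a fully dark image yields a score of 0, so the blended features reduce to the thermal features. A well-lit image pushes the score toward 1 and lets the RGB features dominate.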
Related papers
- TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small
Target Detection [58.00308680221481]
Infrared small target detection (ISTD) is critical to national security and has been extensively applied in military areas.
Most ISTD networks focus on designing feature extraction blocks or feature fusion modules, but rarely describe the ISTD process from the feature map evolution perspective.
We propose Thermal Conduction-Inspired Transformer (TCI-Former) based on the theoretical principles of thermal conduction.
arXiv Detail & Related papers (2024-02-03T05:51:22Z)
- Channel and Spatial Relation-Propagation Network for RGB-Thermal
Semantic Segmentation [10.344060599932185]
RGB-Thermal (RGB-T) semantic segmentation has shown great potential in handling low-light conditions.
The key to RGB-T semantic segmentation is to effectively leverage the complementary nature of RGB and thermal images.
arXiv Detail & Related papers (2023-08-24T03:43:47Z)
- Attentive Multimodal Fusion for Optical and Scene Flow [24.08052492109655]
Existing methods typically rely solely on RGB images or fuse the modalities at later stages.
We propose a novel deep neural network approach named FusionRAFT, which enables early-stage information fusion between sensor modalities.
Our approach exhibits improved robustness in the presence of noise and low-lighting conditions that affect the RGB images.
arXiv Detail & Related papers (2023-07-28T04:36:07Z)
- Residual Spatial Fusion Network for RGB-Thermal Semantic Segmentation [19.41334573257174]
Traditional methods mostly use RGB images, which are heavily affected by lighting conditions, e.g., darkness.
Recent studies show thermal images are robust to the night scenario as a compensating modality for segmentation.
This work proposes a Residual Spatial Fusion Network (RSFNet) for RGB-T semantic segmentation.
arXiv Detail & Related papers (2023-06-17T14:28:08Z)
- Interactive Context-Aware Network for RGB-T Salient Object Detection [7.544240329265388]
We propose a novel network called Interactive Context-Aware Network (ICANet).
ICANet contains three modules that can effectively perform the cross-modal and cross-scale fusions.
Experiments prove that our network performs favorably against the state-of-the-art RGB-T SOD methods.
arXiv Detail & Related papers (2022-11-11T10:04:36Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient
Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Mirror Complementary Transformer Network for RGB-thermal Salient Object
Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects of an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark and VT723 datasets show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z)
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency
Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models use the feature fusion strategy but are limited by the low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
- Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Exploring Thermal Images for Object Detection in Underexposure Regions
for Autonomous Driving [67.69430435482127]
Underexposure regions are vital to construct a complete perception of the surroundings for safe autonomous driving.
The availability of thermal cameras has provided an essential alternative for exploring regions where other optical sensors fail to capture interpretable signals.
This work proposes a domain adaptation framework which employs a style transfer technique for transfer learning from visible spectrum images to thermal images.
arXiv Detail & Related papers (2020-06-01T09:59:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.