Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks
- URL: http://arxiv.org/abs/2303.15710v1
- Date: Tue, 28 Mar 2023 03:37:27 GMT
- Title: Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks
- Authors: Mingjian Liang, Junjie Hu, Chenyu Bao, Hua Feng, Fuqin Deng and Tin Lun Lam
- Abstract summary: We propose a novel fusion method named Explicit Attention-Enhanced Fusion (EAEF) that fully takes advantage of each type of data.
The proposed fusion method outperforms state-of-the-art by 1.6% in mIoU on semantic segmentation, 3.1% in MAE on salient object detection, 2.3% in mAP on object detection, and 8.1% in MAE on crowd counting.
- Score: 13.742299383836256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, RGB-Thermal based perception has shown significant advances.
Thermal information provides useful clues when visual cameras suffer from poor
lighting conditions, such as low light and fog. However, how to effectively
fuse RGB images and thermal data remains an open challenge. Previous works
involve naive fusion strategies such as merging them at the input,
concatenating multi-modality features inside models, or applying attention to
each data modality. These fusion strategies are straightforward yet
insufficient. In this paper, we propose a novel fusion method named Explicit
Attention-Enhanced Fusion (EAEF) that fully takes advantage of each type of
data. Specifically, we consider the following cases: i) both RGB data and
thermal data, ii) only one of the types of data, and iii) none of them generate
discriminative features. EAEF uses one branch to enhance feature extraction for
i) and iii) and the other branch to remedy insufficient representations for
ii). The outputs of two branches are fused to form complementary features. As a
result, the proposed fusion method outperforms state-of-the-art by 1.6% in
mIoU on semantic segmentation, 3.1% in MAE on salient object detection, 2.3%
in mAP on object detection, and 8.1% in MAE on crowd counting. The code is
available at https://github.com/FreeformRobotics/EAEFNet.
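As a rough illustration of the two-branch design described above, here is a minimal PyTorch-style sketch; the attention operators, channel reduction, and branch arithmetic are assumptions made for illustration only, and the official implementation in the linked repository should be treated as authoritative.

```python
import torch
import torch.nn as nn

class TwoBranchFusionSketch(nn.Module):
    """Illustrative two-branch RGB-thermal fusion (not the official EAEF module).

    One branch enhances features when the modalities agree (both or neither
    discriminative); the other compensates when only one modality carries
    useful information. Both outputs are summed into complementary features.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Channel attention for each modality (squeeze-and-excitation style).
        self.att_rgb = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.att_t = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, f_rgb: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        a_rgb = self.att_rgb(self.pool(f_rgb))  # (B, C, 1, 1), values in [0, 1]
        a_t = self.att_t(self.pool(f_t))
        # Enhancement-style branch: boost channels where the modalities agree.
        shared = a_rgb * a_t
        enhanced = f_rgb * shared + f_t * shared
        # Compensation-style branch: borrow from the other modality where only
        # one of the two responds strongly.
        comp = 1.0 - shared
        compensated = f_rgb * (comp * a_t) + f_t * (comp * a_rgb)
        return enhanced + compensated  # complementary fused features

# Usage: fuse two feature maps of matching shape.
fusion = TwoBranchFusionSketch(channels=64)
out = fusion(torch.randn(2, 64, 30, 40), torch.randn(2, 64, 30, 40))
```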
Related papers
- Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Fusing information from different modalities effectively improves cross-modality object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on the M3FD dataset and 4.9% on the FLIR-Aligned dataset.
arXiv Detail & Related papers (2024-04-14T05:28:46Z)
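To make the "hidden state space for interaction" idea above concrete, here is a toy sketch. The real Fusion-Mamba block relies on a Mamba selective state-space model; that component is replaced here by an ordinary GRU purely as a stand-in, so this only illustrates shared-hidden-state interaction between modalities, not the actual FMB.

```python
import torch
import torch.nn as nn

class SharedStateFusionSketch(nn.Module):
    """Toy cross-modal interaction in one shared hidden state (GRU stand-in)."""

    def __init__(self, channels: int, hidden: int = 256):
        super().__init__()
        self.proj_rgb = nn.Linear(channels, hidden)
        self.proj_ir = nn.Linear(channels, hidden)
        self.state_model = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, channels)

    def forward(self, rgb_feat: torch.Tensor, ir_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, ir_feat: (B, C, H, W) -> flatten spatial dims into sequences.
        b, c, h, w = rgb_feat.shape
        rgb_seq = self.proj_rgb(rgb_feat.flatten(2).transpose(1, 2))  # (B, HW, hidden)
        ir_seq = self.proj_ir(ir_feat.flatten(2).transpose(1, 2))
        # Interleave the modalities so they interact through one hidden state.
        seq = torch.stack([rgb_seq, ir_seq], dim=2).flatten(1, 2)     # (B, 2*HW, hidden)
        mixed, _ = self.state_model(seq)
        # Read back the RGB positions and reshape to a feature map.
        fused = self.out(mixed[:, 0::2, :]).transpose(1, 2).reshape(b, c, h, w)
        return fused
```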
- HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion [15.538174593176166]
In this study, we explore a feasible strategy to fully exploit vision foundation model (VFM) features for RGB-thermal scene parsing.
Specifically, we design a hybrid, asymmetric encoder that incorporates both a VFM and a convolutional neural network.
This design allows for more effective extraction of complementary heterogeneous features, which are subsequently fused in a dual-path, progressive manner.
arXiv Detail & Related papers (2024-04-04T15:31:11Z)
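A minimal sketch of a hybrid, asymmetric two-branch encoder with stage-wise (progressive) fusion, in the spirit of the description above; the "VFM" branch is only a convolutional placeholder, and all layer choices are assumptions rather than the HAPNet architecture.

```python
import torch
import torch.nn as nn

class HybridAsymmetricEncoderSketch(nn.Module):
    """Illustrative hybrid, asymmetric RGB-thermal encoder (not the HAPNet code).

    One branch stands in for a vision foundation model applied to RGB, the
    other is a lightweight CNN for thermal; their heterogeneous features are
    fused stage by stage, a crude form of progressive fusion.
    """

    def __init__(self, dim: int = 64, stages: int = 3):
        super().__init__()
        # Placeholder "VFM" branch: a pretrained transformer in practice,
        # plain conv blocks here for brevity.
        self.vfm_stages = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(3 if i == 0 else dim, dim, 3, stride=2, padding=1),
                           nn.BatchNorm2d(dim), nn.GELU()) for i in range(stages)])
        # Asymmetric CNN branch for the single-channel thermal input.
        self.cnn_stages = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(1 if i == 0 else dim, dim, 3, stride=2, padding=1),
                           nn.BatchNorm2d(dim), nn.ReLU(inplace=True)) for i in range(stages)])
        self.fuse = nn.ModuleList([nn.Conv2d(2 * dim, dim, 1) for _ in range(stages)])

    def forward(self, rgb, thermal):
        feats = []
        x, y = rgb, thermal
        for vfm, cnn, fuse in zip(self.vfm_stages, self.cnn_stages, self.fuse):
            x, y = vfm(x), cnn(y)
            fused = fuse(torch.cat([x, y], dim=1))
            x = x + fused            # progressive: fused map feeds the RGB path
            feats.append(fused)
        return feats                 # multi-scale fused features for a decoder
```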
- Complementary Random Masking for RGB-Thermal Semantic Segmentation [63.93784265195356]
RGB-thermal semantic segmentation is a potential solution to achieve reliable semantic scene understanding in adverse weather and lighting conditions.
This paper proposes 1) a complementary random masking strategy for RGB-T images and 2) a self-distillation loss between clean and masked input modalities.
We achieve state-of-the-art performance over three RGB-T semantic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-30T13:57:21Z)
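The complementary masking idea above can be sketched in a few lines; the patch size, mask ratio, and the squared-error self-distillation term are assumptions for illustration, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def complementary_random_masking(rgb, thermal, patch=16, mask_ratio=0.5):
    """Illustrative complementary patch masking for an RGB-T pair (sketch only).

    A random binary patch mask hides part of the RGB image and the
    complementary mask hides the rest of the thermal image, so every location
    stays visible in exactly one modality. H and W must be multiples of patch.
    """
    b, _, h, w = rgb.shape
    gh, gw = h // patch, w // patch
    # 1 = keep for RGB, 0 = keep for thermal, drawn per patch.
    keep_rgb = (torch.rand(b, 1, gh, gw, device=rgb.device) > mask_ratio).float()
    keep_rgb = F.interpolate(keep_rgb, scale_factor=patch, mode="nearest")
    masked_rgb = rgb * keep_rgb
    masked_thermal = thermal * (1.0 - keep_rgb)
    return masked_rgb, masked_thermal

# The self-distillation loss would then penalise the gap between predictions
# from the clean pair and from the masked pair, e.g. (hypothetical model):
# loss = ((model(rgb, thermal) - model(masked_rgb, masked_thermal)) ** 2).mean()
```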
- Interactive Context-Aware Network for RGB-T Salient Object Detection [7.544240329265388]
We propose a novel network called Interactive Context-Aware Network (ICANet).
ICANet contains three modules that can effectively perform the cross-modal and cross-scale fusions.
Experiments prove that our network performs favorably against the state-of-the-art RGB-T SOD methods.
arXiv Detail & Related papers (2022-11-11T10:04:36Z)
- TAFNet: A Three-Stream Adaptive Fusion Network for RGB-T Crowd Counting [16.336401175470197]
We propose a three-stream adaptive fusion network named TAFNet, which uses paired RGB and thermal images for crowd counting.
Experimental results on the RGBT-CC dataset show that our method achieves more than a 20% improvement in mean absolute error.
arXiv Detail & Related papers (2022-02-17T08:43:10Z)
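A minimal sketch of a three-stream design with adaptive (gated) fusion, following the description above; the stream depths, gate form, and density head are assumptions, not the TAFNet implementation.

```python
import torch
import torch.nn as nn

class ThreeStreamAdaptiveFusionSketch(nn.Module):
    """Illustrative three-stream RGB-T fusion for crowd counting (sketch only).

    A main stream sees the concatenated RGB-T input while two auxiliary
    streams see RGB and thermal alone; learned per-channel gates decide how
    much each auxiliary stream contributes to the main stream.
    """

    def __init__(self, dim: int = 32):
        super().__init__()
        def stem(in_ch):
            return nn.Sequential(nn.Conv2d(in_ch, dim, 3, padding=1),
                                 nn.BatchNorm2d(dim), nn.ReLU(inplace=True))
        self.main_stream = stem(4)   # RGB (3 ch) + thermal (1 ch) concatenated
        self.rgb_stream = stem(3)
        self.t_stream = stem(1)
        self.gate_rgb = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Conv2d(dim, dim, 1), nn.Sigmoid())
        self.gate_t = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                    nn.Conv2d(dim, dim, 1), nn.Sigmoid())
        self.head = nn.Conv2d(dim, 1, 1)   # crowd density map head

    def forward(self, rgb, thermal):
        main = self.main_stream(torch.cat([rgb, thermal], dim=1))
        f_rgb, f_t = self.rgb_stream(rgb), self.t_stream(thermal)
        # Adaptive fusion: per-channel gates weight each auxiliary stream.
        main = main + self.gate_rgb(f_rgb) * f_rgb + self.gate_t(f_t) * f_t
        return self.head(main)
```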
- Edge-aware Guidance Fusion Network for RGB Thermal Scene Parsing [4.913013713982677]
We propose an edge-aware guidance fusion network (EGFNet) for RGB thermal scene parsing.
To effectively fuse the RGB and thermal information, we propose a multimodal fusion module.
Considering the importance of high level semantic information, we propose a global information module and a semantic information module.
arXiv Detail & Related papers (2021-12-09T01:12:47Z)
- Multi-Source Fusion and Automatic Predictor Selection for Zero-Shot Video Object Segmentation [86.94578023985677]
We propose a novel multi-source fusion network for zero-shot video object segmentation.
The proposed model achieves compelling performance against state-of-the-art methods.
arXiv Detail & Related papers (2021-08-11T07:37:44Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
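The "residual-based fusion module" named above can be sketched as a gated residual correction; the concrete layers and gating below are assumptions, not the released PMF/EPMF code.

```python
import torch
import torch.nn as nn

class ResidualFusionSketch(nn.Module):
    """Illustrative residual-based two-stream fusion module (sketch only).

    Features from the second stream are turned into a residual added onto the
    first stream, so fusion can fall back to the original features when the
    other modality is uninformative.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        residual = self.transform(torch.cat([feat_a, feat_b], dim=1))
        # Gated residual: feat_a is preserved and only corrected where useful.
        return feat_a + self.gate(residual) * residual
```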
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
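A minimal sketch of the cross-modal auto-encoder pretext task mentioned above: an encoder-decoder that predicts the paired depth map from the RGB image, trainable without human labels; the architecture and loss are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CrossModalAutoEncoderSketch(nn.Module):
    """Illustrative cross-modal auto-encoder pretext task (sketch only).

    The encoder reads an RGB image and the decoder reconstructs the paired
    depth map, so pre-training needs only RGB-D pairs, no annotations.
    """

    def __init__(self, dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim * 2, dim, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(dim, 1, 4, stride=2, padding=1))

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(rgb))  # predicted depth map

# Self-supervised objective: reconstruct the sensor depth from RGB, e.g.
# loss = nn.functional.l1_loss(model(rgb), depth)
```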
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models use feature fusion strategies but are limited by low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
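A minimal sketch of mutual (bidirectional cross-modal) attention in the spirit of the description above; the use of standard multi-head attention and the final fusion step are assumptions, not the paper's selective mutual attention module.

```python
import torch
import torch.nn as nn

class MutualAttentionSketch(nn.Module):
    """Illustrative mutual attention between RGB and depth features (sketch only).

    Each modality queries the other: RGB features attend over depth features
    and vice versa, so attention and context are exchanged in both directions
    before fusion. `channels` must be divisible by `heads`.
    """

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.rgb_from_depth = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.depth_from_rgb = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = nn.Linear(2 * channels, channels)

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_rgb.shape
        rgb = f_rgb.flatten(2).transpose(1, 2)      # (B, HW, C)
        depth = f_depth.flatten(2).transpose(1, 2)
        rgb_ctx, _ = self.rgb_from_depth(rgb, depth, depth)   # RGB queries depth
        depth_ctx, _ = self.depth_from_rgb(depth, rgb, rgb)   # depth queries RGB
        fused = self.fuse(torch.cat([rgb + rgb_ctx, depth + depth_ctx], dim=-1))
        return fused.transpose(1, 2).reshape(b, c, h, w)
```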
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.