Cross-Modality Attentive Feature Fusion for Object Detection in
Multispectral Remote Sensing Imagery
- URL: http://arxiv.org/abs/2112.02991v1
- Date: Mon, 6 Dec 2021 13:12:36 GMT
- Title: Cross-Modality Attentive Feature Fusion for Object Detection in
Multispectral Remote Sensing Imagery
- Authors: Qingyun Fang, Zhaokui Wang
- Abstract summary: Cross-modality fusing complementary information of multispectral remote sensing image pairs can improve the perception ability of detection algorithms.
We propose a novel and lightweight multispectral feature fusion approach with joint common-modality and differential-modality attentions.
Our proposed approach can achieve the state-of-the-art performance at a low cost.
- Score: 0.6853165736531939
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-modality fusing complementary information of multispectral remote
sensing image pairs can improve the perception ability of detection algorithms,
making them more robust and reliable for a wider range of applications, such as
nighttime detection. Compared with prior methods, we think different features
should be processed specifically, the modality-specific features should be
retained and enhanced, while the modality-shared features should be
cherry-picked from the RGB and thermal IR modalities. Following this idea, a
novel and lightweight multispectral feature fusion approach with joint
common-modality and differential-modality attentions are proposed, named
Cross-Modality Attentive Feature Fusion (CMAFF). Given the intermediate feature
maps of RGB and IR images, our module parallel infers attention maps from two
separate modalities, common- and differential-modality, then the attention maps
are multiplied to the input feature map respectively for adaptive feature
enhancement or selection. Extensive experiments demonstrate that our proposed
approach can achieve the state-of-the-art performance at a low computation
cost.
Related papers
- A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion [41.34335755315773]
Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images.
We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy.
Our method has obtained competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks.
arXiv Detail & Related papers (2024-06-11T09:32:40Z) - From Text to Pixels: A Context-Aware Semantic Synergy Solution for
Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z) - Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images [1.662438436885552]
Multi-modal fusion has been determined to enhance the accuracy by fusing data from multiple modalities.
We propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage.
By addressing fusion in the early stage, as opposed to mid or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-10-21T00:56:11Z) - ICAFusion: Iterative Cross-Attention Guided Feature Fusion for
Multispectral Object Detection [25.66305300362193]
A novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction.
This framework enhances the discriminability of object features through the query-guided cross-attention mechanism.
The proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios.
arXiv Detail & Related papers (2023-08-15T00:02:10Z) - An Interactively Reinforced Paradigm for Joint Infrared-Visible Image
Fusion and Saliency Object Detection [59.02821429555375]
This research focuses on the discovery and localization of hidden objects in the wild and serves unmanned systems.
Through empirical analysis, infrared and visible image fusion (IVIF) enables hard-to-find objects apparent.
multimodal salient object detection (SOD) accurately delineates the precise spatial location of objects within the picture.
arXiv Detail & Related papers (2023-05-17T06:48:35Z) - CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for
Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z) - Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commons underlying the two modalities and fuse upon the common space either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z) - Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that transformer is a uniform operation which presents great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z) - RGB-D Salient Object Detection with Cross-Modality Modulation and
Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD)
The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.