ICAFusion: Iterative Cross-Attention Guided Feature Fusion for
Multispectral Object Detection
- URL: http://arxiv.org/abs/2308.07504v1
- Date: Tue, 15 Aug 2023 00:02:10 GMT
- Title: ICAFusion: Iterative Cross-Attention Guided Feature Fusion for
Multispectral Object Detection
- Authors: Jifeng Shen, Yifei Chen, Yue Liu, Xin Zuo, Heng Fan, Wankou Yang
- Abstract summary: A novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction.
This framework enhances the discriminability of object features through the query-guided cross-attention mechanism.
The proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios.
- Score: 25.66305300362193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective feature fusion of multispectral images plays a crucial role in
multi-spectral object detection. Previous studies have demonstrated the
effectiveness of feature fusion using convolutional neural networks, but these
methods are sensitive to image misalignment due to their inherent deficiency in
local-range feature interaction, resulting in performance degradation. To
address this issue, a novel feature fusion framework of dual cross-attention
transformers is proposed to model global feature interaction and capture
complementary information across modalities simultaneously. This framework
enhances the discriminability of object features through the query-guided
cross-attention mechanism, leading to improved performance. However, stacking
multiple transformer blocks for feature enhancement incurs a large number of
parameters and high spatial complexity. To handle this, inspired by the human
process of reviewing knowledge, an iterative interaction mechanism is proposed
to share parameters among block-wise multimodal transformers, reducing model
complexity and computation cost. The proposed method is general and can be
effectively integrated into different detection frameworks and used with different
backbones. Experimental results on KAIST, FLIR, and VEDAI datasets show that
the proposed method achieves superior performance and faster inference, making
it suitable for various practical scenarios. Code will be available at
https://github.com/chanchanchan97/ICAFusion.
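To make the two mechanisms in the abstract concrete, the following PyTorch sketch shows one plausible form of (i) the dual, query-guided cross-attention between RGB and thermal feature maps and (ii) the iterative application of a single parameter-shared fusion block. The class names, token layout, the use of nn.MultiheadAttention, and the choice of three iterations are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
# Minimal sketch, assuming flattened (B, N, C) token layouts and a single
# shared fusion block applied iteratively; not the official ICAFusion code.
import torch
import torch.nn as nn


class DualCrossAttentionFusion(nn.Module):
    """One fusion step: each modality queries the other modality's features."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # RGB features act as queries over thermal keys/values, and vice versa.
        self.rgb_from_ir = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ir_from_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_ir = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, ir_tokens):
        # rgb_tokens, ir_tokens: (B, N, C) flattened spatial features.
        rgb_ctx, _ = self.rgb_from_ir(rgb_tokens, ir_tokens, ir_tokens)
        ir_ctx, _ = self.ir_from_rgb(ir_tokens, rgb_tokens, rgb_tokens)
        # Residual update keeps each stream's own information.
        rgb_tokens = self.norm_rgb(rgb_tokens + rgb_ctx)
        ir_tokens = self.norm_ir(ir_tokens + ir_ctx)
        return rgb_tokens, ir_tokens


class IterativeFusion(nn.Module):
    """Applies the SAME block several times, mimicking parameter sharing
    across what would otherwise be a stack of distinct transformer blocks."""

    def __init__(self, dim: int, num_iters: int = 3):
        super().__init__()
        self.block = DualCrossAttentionFusion(dim)  # single shared set of weights
        self.num_iters = num_iters

    def forward(self, rgb_feat, ir_feat):
        # rgb_feat, ir_feat: (B, C, H, W) feature maps from the two backbones.
        b, c, h, w = rgb_feat.shape
        rgb = rgb_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        ir = ir_feat.flatten(2).transpose(1, 2)
        for _ in range(self.num_iters):  # iterative "review" with shared weights
            rgb, ir = self.block(rgb, ir)
        fused = (rgb + ir).transpose(1, 2).reshape(b, c, h, w)
        return fused  # fused map handed to the detection neck/head


if __name__ == "__main__":
    fusion = IterativeFusion(dim=256, num_iters=3)
    rgb = torch.randn(2, 256, 20, 20)
    ir = torch.randn(2, 256, 20, 20)
    print(fusion(rgb, ir).shape)  # torch.Size([2, 256, 20, 20])
```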
Related papers
- SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection [18.090706979440334]
Multimodal object detection leverages diverse modal information to enhance the accuracy and robustness of detectors.
Current methods merely stack Transformer-guided fusion techniques without exploring their capability to extract features at various depth layers of the network.
In this paper, we introduce an accurate and efficient object detection method named SeaDATE.
arXiv Detail & Related papers (2024-10-15T07:26:39Z)
- Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Cross-modality fusion of information from different modalities effectively improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on the M3FD dataset and 4.9% on the FLIR-Aligned dataset.
arXiv Detail & Related papers (2024-04-14T05:28:46Z)
- From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z)
- Multimodal Transformer Using Cross-Channel Attention for Object Detection in Remote Sensing Images [1.662438436885552]
Multi-modal fusion has been shown to enhance accuracy by combining data from multiple modalities.
We propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage.
By addressing fusion in the early stage, as opposed to mid or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-10-21T00:56:11Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality (a generic sketch of this kind of query-based aggregation appears after this list).
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- Cross-Modality Fusion Transformer for Multispectral Object Detection [0.0]
Multispectral image pairs can provide the combined information, making object detection applications more reliable and robust.
We present a simple yet effective cross-modality feature fusion approach, named Cross-Modality Fusion Transformer (CFT) in this paper.
arXiv Detail & Related papers (2021-10-30T15:34:12Z)
- MPI: Multi-receptive and Parallel Integration for Salient Object Detection [17.32228882721628]
The semantic representation of deep features is essential for image context understanding.
In this paper, a novel method called MPI is proposed for salient object detection.
The proposed method outperforms state-of-the-art methods under different evaluation metrics.
arXiv Detail & Related papers (2021-08-08T12:01:44Z)
- Centralized Information Interaction for Salient Object Detection [68.8587064889475]
The U-shape structure has shown its advantage in salient object detection for efficiently combining multi-scale features.
This paper shows that by centralizing these connections, we can achieve the cross-scale information interaction among them.
Our approach can cooperate with various existing U-shape-based salient object detection methods by substituting the connections between the bottom-up and top-down pathways.
arXiv Detail & Related papers (2020-12-21T12:42:06Z)
- Multispectral Fusion for Object Detection with Cyclic Fuse-and-Refine Blocks [3.6488662460683794]
We propose a new halfway feature fusion method for neural networks that leverages the complementary/consistency balance existing in multispectral features.
We evaluate the effectiveness of our fusion method on two challenging multispectral datasets for object detection.
arXiv Detail & Related papers (2020-09-26T18:39:05Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
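As referenced in the entry on implicit manipulation queries above, the sketch below illustrates the generic idea of a small set of learnable queries cross-attending to one modality's tokens to pool global contextual cues. It is a minimal, self-contained illustration of query-based aggregation in general; the module name, query count, and layer choices are assumptions and do not reproduce that paper's actual IMQ design.

```python
# Minimal sketch, assuming (B, N, C) token inputs; learnable queries attend
# globally over a single modality's features. Names are hypothetical.
import torch
import torch.nn as nn


class LearnableQueryAggregator(nn.Module):
    def __init__(self, dim: int, num_queries: int = 8, num_heads: int = 8):
        super().__init__()
        # Queries are learned parameters, not derived from the input.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):
        # tokens: (B, N, C) features of a single modality.
        b = tokens.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)  # (B, Q, C)
        ctx, _ = self.attn(q, tokens, tokens)             # queries attend globally
        return self.norm(ctx)                             # (B, Q, C) pooled cues


if __name__ == "__main__":
    agg = LearnableQueryAggregator(dim=256)
    feats = torch.randn(2, 400, 256)   # e.g. a flattened 20x20 feature map
    print(agg(feats).shape)            # torch.Size([2, 8, 256])
```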
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.