ICAFusion: Iterative Cross-Attention Guided Feature Fusion for
Multispectral Object Detection
- URL: http://arxiv.org/abs/2308.07504v1
- Date: Tue, 15 Aug 2023 00:02:10 GMT
- Title: ICAFusion: Iterative Cross-Attention Guided Feature Fusion for
Multispectral Object Detection
- Authors: Jifeng Shen, Yifei Chen, Yue Liu, Xin Zuo, Heng Fan, Wankou Yang
- Abstract summary: A novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction.
This framework enhances the discriminability of object features through the query-guided cross-attention mechanism.
The proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios.
- Score: 25.66305300362193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective feature fusion of multispectral images plays a crucial role in
multi-spectral object detection. Previous studies have demonstrated the
effectiveness of feature fusion using convolutional neural networks, but these
methods are sensitive to image misalignment due to their inherent deficiency in
local-range feature interaction, resulting in performance degradation. To
address this issue, a novel feature fusion framework of dual cross-attention
transformers is proposed to model global feature interaction and capture
complementary information across modalities simultaneously. This framework
enhances the discriminability of object features through the query-guided
cross-attention mechanism, leading to improved performance. However, stacking
multiple transformer blocks for feature enhancement incurs a large number of
parameters and high spatial complexity. To handle this, inspired by the human
process of reviewing knowledge, an iterative interaction mechanism is proposed
to share parameters among block-wise multimodal transformers, reducing model
complexity and computation cost. The proposed method is general and can be
effectively integrated into different detection frameworks and used with different
backbones. Experimental results on KAIST, FLIR, and VEDAI datasets show that
the proposed method achieves superior performance and faster inference, making
it suitable for various practical scenarios. Code will be available at
https://github.com/chanchanchan97/ICAFusion.
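To make the two mechanisms in the abstract concrete, the following PyTorch sketch shows one plausible form of (i) the dual, query-guided cross-attention between RGB and thermal feature maps and (ii) the iterative application of a single parameter-shared fusion block. The class names, token layout, the use of nn.MultiheadAttention, and the choice of three iterations are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
# Minimal sketch, assuming flattened (B, N, C) token layouts and a single
# shared fusion block applied iteratively; not the official ICAFusion code.
import torch
import torch.nn as nn


class DualCrossAttentionFusion(nn.Module):
    """One fusion step: each modality queries the other modality's features."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # RGB features act as queries over thermal keys/values, and vice versa.
        self.rgb_from_ir = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ir_from_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_ir = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, ir_tokens):
        # rgb_tokens, ir_tokens: (B, N, C) flattened spatial features.
        rgb_ctx, _ = self.rgb_from_ir(rgb_tokens, ir_tokens, ir_tokens)
        ir_ctx, _ = self.ir_from_rgb(ir_tokens, rgb_tokens, rgb_tokens)
        # Residual update keeps each stream's own information.
        rgb_tokens = self.norm_rgb(rgb_tokens + rgb_ctx)
        ir_tokens = self.norm_ir(ir_tokens + ir_ctx)
        return rgb_tokens, ir_tokens


class IterativeFusion(nn.Module):
    """Applies the SAME block several times, mimicking parameter sharing
    across what would otherwise be a stack of distinct transformer blocks."""

    def __init__(self, dim: int, num_iters: int = 3):
        super().__init__()
        self.block = DualCrossAttentionFusion(dim)  # single shared set of weights
        self.num_iters = num_iters

    def forward(self, rgb_feat, ir_feat):
        # rgb_feat, ir_feat: (B, C, H, W) feature maps from the two backbones.
        b, c, h, w = rgb_feat.shape
        rgb = rgb_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        ir = ir_feat.flatten(2).transpose(1, 2)
        for _ in range(self.num_iters):  # iterative "review" with shared weights
            rgb, ir = self.block(rgb, ir)
        fused = (rgb + ir).transpose(1, 2).reshape(b, c, h, w)
        return fused  # fused map handed to the detection neck/head


if __name__ == "__main__":
    fusion = IterativeFusion(dim=256, num_iters=3)
    rgb = torch.randn(2, 256, 20, 20)
    ir = torch.randn(2, 256, 20, 20)
    print(fusion(rgb, ir).shape)  # torch.Size([2, 256, 20, 20])
```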
Related papers
- SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection [18.090706979440334]
Multimodal object detection leverages diverse modal information to enhance the accuracy and robustness of detectors.
Current methods merely stack Transformer-guided fusion techniques without exploring their capability to extract features at various depth layers of the network.
In this paper, we introduce an accurate and efficient object detection method named SeaDATE.
arXiv Detail & Related papers (2024-10-15T07:26:39Z)
- Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Cross-modality fusion of information from different modalities effectively improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on the M3FD dataset and 4.9% on the FLIR-Aligned dataset.
arXiv Detail & Related papers (2024-04-14T05:28:46Z)
- From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z)
- Multimodal Transformer Using Cross-Channel Attention for Object Detection in Remote Sensing Images [1.662438436885552]
Multi-modal fusion has been shown to enhance accuracy by combining data from multiple modalities.
We propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage.
By addressing fusion in the early stage, as opposed to mid or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-10-21T00:56:11Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality (a generic sketch of this kind of query-based aggregation appears after this list).
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- Cross-Modality Fusion Transformer for Multispectral Object Detection [0.0]
Multispectral image pairs can provide the combined information, making object detection applications more reliable and robust.
We present a simple yet effective cross-modality feature fusion approach, named Cross-Modality Fusion Transformer (CFT) in this paper.
arXiv Detail & Related papers (2021-10-30T15:34:12Z)
- MPI: Multi-receptive and Parallel Integration for Salient Object Detection [17.32228882721628]
The semantic representation of deep features is essential for image context understanding.
In this paper, a novel method called MPI is proposed for salient object detection.
The proposed method outperforms state-of-the-art methods under different evaluation metrics.
arXiv Detail & Related papers (2021-08-08T12:01:44Z)
- Centralized Information Interaction for Salient Object Detection [68.8587064889475]
The U-shape structure has shown its advantage in salient object detection for efficiently combining multi-scale features.
This paper shows that by centralizing these connections, we can achieve the cross-scale information interaction among them.
Our approach can cooperate with various existing U-shape-based salient object detection methods by substituting the connections between the bottom-up and top-down pathways.
arXiv Detail & Related papers (2020-12-21T12:42:06Z)
- Multispectral Fusion for Object Detection with Cyclic Fuse-and-Refine Blocks [3.6488662460683794]
We propose a new halfway feature fusion method for neural networks that leverages the complementary/consistency balance existing in multispectral features.
We evaluate the effectiveness of our fusion method on two challenging multispectral datasets for object detection.
arXiv Detail & Related papers (2020-09-26T18:39:05Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
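As referenced in the entry on implicit manipulation queries above, the sketch below illustrates the generic idea of a small set of learnable queries cross-attending to one modality's tokens to pool global contextual cues. It is a minimal, self-contained illustration of query-based aggregation in general; the module name, query count, and layer choices are assumptions and do not reproduce that paper's actual IMQ design.

```python
# Minimal sketch, assuming (B, N, C) token inputs; learnable queries attend
# globally over a single modality's features. Names are hypothetical.
import torch
import torch.nn as nn


class LearnableQueryAggregator(nn.Module):
    def __init__(self, dim: int, num_queries: int = 8, num_heads: int = 8):
        super().__init__()
        # Queries are learned parameters, not derived from the input.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):
        # tokens: (B, N, C) features of a single modality.
        b = tokens.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)  # (B, Q, C)
        ctx, _ = self.attn(q, tokens, tokens)             # queries attend globally
        return self.norm(ctx)                             # (B, Q, C) pooled cues


if __name__ == "__main__":
    agg = LearnableQueryAggregator(dim=256)
    feats = torch.randn(2, 400, 256)   # e.g. a flattened 20x20 feature map
    print(agg(feats).shape)            # torch.Size([2, 8, 256])
```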
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.