Attentional Feature Fusion
- URL: http://arxiv.org/abs/2009.14082v2
- Date: Mon, 9 Nov 2020 17:41:20 GMT
- Title: Attentional Feature Fusion
- Authors: Yimian Dai and Fabian Gieseke and Stefan Oehmcke and Yiquan Wu and
Kobus Barnard
- Abstract summary: We propose a uniform and general scheme, namely attentional feature fusion.
We show that our models outperform state-of-the-art networks on both CIFAR-100 and ImageNet datasets.
- Score: 4.265244011052538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature fusion, the combination of features from different layers or
branches, is an omnipresent part of modern network architectures. It is often
implemented via simple operations, such as summation or concatenation, but this
might not be the best choice. In this work, we propose a uniform and general
scheme, namely attentional feature fusion, which is applicable for most common
scenarios, including feature fusion induced by short and long skip connections
as well as within Inception layers. To better fuse features of inconsistent
semantics and scales, we propose a multi-scale channel attention module, which
addresses issues that arise when fusing features given at different scales. We
also demonstrate that the initial integration of feature maps can become a
bottleneck and that this issue can be alleviated by adding another level of
attention, which we refer to as iterative attentional feature fusion. With
fewer layers or parameters, our models outperform state-of-the-art networks on
both CIFAR-100 and ImageNet datasets, which suggests that more sophisticated
attention mechanisms for feature fusion hold great potential to consistently
yield better results compared to their direct counterparts. Our codes and
trained models are available online.
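The fusion scheme described in the abstract can be sketched numerically. Below is a minimal NumPy sketch, not the authors' implementation: the paper's MS-CAM uses 1x1-convolution bottlenecks in both the global and local branches, which are replaced here by identity stand-ins. It shows the fused output Z = M(X+Y) * X + (1 - M(X+Y)) * Y and the iterative variant in which the initial integration is itself an attentional fusion.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ms_cam(x):
    """Multi-scale channel attention (sketch). x: (C, H, W).
    Global branch: per-channel average pooling; local branch: the
    point-wise features themselves. (The paper uses 1x1-conv
    bottlenecks in both branches; identity is a stand-in here.)"""
    g = x.mean(axis=(1, 2), keepdims=True)  # (C, 1, 1) global context
    l = x                                   # (C, H, W) local context
    return sigmoid(g + l)                   # attention weights in (0, 1)

def aff(x, y):
    """Attentional feature fusion: weights computed from the initial
    integration (x + y) softly select between the two inputs."""
    m = ms_cam(x + y)
    return m * x + (1.0 - m) * y

def iaff(x, y):
    """Iterative AFF: the plain sum used as the initial integration
    is replaced by another attentional fusion step."""
    m = ms_cam(aff(x, y))
    return m * x + (1.0 - m) * y
```

Because the output is an element-wise convex combination of the inputs, every fused value lies between the corresponding values of X and Y, so the attention decides, per channel and position, how much each branch contributes.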
Related papers
- Attention-Guided Multi-scale Interaction Network for Face Super-Resolution [46.42710591689621]
CNN and Transformer hybrid networks have demonstrated excellent performance in face super-resolution (FSR) tasks.
How to fuse multi-scale features and promote their complementarity is crucial for enhancing FSR.
Our design allows the free flow of multi-scale features from within modules and between encoder and decoder.
arXiv Detail & Related papers (2024-09-01T02:53:24Z)
- SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion [59.96233305733875]
Time series forecasting plays a crucial role in various fields such as finance, traffic management, energy, and healthcare.
Several methods utilize mechanisms like attention or mixer to address this by capturing channel correlations.
This paper presents an efficient model, the Series-cOre Fused Time Series forecaster (SOFTS).
arXiv Detail & Related papers (2024-04-22T14:06:35Z)
- ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection [25.66305300362193]
A novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction.
This framework enhances the discriminability of object features through the query-guided cross-attention mechanism.
The proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios.
arXiv Detail & Related papers (2023-08-15T00:02:10Z)
- A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion [69.10255211811007]
We present a Task-guided, Implicit-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z)
- CAT: Learning to Collaborate Channel and Spatial Attention from Multi-Information Fusion [23.72040577828098]
We propose a plug-and-play attention module, which we term "CAT", activating the Collaboration between spatial and channel Attentions.
Specifically, we represent traits as trainable coefficients (i.e., colla-factors) to adaptively combine contributions of different attention modules.
Our CAT outperforms existing state-of-the-art attention mechanisms in object detection, instance segmentation, and image classification.
arXiv Detail & Related papers (2022-12-13T02:34:10Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
- Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z)
- Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion [63.72912507445662]
We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network.
We verify that multimodal features can be learnt within a shared single network by merely maintaining modality-specific batch normalization layers in the encoder.
Secondly, we propose a bidirectional multi-layer fusion scheme, where multimodal features can be exploited progressively.
arXiv Detail & Related papers (2021-08-11T03:42:13Z)
- MA-Unet: An improved version of Unet based on multi-scale and attention mechanism for medical image segmentation [4.082245106486775]
Convolutional neural networks (CNNs) are promoting the development of medical image semantic segmentation.
In this paper, we try to eliminate semantic ambiguity in skip connection operations by adding attention gates (AGs).
Our model obtains better segmentation performance while introducing fewer parameters.
arXiv Detail & Related papers (2020-12-20T15:29:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.