EMBANet: A Flexible Efficient Multi-branch Attention Network
- URL: http://arxiv.org/abs/2407.05418v1
- Date: Sun, 7 Jul 2024 15:50:01 GMT
- Title: EMBANet: A Flexible Efficient Multi-branch Attention Network
- Authors: Keke Zu, Hu Zhang, Jian Lu, Lei Zhang, Chen Xu
- Abstract summary: This work presents a novel module, namely multi-branch concat (MBC), to process the input tensor and obtain the multi-scale feature map.
Two important transformation operators, multiplex and split, are considered in this work.
A new backbone network called EMBANet is established by stacking the EMBA blocks.
- Score: 12.372988694006262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents a novel module, namely multi-branch concat (MBC), to process the input tensor and obtain the multi-scale feature map. The proposed MBC module brings new degrees of freedom (DoF) to the design of attention networks by allowing the type of transformation operators and the number of branches to be flexibly adjusted. Two important transformation operators, multiplex and split, are considered in this work, both of which can represent multi-scale features at a more granular level and increase the range of receptive fields. By integrating the MBC and an attention module, a multi-branch attention (MBA) module is developed to capture the channel-wise interaction of feature maps and establish long-range channel dependency. By substituting the 3x3 convolutions in the bottleneck blocks of ResNet with the proposed MBA, a novel block, namely efficient multi-branch attention (EMBA), is obtained, which can be easily plugged into state-of-the-art backbone CNN models. Furthermore, a new backbone network called EMBANet is established by stacking the EMBA blocks. The proposed EMBANet is extensively evaluated on representative computer vision tasks, including classification, detection, and segmentation, and it demonstrates consistently superior performance over popular backbones.
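The data flow described in the abstract (split the channels into branches, process each branch at its own scale, then reweight the branches by channel attention and concatenate) can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function names `box_filter`, `mbc_split`, and `mba` are hypothetical, the kernel sizes (1, 3, 5, 7) are an assumption, a fixed box filter stands in for the learned convolutions, and a plain softmax over pooled branch descriptors stands in for the learned attention module. Only the split operator is shown; the multiplex operator is not sketched.

```python
import numpy as np


def box_filter(x, k):
    """Stand-in for a learned k x k convolution (assumption, not the paper's op).

    Averages each channel of x (shape C x H x W) over a k x k window with
    zero padding so the spatial size is preserved. k must be odd.
    """
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="constant")
    h, w = x.shape[1:]
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += xp[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def mbc_split(x, kernel_sizes):
    """Split operator sketch: one channel group per branch, one scale per branch."""
    groups = np.split(x, len(kernel_sizes), axis=0)  # C must divide evenly
    return [box_filter(g, k) for g, k in zip(groups, kernel_sizes)]


def mba(x, kernel_sizes=(1, 3, 5, 7)):
    """Multi-branch attention sketch: reweight branches, then concatenate."""
    branches = mbc_split(x, kernel_sizes)  # S arrays of shape (C/S, H, W)
    # Global average pooling gives one descriptor per branch, shape (S, C/S).
    desc = np.stack([b.mean(axis=(1, 2)) for b in branches])
    # Softmax across the branch axis (numerically stabilised).
    e = np.exp(desc - desc.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)
    scaled = [b * w[:, None, None] for b, w in zip(branches, weights)]
    return np.concatenate(scaled, axis=0), weights


if __name__ == "__main__":
    x = np.random.default_rng(0).normal(size=(8, 6, 6))
    y, w = mba(x)
    print(y.shape, w.shape)
```

In this sketch the output keeps the input shape, which is what would let such a module replace a 3x3 convolution inside a bottleneck block; changing `kernel_sizes` changes both the number of branches and the scales they cover.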
Related papers
- Branches, Assemble! Multi-Branch Cooperation Network for Large-Scale Click-Through Rate Prediction at Taobao [49.11242099449315]
We introduce a novel Multi-Branch Cooperation Network (MBCnet).
MBCnet consists of three branches: the Expert-based Feature Grouping and Crossing (EFGC) branch, the low-rank Cross Net branch, and the Deep branch.
Experiments on large-scale industrial datasets and online A/B test demonstrate MBCnet's superior performance, delivering a 0.09 point increase in CTR, 1.49% growth in deals, and 1.62% rise in GMV.
arXiv Detail & Related papers (2024-11-20T06:10:06Z)
- CAMS: Convolution and Attention-Free Mamba-based Cardiac Image Segmentation [0.508267104652645]
Convolutional Neural Networks (CNNs) and Transformer-based self-attention models have become the standard for medical image segmentation.
We present a Convolution and self-attention-free Mamba-based semantic Network named CAMS-Net.
Our model outperforms the existing state-of-the-art CNN, self-attention, and Mamba-based methods on CMR and M&Ms-2 Cardiac segmentation datasets.
arXiv Detail & Related papers (2024-06-09T13:53:05Z)
- MFPNet: Multi-scale Feature Propagation Network For Lightweight Semantic Segmentation [5.58363644107113]
We propose a novel lightweight segmentation architecture, called Multi-scale Feature Propagation Network (MFPNet).
We design a robust Encoder-Decoder structure featuring symmetrical residual blocks that consist of flexible bottleneck residual modules (BRMs).
Taking benefit of their capacity to model latent long-range contextual relationships, we leverage Graph Convolutional Networks (GCNs) to facilitate multiscale feature propagation between the BRM blocks.
arXiv Detail & Related papers (2023-09-10T02:02:29Z)
- RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning [37.067605349559]
We propose a novel Progressive Fusion Transformer called ProFormer.
It integrates single-modality information into the multimodal representation for robust RGBT tracking.
ProFormer sets a new state-of-the-art performance on RGBT210, RGBT234, LasHeR, and VTUAV datasets.
arXiv Detail & Related papers (2023-03-26T16:55:58Z)
- Enhancing Medical Image Segmentation with TransCeption: A Multi-Scale Feature Fusion Approach [3.9548535445908928]
CNN-based methods have been the cornerstone of medical image segmentation due to their promising performance and robustness.
Transformer-based approaches are currently prevailing since they enlarge the receptive field to model global contextual correlation.
We propose TransCeption for medical image segmentation, a pure transformer-based U-shape network that incorporates an inception-like module into the encoder.
arXiv Detail & Related papers (2023-01-25T22:09:07Z)
- DoubleU-NetPlus: A Novel Attention and Context Guided Dual U-Net with Multi-Scale Residual Feature Fusion Network for Semantic Segmentation of Medical Images [2.20200533591633]
We present a novel dual U-Net-based architecture named DoubleU-NetPlus.
We exploit multi-contextual features and several attention strategies to increase the network's ability to model discriminative feature representations.
To mitigate the vanishing gradient issue and incorporate high-resolution features with deeper spatial details, the standard convolution operation is replaced with attention-guided residual convolution operations.
arXiv Detail & Related papers (2022-11-25T16:56:26Z)
- Deep Image Clustering with Contrastive Learning and Multi-scale Graph Convolutional Networks [58.868899595936476]
This paper presents a new deep clustering approach termed image clustering with contrastive learning and multi-scale graph convolutional networks (IcicleGCN).
Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art.
arXiv Detail & Related papers (2022-07-14T19:16:56Z)
- DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting [50.17500790309477]
DeMFI-Net is a joint deblurring and multi-frame interpolation framework.
It converts blurry lower-frame-rate videos into sharp higher-frame-rate videos.
It achieves state-of-the-art (SOTA) performance on diverse datasets.
arXiv Detail & Related papers (2021-11-19T00:00:15Z)
- Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion [63.72912507445662]
We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network.
First, we verify that multimodal features can be learnt within a shared single network by merely maintaining modality-specific batch normalization layers in the encoder.
Secondly, we propose a bidirectional multi-layer fusion scheme, where multimodal features can be exploited progressively.
arXiv Detail & Related papers (2021-08-11T03:42:13Z)
- Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation [87.01669173673288]
We propose an encoder fusion network (EFN), which transforms the visual encoder into a multi-modal feature learning network.
A co-attention mechanism is embedded in the EFN to realize the parallel update of multi-modal features.
Experimental results on four benchmark datasets demonstrate that the proposed approach achieves state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-05-05T02:27:25Z)
- Diverse Branch Block: Building a Convolution as an Inception-like Unit [123.59890802196797]
We propose a universal building block for Convolutional Neural Networks (ConvNets) that improves performance without any inference-time cost.
The Diverse Branch Block (DBB) enhances the representational capacity of a single convolution by combining diverse branches of different scales and complexities.
After training, a DBB can be equivalently converted into a single conv layer for deployment.
arXiv Detail & Related papers (2021-03-24T18:12:00Z)