ULSAM: Ultra-Lightweight Subspace Attention Module for Compact
Convolutional Neural Networks
- URL: http://arxiv.org/abs/2006.15102v1
- Date: Fri, 26 Jun 2020 17:05:43 GMT
- Title: ULSAM: Ultra-Lightweight Subspace Attention Module for Compact
Convolutional Neural Networks
- Authors: Rajat Saini, Nandan Kumar Jha, Bedanta Das, Sparsh Mittal, C. Krishna
Mohan
- Abstract summary: "Ultra-Lightweight Subspace Attention Mechanism" (ULSAM) is end-to-end trainable and can be deployed as a plug-and-play module in compact convolutional neural networks (CNNs).
We achieve $\approx$13% and $\approx$25% reduction in both the FLOPs and parameter counts of MobileNet-V2, with a 0.27% and more than 1% improvement in top-1 accuracy on the ImageNet-1K and fine-grained image classification datasets, respectively.
- Score: 4.143032261649983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The capability of the self-attention mechanism to model the long-range
dependencies has catapulted its deployment in vision models. Unlike convolution
operators, self-attention offers an infinite receptive field and enables
compute-efficient modeling of global dependencies. However, existing
state-of-the-art attention mechanisms incur high compute and/or parameter
overheads and are hence unfit for compact convolutional neural networks (CNNs). In
this work, we propose a simple yet effective "Ultra-Lightweight Subspace
Attention Mechanism" (ULSAM), which infers different attention maps for each
feature map subspace. We argue that learning separate attention maps for each
feature subspace enables multi-scale and multi-frequency feature
representation, which is more desirable for fine-grained image classification.
Our method of subspace attention is orthogonal and complementary to the
existing state-of-the-art attention mechanisms used in vision models. ULSAM is
end-to-end trainable and can be deployed as a plug-and-play module in
pre-existing compact CNNs. Notably, our work is the first attempt to use a
subspace attention mechanism to increase the efficiency of compact CNNs. To
show the efficacy of ULSAM, we perform experiments with MobileNet-V1 and
MobileNet-V2 as backbone architectures on ImageNet-1K and three fine-grained
image classification datasets. We achieve $\approx$13% and $\approx$25%
reduction in both the FLOPs and parameter counts of MobileNet-V2 with a 0.27%
and more than 1% improvement in top-1 accuracy on the ImageNet-1K and
fine-grained image classification datasets (respectively). Code and trained
models are available at https://github.com/Nandan91/ULSAM.
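For intuition, here is a minimal PyTorch sketch of a subspace attention block in the spirit of ULSAM. It is an illustration reconstructed from the abstract's description rather than the authors' released code (see the linked repository for that): the channel dimension is split into g subspaces, each subspace derives a single attention map through a depthwise convolution, a 1x1 pointwise convolution, and a spatial softmax, and the map re-weights that subspace's feature maps with a residual connection. The 3x3 depthwise kernel and the residual form are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubspaceAttention(nn.Module):
    """Splits channels into subspaces and learns one attention map per subspace."""

    def __init__(self, channels: int, num_subspaces: int):
        super().__init__()
        assert channels % num_subspaces == 0, "channels must split evenly into subspaces"
        self.g = num_subspaces
        gc = channels // num_subspaces  # channels per subspace
        # Per-subspace depthwise conv followed by a 1x1 conv that emits one map
        # (kernel sizes are assumptions for this sketch).
        self.dw = nn.ModuleList(
            nn.Conv2d(gc, gc, kernel_size=3, padding=1, groups=gc, bias=False)
            for _ in range(num_subspaces)
        )
        self.pw = nn.ModuleList(
            nn.Conv2d(gc, 1, kernel_size=1, bias=False)
            for _ in range(num_subspaces)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        out = []
        for i, sub in enumerate(x.chunk(self.g, dim=1)):
            attn = self.pw[i](self.dw[i](sub))                          # (b, 1, h, w)
            attn = F.softmax(attn.flatten(2), dim=-1).view(b, 1, h, w)  # spatial softmax
            out.append(sub + sub * attn)                                # residual re-weighting
        return torch.cat(out, dim=1)

# Example: drop the module after an intermediate feature map of a compact CNN.
feats = torch.randn(2, 64, 28, 28)
ulsam_like = SubspaceAttention(channels=64, num_subspaces=4)
print(ulsam_like(feats).shape)  # torch.Size([2, 64, 28, 28])
```

Because every channel in a subspace shares one attention map, the added parameters and FLOPs stay small, which is consistent with the paper's positioning of ULSAM for compact backbones such as MobileNet-V1/V2.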
Related papers
- PMFSNet: Polarized Multi-scale Feature Self-attention Network For
Lightweight Medical Image Segmentation [6.134314911212846]
Current state-of-the-art medical image segmentation methods prioritize accuracy, often at the expense of increased computational demands and larger model sizes.
We propose PMFSNet, a novel medical image segmentation model that balances global and local feature processing while avoiding computational redundancy.
It incorporates a plug-and-play PMFS block, a multi-scale feature enhancement module based on attention mechanisms, to capture long-term dependencies.
arXiv Detail & Related papers (2024-01-15T10:26:47Z) - Systematic Architectural Design of Scale Transformed Attention Condenser
DNNs via Multi-Scale Class Representational Response Similarity Analysis [93.0013343535411]
We propose a novel type of analysis called Multi-Scale Class Representational Response Similarity Analysis (ClassRepSim).
We show that adding STAC modules to ResNet-style architectures can result in up to a 1.6% increase in top-1 accuracy.
Results from ClassRepSim analysis can be used to select an effective parameterization of the STAC module resulting in competitive performance.
arXiv Detail & Related papers (2023-06-16T18:29:26Z) - A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z) - EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for
Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z) - Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z) - ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for
Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain state-of-the-art classification performance, i.e., 88.5% top-1 accuracy on the ImageNet validation set and the best 91.2% top-1 accuracy on the ImageNet Real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z) - TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in
CNNs [18.24779045808196]
We propose a lightweight top-down (TD) attention module that iteratively generates a "visual searchlight" to perform top-down channel and spatial modulation of its inputs.
Our models are more robust to changes in input resolution during inference and learn to "shift attention" by localizing individual objects or features at each computation step without any explicit supervision.
arXiv Detail & Related papers (2021-11-26T12:35:17Z) - An Attention Module for Convolutional Neural Networks [5.333582981327498]
We propose an attention module for convolutional neural networks by developing an AW-convolution.
Experiments on several datasets for image classification and object detection tasks show the effectiveness of our proposed attention module.
arXiv Detail & Related papers (2021-08-18T15:36:18Z) - DMSANet: Dual Multi Scale Attention Network [0.0]
We propose a new attention module that not only achieves the best performance but also has fewer parameters than most existing models.
Our attention module can easily be integrated with other convolutional neural networks because of its lightweight nature.
arXiv Detail & Related papers (2021-06-13T10:31:31Z) - SA-Net: Shuffle Attention for Deep Convolutional Neural Networks [0.0]
We propose an efficient Shuffle Attention (SA) module to address this issue.
The proposed SA module is efficient yet effective: compared with the ResNet50 backbone, SA adds only 300 parameters (vs. 25.56M) and 2.76e-3 GFLOPs (vs. 4.12 GFLOPs).
arXiv Detail & Related papers (2021-01-30T15:23:17Z) - Lightweight Single-Image Super-Resolution Network with Attentive
Auxiliary Feature Learning [73.75457731689858]
We develop a computation-efficient yet accurate network based on the proposed attentive auxiliary features (A$^2$F) for SISR.
Experimental results on large-scale datasets demonstrate the effectiveness of the proposed model against state-of-the-art (SOTA) SR methods.
arXiv Detail & Related papers (2020-11-13T06:01:46Z)