Rethinking Mobile Block for Efficient Attention-based Models
- URL: http://arxiv.org/abs/2301.01146v4
- Date: Mon, 14 Aug 2023 08:54:43 GMT
- Title: Rethinking Mobile Block for Efficient Attention-based Models
- Authors: Jiangning Zhang, Xiangtai Li, Jian Li, Liang Liu, Zhucun Xue, Boshen
Zhang, Zhengkai Jiang, Tianxin Huang, Yabiao Wang, and Chengjie Wang
- Abstract summary: This paper focuses on developing modern, efficient, lightweight models for dense predictions while trading off parameters, FLOPs, and performance.
Inverted Residual Block (IRB) serves as the infrastructure for lightweight CNNs, but no counterpart has been recognized by attention-based studies.
We extend the CNN-based IRB to attention-based models and abstract a one-residual Meta Mobile Block (MMB) for lightweight model design.
- Score: 60.0312591342016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper focuses on developing modern, efficient, lightweight models for
dense predictions while trading off parameters, FLOPs, and performance.
Inverted Residual Block (IRB) serves as the infrastructure for lightweight
CNNs, but no counterpart has been recognized by attention-based studies. This
work rethinks lightweight infrastructure from efficient IRB and effective
components of Transformer from a unified perspective, extending CNN-based IRB
to attention-based models and abstracting a one-residual Meta Mobile Block
(MMB) for lightweight model design. Following a simple but effective design
criterion, we deduce a modern Inverted Residual Mobile Block (iRMB) and build a
ResNet-like Efficient MOdel (EMO) with only iRMB for downstream tasks.
Extensive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks
demonstrate the superiority of our EMO over state-of-the-art methods, e.g.,
EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing equal-order
CNN- and attention-based models while trading off parameters, efficiency, and
accuracy well, and running 2.8-4.0x faster than EdgeNeXt on iPhone 14.
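To make the block abstraction concrete, the following is a minimal PyTorch-style sketch of a one-residual inverted-residual block whose expanded features are mixed by multi-head self-attention and a depth-wise convolution, in the spirit of the iRMB described above. All class names, hyper-parameters, and the exact attention variant here are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical iRMB-style block: names, sizes, and the attention variant are
# assumptions for illustration, not the paper's official code.
import torch
import torch.nn as nn


class IRMBSketch(nn.Module):
    """One-residual inverted-residual block mixing tokens with attention and a depth-wise conv."""

    def __init__(self, dim: int, expand_ratio: float = 4.0, num_heads: int = 4):
        super().__init__()
        hidden = int(dim * expand_ratio)
        self.norm = nn.BatchNorm2d(dim)
        self.expand = nn.Conv2d(dim, hidden, kernel_size=1)            # point-wise expansion
        self.attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3,
                                padding=1, groups=hidden)              # depth-wise 3x3
        self.project = nn.Conv2d(hidden, dim, kernel_size=1)           # point-wise projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        y = self.expand(self.norm(x))                                  # (B, hidden, H, W)
        tokens = y.flatten(2).transpose(1, 2)                          # (B, H*W, hidden)
        tokens, _ = self.attn(tokens, tokens, tokens)                  # global token mixing
        y = y + tokens.transpose(1, 2).reshape(b, -1, h, w)
        y = self.dwconv(y)                                             # local token mixing
        return x + self.project(y)                                     # the single residual


if __name__ == "__main__":
    block = IRMBSketch(dim=64)
    print(block(torch.randn(2, 64, 14, 14)).shape)                     # torch.Size([2, 64, 14, 14])
```

Stacking blocks of this kind in a ResNet-like, multi-stage layout is roughly how the EMO family is built; the actual iRMB uses more efficient attention and normalization choices than this naive global-attention sketch.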
Related papers
- LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection [0.0]
We focus on design choices of neural network architectures for efficient object detection based on FLOPs.
We propose several optimizations to enhance the efficiency of YOLO-based models.
This paper contributes to a new scaling paradigm for object detection and YOLO-centric models called LeYOLO.
arXiv Detail & Related papers (2024-06-20T12:08:24Z)
- Efficient Modulation for Vision Networks [122.1051910402034]
We propose efficient modulation, a novel design for efficient vision networks.
We demonstrate that the modulation mechanism is particularly well suited for efficient networks.
Our network can accomplish better trade-offs between accuracy and efficiency.
arXiv Detail & Related papers (2024-03-29T03:48:35Z)
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first identify the computationally redundant parts of the network.
We then prune the redundant blocks of the model while maintaining network performance.
Thirdly, we propose a global-regional interactive (GRI) attention mechanism to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
- CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework [4.5259990830344075]
This work proposes a novel neural architecture and hardware accelerator co-design framework, called CODEBench.
It is composed of two new benchmarking sub-frameworks, CNNBench and AccelBench, which explore expanded design spaces of convolutional neural networks (CNNs) and CNN accelerators.
arXiv Detail & Related papers (2022-12-07T21:38:03Z)
- MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models [40.40784209977589]
This paper presents MOAT, a family of neural networks that build on top of MObile convolution (i.e., inverted residual blocks) and ATtention.
Starting from a standard Transformer block, the multi-layer perceptron is replaced with a mobile convolution block, which is further reordered before the self-attention operation (see the sketch after this entry).
Our conceptually simple MOAT networks are surprisingly effective, achieving 89.1% top-1 accuracy on ImageNet-1K with ImageNet-22K pretraining.
arXiv Detail & Related papers (2022-10-04T18:00:06Z)
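As a rough illustration of that reordering, the sketch below applies an MBConv-style sub-block before multi-head self-attention, each with its own residual. The class name, sizes, and layer choices are assumptions for illustration, not the official MOAT code.

```python
# Hypothetical MOAT-style block: an MBConv sub-block followed by self-attention.
# Names and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn


class MOATBlockSketch(nn.Module):
    def __init__(self, dim: int, expand_ratio: float = 4.0, num_heads: int = 8):
        super().__init__()
        hidden = int(dim * expand_ratio)
        self.mbconv = nn.Sequential(                                   # inverted residual (MBConv)
            nn.BatchNorm2d(dim),
            nn.Conv2d(dim, hidden, kernel_size=1), nn.GELU(),
            nn.Conv2d(hidden, hidden, kernel_size=3,
                      padding=1, groups=hidden), nn.GELU(),            # depth-wise 3x3
            nn.Conv2d(hidden, dim, kernel_size=1),
        )
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x = x + self.mbconv(x)                                         # local features first
        t = x.flatten(2).transpose(1, 2)                               # (B, H*W, C)
        q = self.norm(t)
        t = t + self.attn(q, q, q)[0]                                  # then global self-attention
        return t.transpose(1, 2).reshape(b, c, h, w)
```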
- Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers [71.40595908386477]
We introduce a new faster attention condenser design called double-condensing attention condensers.
The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor.
These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.
arXiv Detail & Related papers (2022-08-15T02:47:33Z)
- A Two-Stage Efficient 3-D CNN Framework for EEG Based Emotion Recognition [3.147603836269998]
The framework consists of two stages; the first involves constructing efficient models named EEGNet.
In the second stage, we binarize these models to further compress them and deploy them easily on edge devices.
The proposed binarized EEGNet models achieve accuracies of 81%, 95%, and 99% with storage costs of 0.11Mbits, 0.28Mbits, and 0.46Mbits, respectively.
arXiv Detail & Related papers (2022-07-26T05:33:08Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- Bottleneck Transformers for Visual Recognition [97.16013761605254]
We present BoTNet, a powerful backbone architecture that incorporates self-attention for vision tasks.
We present models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark.
We hope our simple and effective approach will serve as a strong baseline for future research in self-attention models for vision.
arXiv Detail & Related papers (2021-01-27T18:55:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.