Faster Attention Is What You Need: A Fast Self-Attention Neural Network
Backbone Architecture for the Edge via Double-Condensing Attention Condensers
- URL: http://arxiv.org/abs/2208.06980v1
- Date: Mon, 15 Aug 2022 02:47:33 GMT
- Title: Faster Attention Is What You Need: A Fast Self-Attention Neural Network
Backbone Architecture for the Edge via Double-Condensing Attention Condensers
- Authors: Alexander Wong, Mohammad Javad Shafiee, Saad Abbasi, Saeejith Nair,
and Mahmoud Famouri
- Abstract summary: We introduce a new faster attention condenser design called double-condensing attention condensers.
The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor.
These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.
- Score: 71.40595908386477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing adoption of deep learning for on-device TinyML applications,
there has been an ever-increasing demand for more efficient neural network
backbones optimized for the edge. Recently, the introduction of attention
condenser networks has resulted in low-footprint, highly efficient,
self-attention neural networks that strike a strong balance between accuracy
and speed. In this study, we introduce a new faster attention condenser design
called double-condensing attention condensers that enable more condensed
feature embedding. We further employ a machine-driven design exploration
strategy that imposes best practices design constraints for greater efficiency
and robustness to produce the macro-micro architecture constructs of the
backbone. The resulting backbone (which we name AttendNeXt) achieves
significantly higher inference throughput on an embedded ARM processor when
compared to several other state-of-the-art efficient backbones (>10X faster
than FB-Net C at higher accuracy and speed) while having a small model size
(>1.47X smaller than OFA-62 at higher speed and similar accuracy) and strong
accuracy (1.1% higher top-1 accuracy than MobileViT XS on ImageNet at higher
speed). These promising results demonstrate that exploring different efficient
architecture designs and self-attention mechanisms can lead to interesting new
building blocks for TinyML applications.
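The paper does not include code, but the general pattern behind attention condensers is to condense features to a lower-dimensional embedding, compute self-attention values there, then expand and use them to selectively gate the input; "double-condensing" applies two successive condensation stages for an even more condensed embedding. A minimal, parameter-free sketch of that pattern on a 1-D feature sequence (all function names hypothetical; the real modules use learned embedding and expansion layers):

```python
import math

def max_pool(x, stride=2):
    """Condense: downsample a 1-D feature sequence by max-pooling."""
    return [max(x[i:i + stride]) for i in range(0, len(x), stride)]

def upsample(x, factor, length):
    """Expand: nearest-neighbour upsampling back to the input resolution."""
    out = [v for v in x for _ in range(factor)]
    return out[:length]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def double_condensing_attention_condenser(x):
    """Toy double-condensing attention condenser (illustrative only).

    Two successive condensation stages produce a highly condensed
    embedding; the expanded attention values then gate the input
    features via selective attention.
    """
    c1 = max_pool(x)                          # first condensation
    c2 = max_pool(c1)                         # second condensation ("double")
    attn = [sigmoid(v) for v in c2]           # self-attention values
    gate = upsample(attn, factor=4, length=len(x))  # expand to input size
    return [f * g for f, g in zip(x, gate)]   # gate the input features
```

For example, an 8-element input is condensed to a 2-element embedding, and each attention value then gates a 4-element span of the input. In the actual backbone these stages are learned convolutional layers discovered by the machine-driven design exploration; the sketch only shows the dataflow.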
Related papers
- LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones [10.435069781620957]
Research on efficient vision backbones is converging on models that mix convolutions and transformer blocks.
We analyze common modules and architectural design choices for backbones not in terms of MACs, but rather in actual throughput and latency.
We combine both macro and micro design to create a new family of hardware-efficient backbone networks called LowFormer.
arXiv Detail & Related papers (2024-09-05T12:18:32Z)
- Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis [93.0013343535411]
We propose a novel type of analysis called Multi-Scale Class Representational Response Similarity Analysis (ClassRepSim).
We show that adding STAC modules to ResNet style architectures can result in up to a 1.6% increase in top-1 accuracy.
Results from ClassRepSim analysis can be used to select an effective parameterization of the STAC module resulting in competitive performance.
arXiv Detail & Related papers (2023-06-16T18:29:26Z)
- Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
High architectural and computational complexity can result in poor suitability for deployment on embedded devices.
Fast GraspNeXt is a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping.
arXiv Detail & Related papers (2023-04-21T18:07:14Z)
- AttendSeg: A Tiny Attention Condenser Neural Network for Semantic Segmentation on the Edge [71.80459780697956]
We introduce AttendSeg, a low-precision, highly compact deep neural network tailored for on-device semantic segmentation.
AttendSeg possesses a self-attention network architecture comprising lightweight attention condensers for improved spatial-channel selective attention.
arXiv Detail & Related papers (2021-04-29T19:19:04Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- AttendNets: Tiny Deep Image Recognition Neural Networks for the Edge via Visual Attention Condensers [81.17461895644003]
We introduce AttendNets, low-precision, highly compact deep neural networks tailored for on-device image recognition.
AttendNets possess deep self-attention architectures based on visual attention condensers.
Results show AttendNets have significantly lower architectural and computational complexity when compared to several deep neural networks.
arXiv Detail & Related papers (2020-09-30T01:53:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.