Improving the Efficiency of Transformers for Resource-Constrained
Devices
- URL: http://arxiv.org/abs/2106.16006v1
- Date: Wed, 30 Jun 2021 12:10:48 GMT
- Title: Improving the Efficiency of Transformers for Resource-Constrained
Devices
- Authors: Hamid Tabani, Ajay Balasubramaniam, Shabbir Marzban, Elahe Arani,
Bahram Zonooz
- Abstract summary: We present a performance analysis of state-of-the-art vision transformers on several devices.
We show that by using only 64 clusters to represent model parameters, it is possible to reduce the data transfer from the main memory by more than 4x.
- Score: 1.3019517863608956
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers provide promising accuracy and have become popular and used in
various domains such as natural language processing and computer vision.
However, due to their massive number of model parameters and the resulting memory
and computation requirements, they are not suitable for resource-constrained,
low-power devices. Even with high-performance and specialized devices, the
memory bandwidth can become a performance-limiting bottleneck. In this paper,
we present a performance analysis of state-of-the-art vision transformers on
several devices. We propose to reduce the overall memory footprint and memory
transfers by clustering the model parameters. We show that by using only 64
clusters to represent model parameters, it is possible to reduce the data
transfer from the main memory by more than 4x, achieve up to 22% speedup and
39% energy savings on mobile devices with less than 0.1% accuracy loss.
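The clustering idea above lends itself to a short illustration. The abstract only states that 64 shared values suffice to represent the model parameters; the snippet below assumes a plain 1-D k-means quantizer, and the helper names (cluster_weights, dequantize) are illustrative rather than the authors' implementation, so treat it as a minimal sketch of the technique.

```python
import numpy as np

def cluster_weights(weights, n_clusters=64, n_iter=20):
    """Approximate a weight tensor with n_clusters shared values via 1-D k-means.

    Each weight is then stored as a 1-byte cluster index instead of a 4-byte
    float, which is where most of the reduction in data transferred from main
    memory comes from (8-bit indices plus a tiny codebook vs. full fp32 weights).
    """
    flat = weights.reshape(-1).astype(np.float32)
    # Spread the initial centroids uniformly over the observed weight range.
    centroids = np.linspace(flat.min(), flat.max(), n_clusters).astype(np.float32)
    for _ in range(n_iter):
        # Assign every weight to its nearest centroid.
        assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of the weights assigned to it.
        for k in range(n_clusters):
            members = flat[assign == k]
            if members.size:
                centroids[k] = members.mean()
    return assign.astype(np.uint8).reshape(weights.shape), centroids

def dequantize(indices, codebook):
    """Reconstruct approximate weights by looking indices up in the codebook."""
    return codebook[indices]

# Toy usage: cluster one layer's weights and compare storage and error.
w = np.random.randn(384, 384).astype(np.float32)
idx, codebook = cluster_weights(w)
w_hat = dequantize(idx, codebook)
print("mean abs error:", float(np.abs(w - w_hat).mean()))
print("fp32 bytes:", w.nbytes, "clustered bytes:", idx.nbytes + codebook.nbytes)
```

Because the 64-entry codebook fits comfortably in on-chip memory, only the 8-bit indices need to be streamed from DRAM, which is consistent with the more-than-4x reduction in data transfer quoted above (8-bit indices versus 32-bit floats).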
Related papers
- PQV-Mobile: A Combined Pruning and Quantization Toolkit to Optimize Vision Transformers for Mobile Applications [0.0]
This paper presents a combined pruning and quantization tool, called PQV-Mobile, to optimize vision transformers for mobile applications.
The tool is able to support different types of structured pruning based on magnitude importance, Taylor importance, and Hessian importance.
We show important latency-memory-accuracy trade-offs for different amounts of pruning and int8 quantization with Facebook Data Efficient Image Transformer (DeiT) models.
arXiv Detail & Related papers (2024-08-15T22:10:10Z)
- SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design [5.962184741057505]
This paper aims to address computational redundancy at all design levels in a memory-efficient manner.
We discover that using a larger-stride patchify stem not only reduces memory access costs but also achieves competitive performance.
We introduce SHViT, a Single-Head Vision Transformer that obtains the state-of-the-art speed-accuracy tradeoff.
arXiv Detail & Related papers (2024-01-29T09:12:23Z)
- MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory [76.02294791513552]
We propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory.
Experimental results demonstrate that our MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory.
arXiv Detail & Related papers (2023-10-25T18:00:26Z)
- Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition [19.772585241974138]
Streaming speech recognition models usually process a limited number of tokens each time.
The bottleneck lies in the linear projection layers of multi-head attention and feedforward networks.
We propose folding attention, a technique targeting these linear layers, significantly reducing model size and improving memory and power efficiency.
arXiv Detail & Related papers (2023-09-14T19:01:08Z)
- Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation [53.835365470800916]
We show how to efficiently and effectively transfer knowledge in a vision transformer.
We propose the consolidator, which modifies the pre-trained model by adding a small set of tunable parameters.
Our consolidator can reach up to 7.56 points higher accuracy than full fine-tuning with merely 0.35% of the parameters.
arXiv Detail & Related papers (2023-04-30T23:59:02Z)
- Reversible Vision Transformers [74.3500977090597]
Reversible Vision Transformers are a memory efficient architecture for visual recognition.
We adapt two popular models, namely Vision Transformer and Multiscale Vision Transformers, to reversible variants.
We find that the additional computational burden of recomputing activations is more than overcome for deeper models.
arXiv Detail & Related papers (2023-02-09T18:59:54Z)
- Rethinking Vision Transformers for MobileNet Size and Speed [58.01406896628446]
We propose a novel supernet with low latency and high parameter efficiency.
We also introduce a novel fine-grained joint search strategy for transformer models.
This work demonstrates that properly designed and optimized vision transformers can achieve high performance even with MobileNet-level size and speed.
arXiv Detail & Related papers (2022-12-15T18:59:12Z)
- AdaViT: Adaptive Tokens for Efficient Vision Transformer [91.88404546243113]
We introduce AdaViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity.
AdaViT achieves this by automatically reducing the number of tokens processed in the network as inference proceeds (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-12-14T18:56:07Z)
- Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training (see the sketch after this list).
Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can halve the memory footprint during training.
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
- Memory-efficient Speech Recognition on Smart Devices [15.015948023187809]
Recurrent transducer models have emerged as a promising solution for speech recognition on smart devices.
These models access parameters from off-chip memory at every input time step, which adversely affects device battery life and limits their usability on low-power devices.
We address the transducer models' memory access concerns by optimizing their model architecture and designing novel recurrent cells.
arXiv Detail & Related papers (2021-02-23T07:43:45Z)
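The AdaViT entry above describes shrinking the set of tokens as inference proceeds. The block below is a rough sketch of that idea under stated assumptions: a simple linear scorer and a fixed keep_ratio stand in for AdaViT's learned halting mechanism, and all names (TokenPruningBlock, scorer, keep_ratio) are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TokenPruningBlock(nn.Module):
    """Toy transformer block that drops low-scoring tokens before attention.

    Only a sketch of adaptive token reduction: AdaViT learns its selection
    mechanism, whereas here a linear scorer and a fixed keep_ratio stand in.
    """

    def __init__(self, dim, n_heads=4, keep_ratio=0.7):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.scorer = nn.Linear(dim, 1)  # assumed stand-in for a learned token score
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, tokens, dim). Keep the class token (index 0) plus the
        # top-scoring fraction of the remaining patch tokens.
        cls_tok, patches = x[:, :1], x[:, 1:]
        scores = self.scorer(patches).squeeze(-1)               # (batch, tokens - 1)
        n_keep = max(1, int(patches.shape[1] * self.keep_ratio))
        top = scores.topk(n_keep, dim=1).indices                # (batch, n_keep)
        gather_idx = top.unsqueeze(-1).expand(-1, -1, patches.shape[-1])
        patches = patches.gather(1, gather_idx)
        x = torch.cat([cls_tok, patches], dim=1)

        # Standard pre-norm attention + MLP on the reduced token set.
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x

# Toy usage: the token count shrinks from 197 to 138 after one block.
blk = TokenPruningBlock(dim=192)
out = blk(torch.randn(2, 197, 192))
print(out.shape)
```

Dropping a fraction of the patch tokens per block reduces the quadratic attention cost and the activation traffic accordingly; AdaViT learns where and how aggressively to do this rather than using a fixed ratio as in this sketch.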
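The Mesa entry above describes computing the forward pass with exact activations while keeping only a low-precision copy for the backward pass. The custom autograd function below is a minimal sketch of that memory-saving pattern, assuming a naive symmetric int8 quantizer; it is not the Mesa implementation, and Int8SavedLinear is an illustrative name.

```python
import torch

class Int8SavedLinear(torch.autograd.Function):
    """Linear op that runs the forward pass in full precision but saves only
    an int8-quantized copy of the input activation for the backward pass."""

    @staticmethod
    def forward(ctx, x, weight):
        # Exact forward computation.
        out = x @ weight.t()
        # Quantize the activation symmetrically to int8 before saving it.
        scale = x.abs().max().clamp_min(1e-8) / 127.0
        x_q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
        ctx.save_for_backward(x_q, weight, scale)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        x_q, weight, scale = ctx.saved_tensors
        # Dequantize the activation just-in-time for the weight gradient.
        x_hat = x_q.to(grad_out.dtype) * scale
        grad_x = grad_out @ weight
        grad_w = grad_out.transpose(-2, -1) @ x_hat
        return grad_x, grad_w

# Toy usage: gradients flow through the int8-saved activation.
x = torch.randn(8, 64, requires_grad=True)
w = torch.randn(32, 64, requires_grad=True)
y = Int8SavedLinear.apply(x, w)
y.sum().backward()
print(x.grad.shape, w.grad.shape)
```

Saving int8 instead of fp32 activations cuts the tensors held between the forward and backward passes by roughly 4x, in the same spirit as the halved training footprint reported for Mesa, at the cost of a small quantization error in the weight gradient.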
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.