A survey on efficient vision transformers: algorithms, techniques, and
performance benchmarking
- URL: http://arxiv.org/abs/2309.02031v2
- Date: Tue, 12 Mar 2024 10:33:20 GMT
- Title: A survey on efficient vision transformers: algorithms, techniques, and
performance benchmarking
- Authors: Lorenzo Papa, Paolo Russo, Irene Amerini, and Luping Zhou
- Abstract summary: Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications.
This paper mathematically defines the strategies used to make Vision Transformers efficient, describes and discusses state-of-the-art methodologies, and analyzes their performance across different application scenarios.
- Score: 19.65897437342896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision Transformer (ViT) architectures are becoming increasingly popular and
widely employed to tackle computer vision applications. Their main feature is
the capacity to extract global information through the self-attention
mechanism, outperforming earlier convolutional neural networks. However, the cost of
deploying ViTs has grown steadily with their size, number of trainable parameters,
and operations. Furthermore, self-attention's computational and memory cost increases
quadratically with image resolution. Generally speaking, these architectures are
challenging to employ in real-world applications because of hardware and environmental
restrictions such as limited processing and computational capabilities. This survey
therefore investigates the most efficient methodologies for retaining close-to-optimal
estimation performance under such constraints. More specifically, four categories of
efficiency strategies are analyzed: compact architectures, pruning, knowledge
distillation, and quantization. Moreover, a new metric called the Efficient Error Rate
is introduced to normalize and compare the model features that affect hardware devices
at inference time, such as the number of parameters, bits, FLOPs, and model size. In
summary, this paper first mathematically defines the strategies used to make Vision
Transformers efficient, then describes and discusses state-of-the-art methodologies,
and analyzes their performance across different application scenarios. Toward the end
of the paper, we also discuss open challenges and promising research directions.
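To make the quadratic-cost claim concrete, the following minimal single-head self-attention sketch (Python with PyTorch assumed; learned projections and multi-head logic omitted) shows where the quadratic term arises: the score matrix has one entry per token pair, and the token count itself grows linearly with pixel count.

    import torch

    def self_attention(x):
        # x: (batch, N, d) patch tokens; one head, no learned projections, for brevity
        d = x.size(-1)
        scores = x @ x.transpose(-2, -1) / d ** 0.5  # (batch, N, N): quadratic in N
        return torch.softmax(scores, dim=-1) @ x

    # With an illustrative patch size of 16, a 224x224 image gives N = 196 tokens;
    # doubling the resolution to 448x448 gives N = 784, i.e. 16x more score entries.
    for res in (224, 448):
        n = (res // 16) ** 2
        print(f"resolution {res}: {n} tokens, {n * n:,} attention entries")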
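The Efficient Error Rate itself can be pictured with a hedged sketch: assuming, from the abstract's wording, that each inference-time feature is normalized by the maximum across the compared models and the normalized values are averaged (the paper's exact formula and weighting should be taken from the text), a comparison might look as follows. The model figures are approximate public values used purely for illustration.

    # Hypothetical normalize-and-average reading of the Efficient Error Rate (EER).
    def efficient_error_rate(model, cohort, features=("params", "bits", "flops", "size_mb")):
        maxima = {f: max(m[f] for m in cohort) for f in features}
        return sum(model[f] / maxima[f] for f in features) / len(features)

    cohort = [  # rough fp32 figures, for illustration only
        {"name": "ViT-B/16", "params": 86e6, "bits": 32, "flops": 17.6e9, "size_mb": 330},
        {"name": "DeiT-S", "params": 22e6, "bits": 32, "flops": 4.6e9, "size_mb": 84},
    ]
    for m in cohort:
        print(m["name"], round(efficient_error_rate(m, cohort), 3))  # lower = cheaper at inference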
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the resource demands of real-time visual inference by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework that jointly optimizes the neural network architecture and its edge deployment.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- big.LITTLE Vision Transformer for Efficient Visual Recognition [34.015778625984055]
big.LITTLE Vision Transformer is an innovative architecture aimed at achieving efficient visual recognition.
The system is composed of two distinct blocks: the big performance block and the LITTLE efficiency block.
When processing an image, the system determines the importance of each token and routes each token accordingly, as sketched below.
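A hypothetical sketch of that dispatch step (PyTorch assumed; the importance score, the keep ratio, and both blocks are illustrative placeholders rather than the paper's actual design):

    import torch
    import torch.nn as nn

    class BigLittleLayer(nn.Module):
        # Route the most important tokens through a heavy block, the rest through a cheap one.
        def __init__(self, dim, keep_ratio=0.25):  # dim must be divisible by nhead
            super().__init__()
            self.keep_ratio = keep_ratio
            self.big = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)  # "big" block
            self.little = nn.Linear(dim, dim)  # stand-in for a lightweight "LITTLE" block

        def forward(self, x):  # x: (batch, N, dim)
            scores = x.norm(dim=-1)  # placeholder importance: token magnitude
            k = max(1, int(self.keep_ratio * x.size(1)))
            idx = scores.topk(k, dim=1).indices.unsqueeze(-1).expand(-1, -1, x.size(-1))
            out = self.little(x)  # every token takes the cheap path by default
            return out.scatter(1, idx, self.big(torch.gather(x, 1, idx)))  # top-k take the big path

Only the k selected tokens pay the full attention cost, which is where the efficiency gain comes from.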
arXiv Detail & Related papers (2024-10-14T08:21:00Z)
- Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z)
- Comprehensive Survey of Model Compression and Speed up for Vision Transformers [5.592810604696031]
Vision Transformers (ViTs) have marked a paradigm shift in computer vision, outperforming state-of-the-art models across diverse tasks.
However, their practical deployment is hampered by high computational and memory demands.
This study addresses the challenge by evaluating four primary model compression techniques.
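Quantization is a typical member of such a set of compression techniques; the sketch below uses PyTorch's built-in dynamic quantization as one illustrative possibility (not necessarily the configuration evaluated in this paper):

    import torch
    import torch.nn as nn

    # A small Transformer encoder standing in for a ViT backbone.
    model = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
        num_layers=4,
    )
    # Dynamic quantization: linear-layer weights are stored as int8 and
    # dequantized on the fly at inference, shrinking those layers roughly 4x.
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)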
arXiv Detail & Related papers (2024-04-16T09:19:11Z)
- Can pruning make Large Language Models more efficient? [0.0]
This paper investigates the application of weight pruning as an optimization strategy for Transformer architectures.
Our findings suggest that significant reductions in model size are attainable without considerable compromise on performance.
This work seeks to bridge the gap between model efficiency and performance, paving the way for more scalable and environmentally responsible deep learning applications.
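As a concrete illustration, a generic magnitude-based pruning pass in PyTorch (the criterion and sparsity level here are assumptions; the paper's exact setup may differ):

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # A small Transformer standing in for the studied architecture.
    model = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
        num_layers=4,
    )
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.3)  # zero smallest 30% by |w|
            prune.remove(module, "weight")  # bake the mask into the weight tensor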
arXiv Detail & Related papers (2023-10-06T20:28:32Z)
- Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks.
Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios.
A new research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Basic cross-platform tensor frameworks and script language engines do not by themselves supply the procedures and pipelines needed to actually deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all such requirements while using only those basic cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Visualizing High-Dimensional Trajectories on the Loss-Landscape of ANNs [15.689418447376587]
Training artificial neural networks requires the optimization of highly non-convex loss functions.
Visualization tools have played a key role in uncovering key geometric characteristics of the loss landscapes of ANNs.
We propose a dimensionality reduction method that represents the SOTA in preserving both local and global structure.
arXiv Detail & Related papers (2021-01-31T16:30:50Z)
- Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z)