LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
- URL: http://arxiv.org/abs/2104.01136v1
- Date: Fri, 2 Apr 2021 16:29:57 GMT
- Title: LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
- Authors: Ben Graham and Alaaeldin El-Nouby and Hugo Touvron and Pierre Stock
and Armand Joulin and Hervé Jégou and Matthijs Douze
- Abstract summary: We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime.
We introduce the attention bias, a new way to integrate positional information in vision transformers.
Overall, LeViT significantly outperforms existing convnets and vision transformers with respect to the speed/accuracy tradeoff.
- Score: 25.63398340113755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We design a family of image classification architectures that optimize the
trade-off between accuracy and efficiency in a high-speed regime. Our work
exploits recent findings in attention-based architectures, which are
competitive on highly parallel processing hardware. We re-evaluate principles
from the extensive literature on convolutional neural networks to apply them to
transformers, in particular activation maps with decreasing resolutions. We
also introduce the attention bias, a new way to integrate positional
information in vision transformers. As a result, we propose LeViT: a hybrid
neural network for fast inference image classification. We consider different
measures of efficiency on different hardware platforms, so as to best reflect a
wide range of application scenarios. Our extensive experiments empirically
validate our technical choices and show they are suitable to most
architectures. Overall, LeViT significantly outperforms existing convnets and
vision transformers with respect to the speed/accuracy tradeoff. For example,
at 80% ImageNet top-1 accuracy, LeViT is 3.3 times faster than EfficientNet on
the CPU.
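
The attention bias mentioned in the abstract replaces explicit positional embeddings with a learned, per-head bias that is added to the attention logits according to the relative offset between query and key positions. The snippet below is a minimal PyTorch sketch of that idea; the module name, head count, and 14x14 grid size are illustrative assumptions rather than the paper's exact implementation.

    # Hedged sketch: per-head attention bias indexed by relative (dx, dy) offsets,
    # added to the attention logits in place of explicit positional embeddings.
    # Names and hyperparameters are illustrative, not LeViT's exact code.
    import torch
    import torch.nn as nn

    class AttentionWithBias(nn.Module):
        def __init__(self, dim=128, num_heads=4, resolution=14):
            super().__init__()
            self.num_heads = num_heads
            self.head_dim = dim // num_heads
            self.scale = self.head_dim ** -0.5
            self.qkv = nn.Linear(dim, dim * 3)
            self.proj = nn.Linear(dim, dim)

            # One learnable bias per head and per distinct relative offset.
            points = [(x, y) for x in range(resolution) for y in range(resolution)]
            offsets, idxs = {}, []
            for p in points:
                for q in points:
                    off = (abs(p[0] - q[0]), abs(p[1] - q[1]))
                    if off not in offsets:
                        offsets[off] = len(offsets)
                    idxs.append(offsets[off])
            self.attention_biases = nn.Parameter(torch.zeros(num_heads, len(offsets)))
            self.register_buffer(
                "bias_idxs",
                torch.tensor(idxs, dtype=torch.long).view(len(points), len(points)),
            )

        def forward(self, x):  # x: (B, N, C) with N = resolution**2
            B, N, C = x.shape
            qkv = self.qkv(x).view(B, N, 3, self.num_heads, self.head_dim)
            q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each (B, heads, N, head_dim)
            attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, heads, N, N)
            attn = attn + self.attention_biases[:, self.bias_idxs]  # add positional bias
            attn = attn.softmax(dim=-1)
            out = (attn @ v).transpose(1, 2).reshape(B, N, C)
            return self.proj(out)

Calling AttentionWithBias()(torch.randn(2, 14 * 14, 128)) returns a tensor of the same shape.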
Related papers
- PriViT: Vision Transformers for Fast Private Inference [55.36478271911595]
The Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models for computer vision applications.
ViTs are ill-suited for private inference using secure multi-party protocols, due to the large number of non-polynomial operations.
We propose PriViT, an algorithm to selectively "Taylorize" nonlinearities in ViTs while maintaining their prediction accuracy.
arXiv Detail & Related papers (2023-10-06T21:45:05Z)
- FasterViT: Fast Vision Transformers with Hierarchical Attention [63.50580266223651]
We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus on high image throughput for computer vision (CV) applications.
Our newly introduced Hierarchical Attention (HAT) approach decomposes global self-attention with quadratic complexity into a multi-level attention with reduced computational costs.
arXiv Detail & Related papers (2023-06-09T18:41:37Z)
- FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization [14.707312504365376]
We introduce FastViT, a hybrid vision transformer architecture that obtains the state-of-the-art latency-accuracy trade-off.
We show that our model is 3.5x faster than CMT, 4.9x faster than EfficientNet, and 1.9x faster than ConvNeXt on a mobile device for the same accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2023-03-24T17:58:32Z)
- Rethinking Vision Transformers for MobileNet Size and Speed [58.01406896628446]
We propose a novel supernet with low latency and high parameter efficiency.
We also introduce a novel fine-grained joint search strategy for transformer models.
This work demonstrates that properly designed and optimized vision transformers can achieve high performance even with MobileNet-level size and speed.
arXiv Detail & Related papers (2022-12-15T18:59:12Z)
- Fast-ParC: Capturing Position Aware Global Feature for ConvNets and ViTs [35.39701561076837]
We propose a new basic neural network operator named position-aware circular convolution (ParC) and its accelerated version Fast-ParC.
Our Fast-ParC further reduces the O(n^2) time complexity of ParC to O(n log n) using the Fast Fourier Transform (see the sketch after this entry).
Experimental results show that our ParC op can effectively enlarge the receptive field of traditional ConvNets.
arXiv Detail & Related papers (2022-10-08T13:14:02Z)
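
As a rough illustration of the O(n log n) claim in the Fast-ParC summary above, the sketch below computes a 1-D circular convolution both directly and via the FFT using the convolution theorem. It is a generic NumPy example with made-up function names, not the paper's ParC operator itself, which is position-aware and learned.

    # Hedged sketch: circular convolution via the FFT in O(n log n),
    # illustrating the speed-up claimed over the direct O(n^2) form.
    # Generic NumPy example, not the paper's exact operator.
    import numpy as np

    def circular_conv_direct(x, w):
        """Direct circular convolution: O(n^2)."""
        n = len(x)
        return np.array([sum(x[(i - j) % n] * w[j] for j in range(n)) for i in range(n)])

    def circular_conv_fft(x, w):
        """Same result via the convolution theorem: O(n log n)."""
        return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(w)))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x, w = rng.standard_normal(256), rng.standard_normal(256)
        assert np.allclose(circular_conv_direct(x, w), circular_conv_fft(x, w))

For a length-n input the FFT route costs three length-n transforms plus an elementwise product, which is where the O(n log n) complexity comes from.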
- EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers [88.52500757894119]
Self-attention based vision transformers (ViTs) have emerged as a very competitive architecture alternative to convolutional neural networks (CNNs) in computer vision.
We introduce EdgeViTs, a new family of light-weight ViTs that, for the first time, enable attention-based vision models to compete with the best light-weight CNNs.
arXiv Detail & Related papers (2022-05-06T18:17:19Z)
- Three things everyone should know about Vision Transformers [67.30250766591405]
Transformer architectures have rapidly gained traction in computer vision.
We offer three insights based on simple and easy-to-implement variants of vision transformers.
We evaluate the impact of these design choices using the ImageNet-1k dataset, and confirm our findings on the ImageNet-v2 test set.
arXiv Detail & Related papers (2022-03-18T08:23:03Z)
- ConvNets vs. Transformers: Whose Visual Representations are More Transferable? [49.62201738334348]
We investigate the transfer learning ability of ConvNets and vision transformers in 15 single-task and multi-task performance evaluations.
We observe consistent advantages of Transformer-based backbones on 13 downstream tasks.
arXiv Detail & Related papers (2021-08-11T16:20:38Z)
- Rethinking the Design Principles of Robust Vision Transformer [28.538786330184642]
Vision Transformers (ViTs) have shown that self-attention-based networks surpass traditional convolutional neural networks (CNNs) in most vision tasks.
In this paper, we rethink the design principles of ViTs based on the robustness.
By combining the robust design components, we propose the Robust Vision Transformer (RVT).
arXiv Detail & Related papers (2021-05-17T15:04:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.