Multi-Exit Vision Transformer for Dynamic Inference
- URL: http://arxiv.org/abs/2106.15183v2
- Date: Wed, 30 Jun 2021 07:45:39 GMT
- Title: Multi-Exit Vision Transformer for Dynamic Inference
- Authors: Arian Bakhtiarnia, Qi Zhang and Alexandros Iosifidis
- Abstract summary: We propose seven different architectures for early exit branches that can be used for dynamic inference in Vision Transformer backbones.
We show that each one of our proposed architectures could prove useful in the trade-off between accuracy and speed.
- Score: 88.17413955380262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks can be converted to multi-exit architectures by
inserting early exit branches after some of their intermediate layers. This
allows their inference process to become dynamic, which is useful for time-critical IoT applications with stringent latency requirements but time-variant communication and computation resources, in particular edge computing systems and IoT networks where the exact computation time budget is variable and not known beforehand. Vision Transformer is a recently proposed
architecture which has since found many applications across various domains of
computer vision. In this work, we propose seven different architectures for
early exit branches that can be used for dynamic inference in Vision
Transformer backbones. Through extensive experiments involving both
classification and regression problems, we show that each one of our proposed
architectures could prove useful in the trade-off between accuracy and speed.
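To make the early-exit mechanism concrete, below is a minimal sketch of a multi-exit transformer with budget-driven dynamic inference, written in PyTorch. Everything here is an illustrative assumption: the names (`MultiExitViT`, `ExitBranch`), the pool-and-project branch design, and the wall-clock budget check are not taken from the paper, which proposes seven specific branch architectures.

```python
import time

import torch
import torch.nn as nn

class ExitBranch(nn.Module):
    """A lightweight head attached after an intermediate transformer block.

    A placeholder pool-and-project design; the paper proposes seven
    different branch architectures, none of which is reproduced here.
    """
    def __init__(self, dim, num_outputs):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_outputs)

    def forward(self, tokens):
        # Mean-pool over the token dimension, then project to the outputs.
        return self.head(self.norm(tokens.mean(dim=1)))

class MultiExitViT(nn.Module):
    """A stack of transformer blocks with exit branches after some of them."""
    def __init__(self, num_blocks=12, dim=192, heads=3, num_outputs=10,
                 exit_after=(3, 6, 9)):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            for _ in range(num_blocks))
        self.exits = nn.ModuleDict(
            {str(i): ExitBranch(dim, num_outputs) for i in exit_after})
        self.final_exit = ExitBranch(dim, num_outputs)

    def forward(self, tokens, budget_s=None):
        """Return the prediction of the deepest exit reached within budget_s
        seconds; with budget_s=None the full network always runs."""
        start = time.perf_counter()
        prediction = None
        for i, block in enumerate(self.blocks):
            tokens = block(tokens)
            if str(i) in self.exits:
                prediction = self.exits[str(i)](tokens)
                # The budget is variable and unknown beforehand, so it is
                # checked at run time rather than fixed at design time.
                if budget_s is not None and time.perf_counter() - start > budget_s:
                    return prediction
        return self.final_exit(tokens)

# Toy usage: a batch of 8 tokens of width 192 under a 5 ms budget.
model = MultiExitViT().eval()
with torch.no_grad():
    out = model(torch.randn(1, 8, 192), budget_s=0.005)
print(out.shape)  # torch.Size([1, 10])
```

At training time all exits would typically be supervised jointly, e.g. by summing one loss term per exit; the sketch covers only the inference-time behavior, where the deepest exit reached within the budget provides the output.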
Related papers
- AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN, a hybrid architecture combining both convolutional and transformer blocks.
AsCAN supports a variety of tasks: recognition, segmentation, and class-conditional image generation.
We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z)
- Deconstructing Recurrence, Attention, and Gating: Investigating the transferability of Transformers and Gated Recurrent Neural Networks in forecasting of dynamical systems [0.0]
We decompose the key architectural components of the most powerful neural architectures, namely gating and recurrence in RNNs, and attention mechanisms in transformers.
A key finding is that neural gating and attention improve the accuracy of all standard RNNs in most tasks, while the addition of a notion of recurrence in transformers is detrimental.
arXiv Detail & Related papers (2024-10-03T16:41:51Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer design.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses the Continuous Wavelet Transform (CWT) to represent information as a 2D tensor (a toy CWT example follows this entry).
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
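As a toy illustration of the CWT step mentioned above (a sketch assuming the PyWavelets library and a Morlet wavelet; the actual signals, wavelet, and scales in TCCT-Net may differ):

```python
import numpy as np
import pywt

# A toy 1D behavioral feature signal: 256 time steps of a noisy sinusoid.
signal = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * np.random.randn(256)

# The CWT maps the 1D signal to a 2D (scales x time) array, i.e. a
# time-frequency image that a convolutional stream can consume directly.
scales = np.arange(1, 65)
coefficients, frequencies = pywt.cwt(signal, scales, "morl")

print(coefficients.shape)  # (64, 256)
```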
- Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities.
Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent.
We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers (a generic fusion block is sketched after this entry).
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
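Below is a generic sketch of the hybrid idea from the entry above: convolutions supply local features, self-attention supplies long-range dependencies, and the two are fused. The block structure and the name `HybridSRBlock` are illustrative assumptions, not the cited paper's aggregation scheme.

```python
import torch
import torch.nn as nn

class HybridSRBlock(nn.Module):
    """A generic CNN + transformer fusion block for image features."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.conv(x)                    # local texture features (CNN)
        seq = x.flatten(2).transpose(1, 2)      # (b, h*w, c) token sequence
        q = self.norm(seq)
        glob, _ = self.attn(q, q, q)            # long-range dependencies
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        # Fuse both streams and add a residual connection.
        return x + self.fuse(torch.cat([local, glob], dim=1))

# Toy usage on a 32x32 feature map with 64 channels.
block = HybridSRBlock()
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```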
- Transformers predicting the future. Applying attention in next-frame and time series forecasting [0.0]
Recurrent Neural Networks were, until recently, one of the best ways to capture the temporal dependencies in sequences.
With the introduction of the Transformer, it has been shown that an architecture relying only on attention mechanisms, without any RNN, can improve results on various sequence processing tasks.
arXiv Detail & Related papers (2021-08-18T16:17:29Z)
- Automated Search for Resource-Efficient Branched Multi-Task Networks [81.48051635183916]
We propose a principled approach, rooted in differentiable neural architecture search, to automatically define branching structures in a multi-task neural network (a generic sketch of differentiable branch selection follows this entry).
We show that our approach consistently finds high-performing branching structures within limited resource budgets.
arXiv Detail & Related papers (2020-08-24T09:49:19Z)
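For the differentiable branching idea in the entry above, here is a generic sketch in the DARTS style: each task holds softmax-normalized architecture logits over candidate parent layers, so the branch point can be learned by gradient descent and discretized afterwards. The name `SoftBranchingLayer` and the linear candidate ops are assumptions, not the cited paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftBranchingLayer(nn.Module):
    """One layer of a multi-task network with a learnable, differentiable
    choice of which shared parent output each task-specific path consumes."""
    def __init__(self, num_parents, num_tasks, dim):
        super().__init__()
        # One candidate operation per (task, parent) pair.
        self.ops = nn.ModuleList(
            nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_parents))
            for _ in range(num_tasks))
        # Architecture logits: which parent each task should branch from.
        self.alpha = nn.Parameter(torch.zeros(num_tasks, num_parents))

    def forward(self, parent_outputs):
        # parent_outputs: list of num_parents tensors of shape (batch, dim).
        outputs = []
        for t, task_ops in enumerate(self.ops):
            weights = F.softmax(self.alpha[t], dim=0)
            mixed = sum(w * op(x) for w, op, x
                        in zip(weights, task_ops, parent_outputs))
            outputs.append(F.relu(mixed))
        # After the search, argmax over self.alpha picks one discrete parent
        # per task, yielding a hard branching structure.
        return outputs

# Toy usage: two shared parents, three tasks, feature width 16.
layer = SoftBranchingLayer(num_parents=2, num_tasks=3, dim=16)
parents = [torch.randn(4, 16), torch.randn(4, 16)]
print([o.shape for o in layer(parents)])  # three (4, 16) tensors
```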
This list is automatically generated from the titles and abstracts of the papers on this site.