Consistent Accelerated Inference via Confident Adaptive Transformers
- URL: http://arxiv.org/abs/2104.08803v1
- Date: Sun, 18 Apr 2021 10:22:28 GMT
- Title: Consistent Accelerated Inference via Confident Adaptive Transformers
- Authors: Tal Schuster, Adam Fisch, Tommi Jaakkola, Regina Barzilay
- Abstract summary: We develop a novel approach for confidently accelerating inference in large and expensive multilayer Transformers.
We increase computational efficiency while guaranteeing, with high confidence, a specifiable degree of consistency with the original model.
We demonstrate the effectiveness of this approach on four classification and regression tasks.
- Score: 29.034390810078172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a novel approach for confidently accelerating inference in the
large and expensive multilayer Transformers that are now ubiquitous in natural
language processing (NLP). Amortized or approximate computational methods
increase efficiency, but can come with unpredictable performance costs. In this
work, we present CATs -- Confident Adaptive Transformers -- in which we
simultaneously increase computational efficiency, while guaranteeing a
specifiable degree of consistency with the original model with high confidence.
Our method trains additional prediction heads on top of intermediate layers,
and dynamically decides when to stop allocating computational effort to each
input using a meta consistency classifier. To calibrate our early prediction
stopping rule, we formulate a unique extension of conformal prediction. We
demonstrate the effectiveness of this approach on four classification and
regression tasks.
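The stopping rule described in the abstract can be illustrated with a small sketch. This is a simplified illustration and not the authors' implementation: it assumes each intermediate layer already yields a scalar score from some meta consistency classifier, and it uses a standard split-conformal quantile to pick a single exit threshold. The function names `calibrate_threshold` and `exit_layer`, the data layout, and the tolerance parameter `epsilon` are all hypothetical.

```python
import math


def calibrate_threshold(cal_scores, cal_consistent, epsilon):
    """Pick an exit threshold from held-out calibration data (hypothetical helper).

    cal_scores[i][k]     -- meta-classifier score of example i at intermediate layer k
    cal_consistent[i][k] -- True if layer k's early prediction matches the full model
    epsilon              -- tolerated rate of inconsistent early exits
    """
    worst = []
    for scores, consistent in zip(cal_scores, cal_consistent):
        # Highest score ever assigned to an inconsistent layer of this example.
        bad = [s for s, ok in zip(scores, consistent) if not ok]
        worst.append(max(bad) if bad else float("-inf"))
    worst.sort()
    n = len(worst)
    # Conformal quantile with the usual finite-sample (n + 1) correction.
    rank = math.ceil((n + 1) * (1 - epsilon))
    if rank > n:
        return float("inf")  # too few calibration points: never exit early
    return worst[rank - 1]


def exit_layer(scores, threshold):
    """Return the first layer whose score exceeds the threshold, else the full depth."""
    for k, s in enumerate(scores):
        if s > threshold:
            return k
    return len(scores)  # no early exit: run the full model
```

Under these assumptions, a test input exits at the first layer whose score is strictly above the calibrated threshold; a smaller `epsilon` raises the threshold and so trades speed for consistency with the full model.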
Related papers
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Fourier Test-time Adaptation with Multi-level Consistency for Robust Classification [10.291631977766672]
We propose a novel approach called Fourier Test-time Adaptation (FTTA) to integrate input and model tuning.
FTTA builds a reliable multi-level consistency measurement of paired inputs to achieve self-supervised prediction.
It was extensively validated on three large classification datasets with different modalities and organs.
arXiv Detail & Related papers (2023-06-05T02:29:38Z)
- HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z)
- Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximate valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
- Transformer Uncertainty Estimation with Hierarchical Stochastic Attention [8.95459272947319]
We propose a novel way to enable transformers to have the capability of uncertainty estimation.
This is achieved by learning a hierarchical self-attention that attends to values and a set of learnable centroids.
We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets.
arXiv Detail & Related papers (2021-12-27T16:43:31Z)
- Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z)
- Pretrained Transformers as Universal Computation Engines [105.00539596788127]
We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning.
We study finetuning it on a variety of sequence classification tasks spanning numerical computation, vision, and protein fold prediction.
We find that such pretraining enables FPT to generalize in zero-shot to these modalities, matching the performance of a transformer fully trained on these tasks.
arXiv Detail & Related papers (2021-03-09T06:39:56Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.