Dynamic Transformers Provide a False Sense of Efficiency
- URL: http://arxiv.org/abs/2305.12228v1
- Date: Sat, 20 May 2023 16:41:48 GMT
- Title: Dynamic Transformers Provide a False Sense of Efficiency
- Authors: Yiming Chen, Simin Chen, Zexin Li, Wei Yang, Cong Liu, Robby T. Tan,
Haizhou Li
- Abstract summary: Multi-exit models trade accuracy for efficiency, with the computational savings coming from early exits.
We propose SAME, a simple yet effective attack framework tailored to degrade the efficiency of multi-exit models.
Experiments on the GLUE benchmark show that SAME reduces the efficiency gains of various multi-exit models by 80% on average.
- Score: 75.39702559746533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite much success in natural language processing (NLP), pre-trained
language models typically incur a high computational cost during inference.
Multi-exit architectures are a mainstream approach to this issue: they trade
accuracy for efficiency by allowing inputs to exit at early internal
classifiers. However, whether the savings from early exiting are robust
remains unknown. Motivated by this, we first show that directly adapting
existing adversarial attacks, which target model accuracy, does not
significantly reduce inference efficiency. We therefore propose SAME, a
simple yet effective slowdown attack framework specially tailored to
multi-exit models. Leveraging the design characteristics of multi-exit
models, SAME uses all internal predictions, rather than only the final
prediction, to guide adversarial example generation. Experiments on the GLUE
benchmark show that SAME diminishes the efficiency gains of various
multi-exit models by 80% on average, validating its effectiveness and
generalization ability.
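
The core idea of the attack can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' released code: it presumes a multi-exit model whose forward pass exposes one logits tensor per internal classifier, and it defines an objective that keeps every exit maximally uncertain, so that entropy-based exit criteria never fire.

```python
# A minimal sketch of a SAME-style slowdown objective (illustrative, not
# the paper's released code). Assumes `all_exit_logits` is a list with one
# logits tensor per internal classifier of a multi-exit model.
import torch
import torch.nn.functional as F

def slowdown_loss(all_exit_logits):
    """Push EVERY internal exit toward the uniform distribution.

    Entropy-based exit criteria fire when an internal classifier is
    confident; keeping per-exit confidence low forces the input through
    all layers, destroying the efficiency gain.
    """
    loss = 0.0
    for logits in all_exit_logits:  # one tensor per exit, shape [batch, classes]
        log_probs = F.log_softmax(logits, dim=-1)
        uniform = torch.full_like(log_probs, 1.0 / logits.size(-1))
        # KL(uniform || p_exit) is minimized when the exit is maximally unsure
        loss = loss + F.kl_div(log_probs, uniform, reduction="batchmean")
    return loss
```

The gradient of this loss with respect to the input embeddings can then rank candidate token substitutions, as in standard gradient-guided text attacks; using all exits, rather than only the final one, is what distinguishes a slowdown objective from an accuracy attack.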
Related papers
- Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which require training a separate denoising model, our method reuses the LLM itself and thus offers significantly better efficiency and flexibility.
arXiv Detail & Related papers (2024-04-18T15:47:00Z)
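
A rough sketch of the denoise-then-predict loop described above, with hypothetical prompts and an `llm` callable standing in for the actual model interface:

```python
# A hedged sketch of self-denoised smoothing; the prompts and the `llm`
# callable are illustrative stand-ins, not the paper's implementation.
import random
from collections import Counter

def mask_words(text: str, rate: float = 0.2) -> str:
    """Randomly replace a fraction of words with a mask token."""
    return " ".join(
        w if random.random() > rate else "[MASK]" for w in text.split()
    )

def smoothed_predict(llm, text: str, n_samples: int = 8) -> str:
    """Let the LLM itself denoise each randomized copy, then majority-vote."""
    votes = []
    for _ in range(n_samples):
        noisy = mask_words(text)
        denoised = llm(f"Fill in the masked words:\n{noisy}")   # self-denoising
        votes.append(llm(f"Classify this input:\n{denoised}"))  # prediction
    return Counter(votes).most_common(1)[0][0]
```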
- DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks [43.967626080432275]
We propose a novel Distance-Enhanced Early Exiting framework for BERT (DE$^3$-BERT).
We implement a hybrid exiting strategy that supplements classic entropy-based local information with distance-based global information.
Experiments on the GLUE benchmark demonstrate that DE$3$-BERT consistently outperforms state-of-the-art models.
arXiv Detail & Related papers (2024-02-03T15:51:17Z)
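
A minimal sketch of such a hybrid exit rule, assuming per-class prototype vectors are available; the fusion weight and threshold are illustrative, not the paper's values:

```python
# Hybrid exit criterion in the spirit of DE$^3$-BERT: entropy (local) fused
# with prototype distance (global). alpha and threshold are made-up values.
import torch
import torch.nn.functional as F

def should_exit(logits, hidden, prototypes, alpha=0.5, threshold=0.3):
    """logits: [classes]; hidden: [dim]; prototypes: [classes, dim]."""
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    entropy = entropy / torch.log(torch.tensor(float(logits.numel())))  # to [0, 1]

    # Global signal: margin between the two nearest class prototypes.
    dists = torch.cdist(hidden.unsqueeze(0), prototypes).squeeze(0)
    d1, d2 = dists.topk(2, largest=False).values
    margin = (d2 - d1) / (d2 + 1e-12)  # large margin -> globally confident

    score = alpha * entropy + (1 - alpha) * (1 - margin)
    return score < threshold  # exit when fused uncertainty is low
```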
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware Parameter-Efficient self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
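
One common way to make "uncertainty-aware" self-training concrete is MC-dropout filtering of pseudo-labels; the sketch below is an assumption-laden illustration of that idea, not the paper's exact procedure:

```python
# Hedged sketch of uncertainty-aware pseudo-label selection; MC-dropout is
# one common estimator, and both thresholds are illustrative.
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_pseudo_labels(model, batch, passes=8, min_conf=0.7, max_std=0.05):
    """Keep only examples whose MC-dropout predictions are stable."""
    model.train()  # keep dropout active for Monte Carlo sampling
    probs = torch.stack(
        [F.softmax(model(batch), dim=-1) for _ in range(passes)]
    )                                   # [passes, batch, classes]
    mean, std = probs.mean(0), probs.std(0)
    conf, labels = mean.max(dim=-1)
    chosen_std = std.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    keep = (conf >= min_conf) & (chosen_std <= max_std)
    return labels[keep], keep           # reliable pseudo-labels and mask
```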
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- a potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
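
The per-timestep mechanism can be sketched as follows; the real framework additionally calibrates the exit threshold to give statistical performance guarantees, and the `layers`/`exit_head` callables here are hypothetical:

```python
# Simplified CALM-style early exit for one greedy decoding step
# (batch size 1); the paper's threshold calibration is omitted.
import torch.nn.functional as F

def decode_token(layers, exit_head, hidden, threshold=0.9):
    """Run decoder layers one at a time, stopping once confident enough."""
    for depth, layer in enumerate(layers):
        hidden = layer(hidden)
        probs = F.softmax(exit_head(hidden), dim=-1)
        confidence, token = probs.max(dim=-1)
        if confidence.item() >= threshold:
            return token, depth + 1   # easy timestep: few layers used
    return token, len(layers)         # hard timestep: full stack
```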
- A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation [53.8171136907856]
We introduce a set of simple yet effective data augmentation strategies dubbed cutoff.
cutoff relies on sampling consistency and thus adds little computational overhead.
cutoff consistently outperforms adversarial training and achieves state-of-the-art results on the IWSLT2014 German-English dataset.
arXiv Detail & Related papers (2020-09-29T07:08:35Z)
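
A minimal sketch of the span-level cutoff variant with a consistency term; the paper also studies token- and feature-level cutoff, and the exact loss weighting is omitted here:

```python
# Span cutoff plus a Jensen-Shannon-style consistency loss (illustrative).
import torch
import torch.nn.functional as F

def span_cutoff(embeddings, ratio=0.15):
    """Zero one contiguous span of token embeddings per example."""
    out = embeddings.clone()
    batch, seq_len, _ = out.shape
    span = max(1, int(seq_len * ratio))
    for i in range(batch):
        start = torch.randint(0, seq_len - span + 1, (1,)).item()
        out[i, start:start + span] = 0.0
    return out

def consistency_loss(logits_clean, logits_cut):
    """Encourage agreement between the clean and cutoff views."""
    p = F.softmax(logits_clean, dim=-1)
    q = F.softmax(logits_cut, dim=-1)
    m = (0.5 * (p + q)).clamp_min(1e-12).log()
    return 0.5 * (F.kl_div(m, p, reduction="batchmean")
                  + F.kl_div(m, q, reduction="batchmean"))
```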
- MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive.
ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator.
We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
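
The patience rule is simple enough to sketch directly; the layer and classifier callables below are hypothetical stand-ins for a BERT-style stack with one internal classifier per layer:

```python
# Minimal sketch of Patience-based Early Exit (batch size 1): stop once
# `patience` consecutive internal classifiers agree on the same label.
import torch

def pabee_forward(layers, classifiers, hidden, patience=3):
    prev_label, streak = None, 0
    for depth, (layer, clf) in enumerate(zip(layers, classifiers)):
        hidden = layer(hidden)
        label = clf(hidden).argmax(dim=-1)
        if prev_label is not None and torch.equal(label, prev_label):
            streak += 1
            if streak >= patience:
                return label, depth + 1   # exited early
        else:
            streak = 0
        prev_label = label
    return label, len(layers)             # ran the full stack
```

Because the exit decision depends on prediction stability rather than a single confidence score, this rule tends to be harder to trigger with one over-confident intermediate layer, which is the robustness angle the summary alludes to.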