Dynamic Transformers Provide a False Sense of Efficiency
- URL: http://arxiv.org/abs/2305.12228v1
- Date: Sat, 20 May 2023 16:41:48 GMT
- Title: Dynamic Transformers Provide a False Sense of Efficiency
- Authors: Yiming Chen, Simin Chen, Zexin Li, Wei Yang, Cong Liu, Robby T. Tan,
Haizhou Li
- Abstract summary: Multi-exit models trade accuracy for efficiency, with the computational savings coming from early exits.
We propose SAME, a simple yet effective attack framework tailored to degrade the efficiency of multi-exit models.
Experiments on the GLUE benchmark show that SAME reduces the efficiency gains of various multi-exit models by 80% on average.
- Score: 75.39702559746533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite much success in natural language processing (NLP), pre-trained
language models typically incur a high computational cost during inference.
Multi-exit architectures are a mainstream approach to this issue: they trade
accuracy for efficiency by allowing inputs to exit at early internal
classifiers. However, whether the savings from early exiting are robust
remains unknown. Motivated by this, we first show that directly adapting
existing adversarial attacks, which target model accuracy, does not
significantly reduce inference efficiency. We therefore propose SAME, a
simple yet effective slowdown attack framework specially tailored to
multi-exit models. Leveraging the design characteristics of multi-exit
models, SAME uses all internal predictions, rather than only the final
prediction, to guide adversarial example generation. Experiments on the GLUE
benchmark show that SAME diminishes the efficiency gains of various
multi-exit models by 80% on average, validating its effectiveness and
generalization ability.
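
The core idea of the attack can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' released code: it presumes a multi-exit model whose forward pass exposes one logits tensor per internal classifier, and it defines an objective that keeps every exit maximally uncertain, so that entropy-based exit criteria never fire.

```python
# A minimal sketch of a SAME-style slowdown objective (illustrative, not
# the paper's released code). Assumes `all_exit_logits` is a list with one
# logits tensor per internal classifier of a multi-exit model.
import torch
import torch.nn.functional as F

def slowdown_loss(all_exit_logits):
    """Push EVERY internal exit toward the uniform distribution.

    Entropy-based exit criteria fire when an internal classifier is
    confident; keeping per-exit confidence low forces the input through
    all layers, destroying the efficiency gain.
    """
    loss = 0.0
    for logits in all_exit_logits:  # one tensor per exit, shape [batch, classes]
        log_probs = F.log_softmax(logits, dim=-1)
        uniform = torch.full_like(log_probs, 1.0 / logits.size(-1))
        # KL(uniform || p_exit) is minimized when the exit is maximally unsure
        loss = loss + F.kl_div(log_probs, uniform, reduction="batchmean")
    return loss
```

The gradient of this loss with respect to the input embeddings can then rank candidate token substitutions, as in standard gradient-guided text attacks; using all exits, rather than only the final one, is what distinguishes a slowdown objective from an accuracy attack.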
Related papers
- Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which require training a separate denoising model, our method reuses the LLM itself and thus offers significantly better efficiency and flexibility.
arXiv Detail & Related papers (2024-04-18T15:47:00Z)
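
A rough sketch of the denoise-then-predict loop described above, with hypothetical prompts and an `llm` callable standing in for the actual model interface:

```python
# A hedged sketch of self-denoised smoothing; the prompts and the `llm`
# callable are illustrative stand-ins, not the paper's implementation.
import random
from collections import Counter

def mask_words(text: str, rate: float = 0.2) -> str:
    """Randomly replace a fraction of words with a mask token."""
    return " ".join(
        w if random.random() > rate else "[MASK]" for w in text.split()
    )

def smoothed_predict(llm, text: str, n_samples: int = 8) -> str:
    """Let the LLM itself denoise each randomized copy, then majority-vote."""
    votes = []
    for _ in range(n_samples):
        noisy = mask_words(text)
        denoised = llm(f"Fill in the masked words:\n{noisy}")   # self-denoising
        votes.append(llm(f"Classify this input:\n{denoised}"))  # prediction
    return Counter(votes).most_common(1)[0][0]
```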
- DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks [43.967626080432275]
We propose a novel Distance-Enhanced Early Exiting framework for BERT (DE$^3$-BERT).
We implement a hybrid exiting strategy that supplements classic entropy-based local information with distance-based global information.
Experiments on the GLUE benchmark demonstrate that DE$3$-BERT consistently outperforms state-of-the-art models.
arXiv Detail & Related papers (2024-02-03T15:51:17Z)
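
A minimal sketch of such a hybrid exit rule, assuming per-class prototype vectors are available; the fusion weight and threshold are illustrative, not the paper's values:

```python
# Hybrid exit criterion in the spirit of DE$^3$-BERT: entropy (local) fused
# with prototype distance (global). alpha and threshold are made-up values.
import torch
import torch.nn.functional as F

def should_exit(logits, hidden, prototypes, alpha=0.5, threshold=0.3):
    """logits: [classes]; hidden: [dim]; prototypes: [classes, dim]."""
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    entropy = entropy / torch.log(torch.tensor(float(logits.numel())))  # to [0, 1]

    # Global signal: margin between the two nearest class prototypes.
    dists = torch.cdist(hidden.unsqueeze(0), prototypes).squeeze(0)
    d1, d2 = dists.topk(2, largest=False).values
    margin = (d2 - d1) / (d2 + 1e-12)  # large margin -> globally confident

    score = alpha * entropy + (1 - alpha) * (1 - margin)
    return score < threshold  # exit when fused uncertainty is low
```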
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware Parameter-Efficient self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
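
One common way to make "uncertainty-aware" self-training concrete is MC-dropout filtering of pseudo-labels; the sketch below is an assumption-laden illustration of that idea, not the paper's exact procedure:

```python
# Hedged sketch of uncertainty-aware pseudo-label selection; MC-dropout is
# one common estimator, and both thresholds are illustrative.
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_pseudo_labels(model, batch, passes=8, min_conf=0.7, max_std=0.05):
    """Keep only examples whose MC-dropout predictions are stable."""
    model.train()  # keep dropout active for Monte Carlo sampling
    probs = torch.stack(
        [F.softmax(model(batch), dim=-1) for _ in range(passes)]
    )                                   # [passes, batch, classes]
    mean, std = probs.mean(0), probs.std(0)
    conf, labels = mean.max(dim=-1)
    chosen_std = std.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    keep = (conf >= min_conf) & (chosen_std <= max_std)
    return labels[keep], keep           # reliable pseudo-labels and mask
```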
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- a potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
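
The per-timestep mechanism can be sketched as follows; the real framework additionally calibrates the exit threshold to give statistical performance guarantees, and the `layers`/`exit_head` callables here are hypothetical:

```python
# Simplified CALM-style early exit for one greedy decoding step
# (batch size 1); the paper's threshold calibration is omitted.
import torch.nn.functional as F

def decode_token(layers, exit_head, hidden, threshold=0.9):
    """Run decoder layers one at a time, stopping once confident enough."""
    for depth, layer in enumerate(layers):
        hidden = layer(hidden)
        probs = F.softmax(exit_head(hidden), dim=-1)
        confidence, token = probs.max(dim=-1)
        if confidence.item() >= threshold:
            return token, depth + 1   # easy timestep: few layers used
    return token, len(layers)         # hard timestep: full stack
```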
- A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation [53.8171136907856]
We introduce a set of simple yet effective data augmentation strategies dubbed cutoff.
cutoff relies on sampling consistency and thus adds little computational overhead.
cutoff consistently outperforms adversarial training and achieves state-of-the-art results on the IWSLT2014 German-English dataset.
arXiv Detail & Related papers (2020-09-29T07:08:35Z)
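
A minimal sketch of the span-level cutoff variant with a consistency term; the paper also studies token- and feature-level cutoff, and the exact loss weighting is omitted here:

```python
# Span cutoff plus a Jensen-Shannon-style consistency loss (illustrative).
import torch
import torch.nn.functional as F

def span_cutoff(embeddings, ratio=0.15):
    """Zero one contiguous span of token embeddings per example."""
    out = embeddings.clone()
    batch, seq_len, _ = out.shape
    span = max(1, int(seq_len * ratio))
    for i in range(batch):
        start = torch.randint(0, seq_len - span + 1, (1,)).item()
        out[i, start:start + span] = 0.0
    return out

def consistency_loss(logits_clean, logits_cut):
    """Encourage agreement between the clean and cutoff views."""
    p = F.softmax(logits_clean, dim=-1)
    q = F.softmax(logits_cut, dim=-1)
    m = (0.5 * (p + q)).clamp_min(1e-12).log()
    return 0.5 * (F.kl_div(m, p, reduction="batchmean")
                  + F.kl_div(m, q, reduction="batchmean"))
```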
- MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive.
ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator.
We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
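
The patience rule is simple enough to sketch directly; the layer and classifier callables below are hypothetical stand-ins for a BERT-style stack with one internal classifier per layer:

```python
# Minimal sketch of Patience-based Early Exit (batch size 1): stop once
# `patience` consecutive internal classifiers agree on the same label.
import torch

def pabee_forward(layers, classifiers, hidden, patience=3):
    prev_label, streak = None, 0
    for depth, (layer, clf) in enumerate(zip(layers, classifiers)):
        hidden = layer(hidden)
        label = clf(hidden).argmax(dim=-1)
        if prev_label is not None and torch.equal(label, prev_label):
            streak += 1
            if streak >= patience:
                return label, depth + 1   # exited early
        else:
            streak = 0
        prev_label = label
    return label, len(layers)             # ran the full stack
```

Because the exit decision depends on prediction stability rather than a single confidence score, this rule tends to be harder to trigger with one over-confident intermediate layer, which is the robustness angle the summary alludes to.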