ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
- URL: http://arxiv.org/abs/2312.11882v2
- Date: Sun, 7 Apr 2024 17:16:42 GMT
- Title: ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
- Authors: Ziqian Zeng, Yihuai Hong, Hongliang Dai, Huiping Zhuang, Cen Chen
- Abstract summary: We propose ConsistentEE, an early exiting method consistent in training and inference.
A policy network is added to decide whether an instance should exit or continue.
We incorporate the Memorized Layer into the reward function design, which allows "easy" instances to focus more on acceleration.
- Score: 21.24566458648584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Early Exiting is one of the most popular methods for achieving efficient inference. Current early exiting methods adopt the (weighted) sum of the cross entropy losses of all internal classifiers during training, requiring every internal classifier to predict all instances correctly. However, during inference, as long as one internal classifier predicts an instance correctly, inference can be accelerated without losing accuracy. Thus, there is a notable gap between training and inference. We propose ConsistentEE, an early exiting method that is consistent in training and inference. ConsistentEE formulates the early exiting process as a reinforcement learning problem: a policy network is added to decide whether an instance should exit or continue. The training objective of ConsistentEE requires only that each instance be predicted correctly by one internal classifier. Additionally, we introduce the concept of the Memorized Layer to measure the hardness of an instance and incorporate it into the reward function design, which allows "easy" instances to focus more on acceleration while "hard" instances focus more on accuracy. Experimental results show that our method outperforms other baselines on various natural language understanding and generation tasks.
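To make the decision loop concrete, here is a minimal sketch of policy-guided early exiting at inference time. It assumes a PyTorch encoder with one internal classifier and one binary policy head per layer; the class names, [CLS]-style pooling, and 0.5 exit threshold are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Hypothetical encoder: one internal classifier and one policy head per layer."""
    def __init__(self, layers, hidden_size, num_labels):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.classifiers = nn.ModuleList(
            [nn.Linear(hidden_size, num_labels) for _ in layers])
        self.policies = nn.ModuleList(
            [nn.Linear(hidden_size, 2) for _ in layers])  # logits for (continue, exit)

    @torch.no_grad()
    def infer(self, hidden):
        # hidden: (1, seq_len, hidden_size); the exit decision is made per instance
        for layer, clf, policy in zip(self.layers, self.classifiers, self.policies):
            hidden = layer(hidden)
            pooled = hidden[:, 0]                      # [CLS]-style pooling (assumption)
            p_exit = policy(pooled).softmax(-1)[0, 1]  # probability of "exit"
            if p_exit > 0.5:                           # policy chooses to exit here
                return clf(pooled).argmax(-1)
        return self.classifiers[-1](hidden[:, 0]).argmax(-1)  # fall through to last layer
```

During training, the paper optimizes the policy with a reward that trades prediction correctness against depth, with the Memorized Layer shifting that trade-off per instance; the sketch above covers only the inference path.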
Related papers
- Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that requires no prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Can Diffusion Model Achieve Better Performance in Text Generation? Bridging the Gap between Training and Inference! [14.979893207094221]
Diffusion models have been successfully adapted to text generation tasks by mapping the discrete text into the continuous space.
There exist non-negligible gaps between training and inference, owing to the absence of the forward process during inference.
We propose two simple yet effective methods to bridge the gaps mentioned above, named Distance Penalty and Adaptive Decay Sampling.
arXiv Detail & Related papers (2023-05-08T05:32:22Z)
- Learning to Weight Samples for Dynamic Early-exiting Networks [35.03752825893429]
Early exiting is an effective paradigm for improving the inference efficiency of deep networks.
Our work proposes to adopt a weight prediction network to weight the loss of different training samples at each exit.
We show that the proposed weighting mechanism consistently improves the trade-off between classification accuracy and inference efficiency.
arXiv Detail & Related papers (2022-09-17T10:46:32Z)
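A minimal sketch of this sample-weighting idea, assuming the weight prediction network is a small MLP over pooled features and that its outputs are normalized across exits with a softmax (both assumptions; the paper's meta-learning procedure is omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExitWeighter(nn.Module):
    """Hypothetical weight prediction network: features -> one weight per exit."""
    def __init__(self, hidden_size, num_exits):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 2), nn.ReLU(),
            nn.Linear(hidden_size // 2, num_exits))

    def forward(self, feats):               # feats: (batch, hidden_size)
        return self.net(feats).softmax(-1)  # (batch, num_exits)

def weighted_exit_loss(exit_logits, labels, weights):
    # exit_logits: list of (batch, num_labels) tensors, one per exit
    losses = torch.stack(
        [F.cross_entropy(l, labels, reduction="none") for l in exit_logits], dim=1)
    return (weights * losses).sum(dim=1).mean()  # per-sample, per-exit weighting
```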
- A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation [77.85086491395981]
Early exiting allows instances to exit at different layers according to estimates of their difficulty.
We propose a Hash-based Early Exiting approach (HashEE) that replaces the learn-to-exit modules with hash functions to assign each token to a fixed exiting layer.
Experimental results on classification, regression, and generation tasks demonstrate that HashEE can achieve higher performance with fewer FLOPs and inference time.
arXiv Detail & Related papers (2022-03-03T12:02:05Z)
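A toy sketch of the core idea, assigning each token a fixed exit layer via a deterministic hash (the specific hash function and usage are assumptions; HashEE's actual assignments may differ):

```python
def exit_layer_for_token(token_id: int, num_layers: int) -> int:
    """Map a token id to a fixed exit layer with a multiplicative hash."""
    return (token_id * 2654435761) % num_layers

# Each token stops being updated once its assigned layer is reached, e.g.:
exit_layers = [exit_layer_for_token(t, num_layers=12) for t in (101, 2023, 2003, 102)]
```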
- What is Next when Sequential Prediction Meets Implicitly Hard Interaction? [12.093590031186034]
The Hardness Aware Interaction Learning framework (HAIL) consists of two base sequential learning networks and mutual exclusivity distillation (MED).
Our framework can be easily extended to more peer base networks.
arXiv Detail & Related papers (2022-02-14T11:15:28Z)
- Early Exiting with Ensemble Internal Classifiers [57.80488632985445]
Early exiting has gained much attention in the NLP community.
We propose a voting-based strategy that considers predictions of all the past internal classifiers to infer the correct label.
Experimental results on various NLP tasks show that our proposed objective function and voting-based strategy can achieve better accuracy-speed trade-offs.
arXiv Detail & Related papers (2021-05-28T12:54:11Z)
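A minimal sketch of such a voting rule, exiting once any label accumulates enough votes from the internal classifiers seen so far (the threshold and tie handling are assumptions, not the paper's exact strategy):

```python
from collections import Counter

def voting_early_exit(layer_predictions, vote_threshold):
    """layer_predictions yields each internal classifier's label in depth order."""
    votes = Counter()
    for depth, pred in enumerate(layer_predictions):
        votes[pred] += 1
        if votes[pred] >= vote_threshold:   # enough agreement: exit here
            return pred, depth
    return votes.most_common(1)[0][0], depth  # no early exit: final majority vote
```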
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
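The patience rule itself is simple; a sketch under the usual reading (exit as soon as a fixed number of consecutive internal classifiers agree on the same label):

```python
def patience_early_exit(layer_predictions, patience=3):
    """Exit once `patience` consecutive internal classifiers predict the same label."""
    prev, streak = None, 0
    for depth, pred in enumerate(layer_predictions):
        streak = streak + 1 if pred == prev else 1  # count consecutive agreements
        prev = pred
        if streak >= patience:
            return pred, depth                      # confident enough: exit here
    return prev, depth                              # fall through to the last layer
```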
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)