Early Exiting with Ensemble Internal Classifiers
- URL: http://arxiv.org/abs/2105.13792v1
- Date: Fri, 28 May 2021 12:54:11 GMT
- Title: Early Exiting with Ensemble Internal Classifiers
- Authors: Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang,
Zhao Cao, Xuanjing Huang, Xipeng Qiu
- Abstract summary: Early exiting has gained much attention in the NLP community.
We propose a voting-based strategy that considers predictions of all the past internal classifiers to infer the correct label.
Experimental results on various NLP tasks show that our proposed objective function and voting-based strategy can achieve better accuracy-speed trade-offs.
- Score: 57.80488632985445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a simple technique to accelerate inference of large-scale pre-trained
models, early exiting has gained much attention in the NLP community. It allows
samples to exit early at internal classifiers without passing through the
entire model. Most existing work trains the internal classifiers independently
and employs an exiting strategy to decide whether or not to exit based on the
confidence of the current internal classifier. However, none of these works
takes full advantage of the fact that the internal classifiers are trained to
solve the same task and can therefore be used to construct an ensemble.
In this paper, we show that a novel objective function for the training of the
ensemble internal classifiers can be naturally induced from the perspective of
ensemble learning and information theory. The proposed training objective
consists of two terms: one for accuracy and the other for the diversity of the
internal classifiers. In contrast, the objective used in prior work is exactly
the accuracy term of our training objective and therefore optimizes only
accuracy, not diversity. Further, we propose a simple voting-based strategy
that considers predictions of all the past internal classifiers to infer the
correct label and decide whether to exit. Experimental results on various NLP
tasks show that our proposed objective function and voting-based strategy can
achieve better accuracy-speed trade-offs.
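To make the abstract's two ingredients concrete, here is a minimal sketch (not the authors' code) in PyTorch. The toy backbone, the pairwise-agreement diversity penalty, the diversity_weight coefficient, and the min_votes exit threshold are illustrative assumptions rather than the paper's exact formulation; the sketch only shows the shape of the approach: a cross-entropy (accuracy) term applied at every internal classifier plus a term that discourages the classifiers from collapsing into identical copies, and an inference loop that accumulates votes from all past classifiers and exits once one label has enough support.

```python
# Minimal sketch, assuming a toy PyTorch backbone; NOT the authors' implementation.
# The diversity penalty and the vote threshold below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BackboneWithInternalClassifiers(nn.Module):
    """A stack of layers where every layer feeds its own internal classifier."""

    def __init__(self, num_layers=6, hidden=64, num_classes=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()) for _ in range(num_layers)
        )
        self.classifiers = nn.ModuleList(
            nn.Linear(hidden, num_classes) for _ in range(num_layers)
        )

    def forward(self, x):
        logits = []
        h = x
        for layer, clf in zip(self.layers, self.classifiers):
            h = layer(h)
            logits.append(clf(h))
        return logits  # one logits tensor per internal classifier


def ensemble_loss(logits, labels, diversity_weight=0.1):
    # Accuracy term: cross-entropy at every internal classifier (the objective
    # that prior work already optimizes).
    accuracy_term = sum(F.cross_entropy(l, labels) for l in logits) / len(logits)
    # Diversity term (an assumption here): penalize pairwise agreement of the
    # predictive distributions so the classifiers do not become copies.
    probs = [F.softmax(l, dim=-1) for l in logits]
    agreement, pairs = 0.0, 0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            agreement = agreement + (probs[i] * probs[j]).sum(dim=-1).mean()
            pairs += 1
    diversity_term = agreement / max(pairs, 1)
    return accuracy_term + diversity_weight * diversity_term


@torch.no_grad()
def vote_and_exit(model, x, min_votes=3):
    # Voting-based exiting: accumulate the hard predictions of all past internal
    # classifiers and exit as soon as some label has collected `min_votes` votes
    # for every sample in the batch.
    votes, h = None, x
    for depth, (layer, clf) in enumerate(zip(model.layers, model.classifiers), start=1):
        h = layer(h)
        ballot = F.one_hot(clf(h).argmax(dim=-1), clf.out_features)
        votes = ballot if votes is None else votes + ballot
        if votes.max(dim=-1).values.min() >= min_votes:
            return votes.argmax(dim=-1), depth  # early exit
    return votes.argmax(dim=-1), depth          # fell through to the last layer


if __name__ == "__main__":
    model = BackboneWithInternalClassifiers()
    x, y = torch.randn(8, 64), torch.randint(0, 3, (8,))
    loss = ensemble_loss(model(x), y)
    loss.backward()
    preds, exit_depth = vote_and_exit(model, x)
    print(loss.item(), preds.tolist(), exit_depth)
```

Note how the exit decision depends on the accumulated votes of all past internal classifiers rather than on the confidence of the current one alone, which is exactly the contrast the abstract draws with prior exiting strategies.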
Related papers
- A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation [121.0693322732454]
Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity.
Recent research has focused on developing efficient fine-tuning methods to enhance CLIP's performance in downstream tasks.
We revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP.
arXiv Detail & Related papers (2024-02-06T15:45:27Z)
- ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference [21.24566458648584]
We propose ConsistentEE, an early exiting method consistent in training and inference.
A policy network is added to decide whether an instance should exit or continue.
We incorporate the memorized layer into the reward function design, which allows "easy" instances to focus more on acceleration.
arXiv Detail & Related papers (2023-12-19T06:16:13Z)
- Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for anomaly detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
We use uncertainty for the anomaly detection approach.
arXiv Detail & Related papers (2022-12-23T00:50:41Z)
- Self-Training: A Survey [5.772546394254112]
Semi-supervised algorithms aim to learn prediction functions from a small set of labeled observations and a large set of unlabeled observations.
Among the existing techniques, self-training methods have undoubtedly attracted greater attention in recent years.
We present self-training methods for binary and multi-class classification, as well as their variants and two related approaches.
arXiv Detail & Related papers (2022-02-24T11:40:44Z)
- Optimal Representations for Covariate Shift [18.136705088756138]
We introduce a simple variational objective whose optima are exactly the set of all representations on which risk minimizers are guaranteed to be robust.
Our objectives achieve state-of-the-art results on DomainBed, and give insights into the robustness of recent methods, such as CLIP.
arXiv Detail & Related papers (2021-12-31T21:02:24Z)
- Out-of-Scope Intent Detection with Self-Supervision and Discriminative Training [20.242645823965145]
Out-of-scope intent detection is of practical importance in task-oriented dialogue systems.
We propose a method to train an out-of-scope intent classifier in a fully end-to-end manner by simulating the test scenario in training.
We evaluate our method extensively on four benchmark dialogue datasets and observe significant improvements over state-of-the-art approaches.
arXiv Detail & Related papers (2021-06-16T08:17:18Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.