Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings
- URL: http://arxiv.org/abs/2306.02307v1
- Date: Sun, 4 Jun 2023 09:16:39 GMT
- Title: Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings
- Authors: Daniel Rotem, Michael Hassid, Jonathan Mamou, Roy Schwartz
- Abstract summary: We compare the two main approaches for adaptive inference, Early-Exit and Multi-Model, when training data is limited.
Despite this, Early-Exit still provides a better speed-accuracy trade-off because the Multi-Model approach incurs additional computational overhead.
We propose SWEET, an Early-Exit fine-tuning method that assigns each classifier its own set of unique model weights.
- Score: 6.463202903076821
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Adaptive inference is a simple method for reducing inference costs. The
method works by maintaining multiple classifiers of different capacities, and
allocating resources to each test instance according to its difficulty. In this
work, we compare the two main approaches for adaptive inference, Early-Exit and
Multi-Model, when training data is limited. First, we observe that for models
with the same architecture and size, individual Multi-Model classifiers
outperform their Early-Exit counterparts by an average of 2.3%. We show that
this gap is caused by Early-Exit classifiers sharing model parameters during
training, resulting in conflicting gradient updates of model weights. We find
that despite this gap, Early-Exit still provides a better speed-accuracy
trade-off due to the overhead of the Multi-Model approach. To address these
issues, we propose SWEET (Separating Weights in Early Exit Transformers), an
Early-Exit fine-tuning method that assigns each classifier its own set of
unique model weights, not updated by other classifiers. We compare SWEET's
speed-accuracy curve to standard Early-Exit and Multi-Model baselines and find
that it outperforms both methods at fast speeds while maintaining comparable
scores to Early-Exit at slow speeds. Moreover, SWEET individual classifiers
outperform Early-Exit ones by 1.1% on average. SWEET enjoys the benefits of
both methods, paving the way for further reduction of inference costs in NLP.
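To make the SWEET idea concrete, below is a minimal PyTorch-style sketch (not the authors' released code) of early-exit fine-tuning in which each transformer block is updated only by the loss of the first classifier above it: the hidden state is detached at every exit point, so later classifiers cannot push gradients into earlier blocks. All class and argument names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Stack of transformer blocks with an exit classifier after selected layers."""

    def __init__(self, blocks, classifiers, exit_layers):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)            # transformer layers
        self.classifiers = nn.ModuleList(classifiers)  # one linear head per exit
        self.exit_layers = exit_layers                 # e.g. [3, 7, 11]

    def forward(self, hidden, labels, separate_weights=True):
        loss_fn = nn.CrossEntropyLoss()
        total_loss, exit_idx = 0.0, 0
        for layer_idx, block in enumerate(self.blocks):
            hidden = block(hidden)
            if exit_idx < len(self.exit_layers) and layer_idx == self.exit_layers[exit_idx]:
                logits = self.classifiers[exit_idx](hidden[:, 0])  # classify the [CLS] position
                total_loss = total_loss + loss_fn(logits, labels)
                if separate_weights:
                    # SWEET-style separation: later classifiers still read this
                    # representation, but their gradients cannot reach the
                    # blocks below this exit point.
                    hidden = hidden.detach()
                exit_idx += 1
        return total_loss
```

With separate_weights=False the same loop reduces to standard multi-loss Early-Exit fine-tuning, so the flag isolates the effect the paper attributes to conflicting gradient updates on shared weights.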
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We estimate intra-class variations for every class and generate adaptive synthetic samples to support hard sample mining.
Our method significantly outperforms state-of-the-art methods, improving retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- You Only Need End-to-End Training for Long-Tailed Recognition [8.789819609485225]
Cross-entropy loss tends to produce highly correlated features on imbalanced data.
We propose two novel modules, the Block-based Relatively Balanced Batch Sampler (B3RS) and Batch Embedded Training (BET).
Experimental results on the long-tailed classification benchmarks, CIFAR-LT and ImageNet-LT, demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2021-12-11T11:44:09Z)
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
- Bayesian Few-Shot Classification with One-vs-Each Pólya-Gamma Augmented Gaussian Processes [7.6146285961466]
Few-shot classification (FSC) is an important step on the path toward human-like machine learning.
We propose a novel combination of Pólya-Gamma augmentation and the one-vs-each softmax approximation that allows us to efficiently marginalize over functions rather than model parameters.
We demonstrate improved accuracy and uncertainty quantification on both standard few-shot classification benchmarks and few-shot domain transfer tasks.
arXiv Detail & Related papers (2020-07-20T19:10:41Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" on simple instances.
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
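The last entry above describes the confidence-based early-exit recipe that Early-Exit methods, including SWEET, build on at inference time. Below is a minimal sketch of that exit rule for a single instance; the function name and the softmax-confidence criterion are assumptions for illustration, not the paper's exact calibration procedure.

```python
import torch

@torch.no_grad()
def early_exit_predict(blocks, classifiers, exit_layers, hidden, threshold=0.9):
    """Run blocks until the first exit classifier is confident enough (batch size 1)."""
    logits, exit_idx = None, 0
    for layer_idx, block in enumerate(blocks):
        hidden = block(hidden)
        if exit_idx < len(exit_layers) and layer_idx == exit_layers[exit_idx]:
            logits = classifiers[exit_idx](hidden[:, 0])
            confidence = torch.softmax(logits, dim=-1).max().item()
            if confidence >= threshold:          # easy instance: stop early
                return logits, layer_idx + 1     # number of layers actually executed
            exit_idx += 1
    return logits, len(blocks)                   # hard instance: run the full model
```

Lowering the threshold trades accuracy for speed, which is how the speed-accuracy curves compared in the main paper are typically traced out.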