The Right Tool for the Job: Matching Model and Instance Complexities
- URL: http://arxiv.org/abs/2004.07453v2
- Date: Sat, 9 May 2020 03:45:10 GMT
- Title: The Right Tool for the Job: Matching Model and Instance Complexities
- Authors: Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge and
Noah A. Smith
- Abstract summary: As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" for simple instances and a late (and accurate) exit for hard instances.
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
- Score: 62.95183777679024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As NLP models become larger, executing a trained model requires significant
computational resources, incurring monetary and environmental costs. To better
respect a given inference budget, we propose a modification to contextual
representation fine-tuning which, during inference, allows for an early (and
fast) "exit" from neural network calculations for simple instances, and late
(and accurate) exit for hard instances. To achieve this, we add classifiers to
different layers of BERT and use their calibrated confidence scores to make
early exit decisions. We test our proposed modification on five different
datasets in two tasks: three text classification datasets and two natural
language inference benchmarks. Our method presents a favorable speed/accuracy
tradeoff in almost all cases, producing models which are up to five times
faster than the state of the art, while preserving their accuracy. Our method
also requires almost no additional training resources (in either time or
parameters) compared to the baseline BERT model. Finally, our method alleviates
the need for costly retraining of multiple models at different levels of
efficiency; we allow users to control the inference speed/accuracy tradeoff
using a single trained model, by setting a single variable at inference time.
We publicly release our code.
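The abstract gives enough to sketch the mechanism: attach one classifier per encoder layer, calibrate each classifier's softmax confidence, and at inference walk the layers until the calibrated confidence clears a user-chosen threshold. The minimal PyTorch sketch below is an illustration under stated assumptions, not the authors' released implementation: the generic `nn.TransformerEncoderLayer` stack, the layer sizes, the fixed `temperature`, and the single-instance batching are all placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitEncoder(nn.Module):
    """Transformer encoder with one exit classifier per layer (sketch)."""

    def __init__(self, vocab_size=30522, d_model=256, nhead=4,
                 num_layers=6, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers))
        # One lightweight classifier per layer, trained jointly so any
        # layer can serve as the prediction point.
        self.exits = nn.ModuleList(
            nn.Linear(d_model, num_classes) for _ in range(num_layers))

    def forward(self, input_ids, threshold=1.0, temperature=1.5):
        # threshold is the single inference-time knob: 1.0 never exits
        # early (slowest, most accurate); lower values exit sooner.
        # temperature stands in for post-hoc confidence calibration
        # (e.g. temperature scaling fit on held-out data); the value
        # 1.5 here is an arbitrary placeholder.
        h = self.embed(input_ids)
        for layer, exit_head in zip(self.layers, self.exits):
            h = layer(h)
            logits = exit_head(h[:, 0])            # pool the first token
            probs = F.softmax(logits / temperature, dim=-1)
            if probs.max().item() >= threshold:    # assumes batch size 1
                return logits                      # early (fast) exit
        return logits                              # late (accurate) exit

model = EarlyExitEncoder().eval()
x = torch.randint(0, 30522, (1, 16))               # one toy instance
with torch.no_grad():
    print(model(x, threshold=0.9).shape)           # torch.Size([1, 2])
```

The `threshold` argument plays the role of the single inference-time variable from the abstract: one trained model, with the speed/accuracy tradeoff chosen per deployment by raising or lowering the exit threshold rather than by retraining.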
Related papers
- Self-calibration for Language Model Quantization and Pruning [38.00221764773372]
Quantization and pruning are fundamental approaches for model compression.
In a post-training setting, state-of-the-art quantization and pruning methods require calibration data.
We propose self-calibration as a solution.
arXiv Detail & Related papers (2024-10-22T16:50:00Z)
- NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval [0.7646713951724011]
Existing approaches either fine-tune the pre-trained model itself or, more efficiently, train adaptor models to transform the output of the pre-trained model.
We present NUDGE, a family of novel non-parametric embedding fine-tuning approaches.
NUDGE directly modifies the embeddings of data records to maximize the accuracy of $k$-NN retrieval (a toy sketch of this idea appears after the related-papers list below).
arXiv Detail & Related papers (2024-09-04T00:10:36Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings [6.463202903076821]
We compare the two main approaches for adaptive inference, Early-Exit and Multi-Model, when training data is limited.
Early-Exit provides a better speed-accuracy trade-off due to the overhead of the Multi-Model approach.
We propose SWEET, an Early-Exit fine-tuning method that assigns each classifier its own set of unique model weights.
arXiv Detail & Related papers (2023-06-04T09:16:39Z)
- Gradient-Free Structured Pruning with Unlabeled Data [57.999191898036706]
We propose a gradient-free structured pruning framework that uses only unlabeled data.
Up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.
arXiv Detail & Related papers (2023-03-07T19:12:31Z)
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
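As a second illustration, the NUDGE entry above describes fine-tuning the stored embeddings themselves rather than any model parameters. The NumPy sketch below conveys that general idea only; the per-query update rule, the step cap `gamma`, and the re-normalization are illustrative assumptions, not the paper's actual constrained optimization.

```python
import numpy as np

def nudge_like_update(records, queries, positives, gamma=0.1):
    """Move each positive record a bounded step toward its training query.

    records: (n, d) unit-norm data-record embeddings.
    queries: (m, d) unit-norm training-query embeddings.
    positives: positives[j] is the index of the record that query j
    should retrieve. gamma caps each update's size (an assumption here).
    """
    updated = records.copy()
    for j, rec_idx in enumerate(positives):
        step = queries[j] - updated[rec_idx]
        norm = np.linalg.norm(step)
        if norm > 0:
            # Cap the update magnitude at gamma.
            updated[rec_idx] += gamma * step / max(norm, 1.0)
    # Re-normalize so cosine similarity reduces to a dot product.
    updated /= np.linalg.norm(updated, axis=1, keepdims=True)
    return updated

rng = np.random.default_rng(0)
recs = rng.normal(size=(5, 8))
recs /= np.linalg.norm(recs, axis=1, keepdims=True)
qs = rng.normal(size=(3, 8))
qs /= np.linalg.norm(qs, axis=1, keepdims=True)

new_recs = nudge_like_update(recs, qs, positives=[0, 2, 4])
print(np.argmax(qs @ new_recs.T, axis=1))   # 1-NN retrieval per query
```

Because only the stored record embeddings change, the pre-trained encoder stays frozen, which is what makes this family of approaches lightweight compared to fine-tuning the model or training adaptors.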
This list is automatically generated from the titles and abstracts of the papers on this site.