Efficient NLP Model Finetuning via Multistage Data Filtering
- URL: http://arxiv.org/abs/2207.14386v2
- Date: Fri, 19 May 2023 02:40:55 GMT
- Title: Efficient NLP Model Finetuning via Multistage Data Filtering
- Authors: Xu Ouyang, Shahina Mohd Azam Ansari, Felix Xiaozhu Lin, Yangfeng Ji
- Abstract summary: We set out to filter training examples in a streaming fashion, in tandem with training the target model.
Our key techniques are (1) automatically determining a training loss threshold for skipping backward training passes and (2) running a meta predictor for further skipping forward training passes.
Our method reduces the required training examples by up to 5.3$\times$ and training time by up to 6.8$\times$, while only seeing minor accuracy degradation.
- Score: 11.058786955754004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As model finetuning is central to modern NLP, we set out to maximize its
efficiency. Motivated by redundancy in training examples and the sheer sizes of
pretrained models, we exploit a key opportunity: training only on important
data. To this end, we set out to filter training examples in a streaming fashion,
in tandem with training the target model. Our key techniques are two: (1)
automatically determining a training loss threshold for skipping backward
training passes; (2) running a meta predictor for further skipping forward
training passes. We integrate the above techniques in a holistic, three-stage
training process. On a diverse set of benchmarks, our method reduces the required
training examples by up to 5.3$\times$ and training time by up to 6.8$\times$,
while only seeing minor accuracy degradation. Our method is effective even when
training for one epoch, where each training example is encountered only once. It is
simple to implement and is compatible with existing finetuning techniques.
Code is available at: https://github.com/xo28/efficient-NLP-multistage-training
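Below is a minimal PyTorch sketch of the two filtering ideas summarized in the abstract. The threshold-update rule, the gating rule, and the meta-predictor features are illustrative assumptions rather than the paper's exact three-stage design; the authors' implementation is in the linked repository.

    # Hedged sketch: skip backward passes for low-loss examples and skip
    # forward passes when a cheap meta predictor deems an example unimportant.
    import torch

    def finetune_with_filtering(model, meta_predictor, loader, optimizer,
                                init_threshold=0.5, momentum=0.9):
        threshold = init_threshold
        for inputs, labels, cheap_feats in loader:  # cheap_feats: e.g. length
            # Skip the forward pass if the meta predictor says "unimportant"
            # (hypothetical gating rule).
            with torch.no_grad():
                keep_score = torch.sigmoid(meta_predictor(cheap_feats)).mean()
            if keep_score < 0.5:
                continue

            loss = torch.nn.functional.cross_entropy(model(inputs), labels)

            # Skip the backward pass for "easy" examples whose loss already
            # falls below an automatically maintained threshold.
            threshold = momentum * threshold + (1 - momentum) * loss.item()
            if loss.item() < threshold:
                continue

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

The actual method trains the meta predictor online and organizes these skips into the three-stage schedule mentioned in the abstract, which this sketch does not reproduce.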
Related papers
- Boosting Meta-Training with Base Class Information for Few-Shot Learning [35.144099160883606]
We propose an end-to-end training paradigm consisting of two alternating loops.
In the outer loop, we calculate cross entropy loss on the entire training set while updating only the final linear layer.
This training paradigm not only converges quickly but also outperforms existing baselines, indicating that information from the overall training set and the meta-learning training paradigm could mutually reinforce one another.
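A rough PyTorch sketch of the outer-loop step described above, where cross-entropy on the whole training set updates only the final linear layer; the inner meta-learning loop and all other details are omitted and assumed.

    # Outer-loop sketch: compute cross-entropy over the training set while
    # updating only the final linear classifier; the backbone is left frozen
    # in this step (an assumption for this illustration).
    import torch

    def outer_loop_epoch(backbone, final_linear, train_loader, lr=1e-3):
        optimizer = torch.optim.SGD(final_linear.parameters(), lr=lr)
        for images, labels in train_loader:
            with torch.no_grad():            # no gradient into the backbone
                feats = backbone(images)
            logits = final_linear(feats)
            loss = torch.nn.functional.cross_entropy(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()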
arXiv Detail & Related papers (2024-03-06T05:13:23Z) - Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
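One way to realize the idea above is a small side network trained on features from the frozen backbone, so no gradients are ever backpropagated through the backbone; the sketch below is an assumed general recipe, not the paper's exact architecture.

    # Hedged sketch: lightweight adapter on frozen, pretrained features.
    import torch
    import torch.nn as nn

    class SideAdapter(nn.Module):
        def __init__(self, feat_dim, num_classes, hidden=256):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, num_classes))

        def forward(self, feats):
            return self.net(feats)

    def adapt(backbone, adapter, loader, lr=1e-3):
        backbone.eval()
        optimizer = torch.optim.Adam(adapter.parameters(), lr=lr)
        for x, y in loader:
            with torch.no_grad():            # backbone never sees gradients
                feats = backbone(x)
            loss = nn.functional.cross_entropy(adapter(feats), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()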
arXiv Detail & Related papers (2024-02-05T10:55:47Z) - SwiftLearn: A Data-Efficient Training Method of Deep Learning Models
using Importance Sampling [3.8330834108666667]
We present SwiftLearn, a data-efficient approach that accelerates training of deep learning models by using only a subset of the data.
This subset is selected based on an importance criterion measured over the entire dataset during the warm-up stages.
We show that almost 90% of the data can be dropped, achieving an end-to-end average speedup of 3.36x while keeping the average accuracy drop below 0.92%.
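A hedged sketch of importance-based subset selection after warm-up; scoring examples by their loss is an assumption made here, and SwiftLearn's actual criterion may differ.

    # After warm-up, score every example once and keep only the most
    # important fraction for the rest of training.
    import torch
    from torch.utils.data import DataLoader, Subset

    @torch.no_grad()
    def select_important_subset(model, dataset, keep_fraction=0.10):
        model.eval()
        scores = []
        for x, y in DataLoader(dataset, batch_size=64):
            losses = torch.nn.functional.cross_entropy(model(x), y,
                                                       reduction="none")
            scores.append(losses)
        scores = torch.cat(scores)
        k = max(1, int(keep_fraction * len(scores)))
        keep_idx = torch.topk(scores, k).indices.tolist()
        return Subset(dataset, keep_idx)   # train on this subset afterwards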
arXiv Detail & Related papers (2023-11-25T22:51:01Z) - Fast Propagation is Better: Accelerating Single-Step Adversarial
Training via Sampling Subnetworks [69.54774045493227]
A drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples.
We propose to exploit the interior building blocks of the model to improve efficiency.
Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness.
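The sketch below is a loose, hypothetical reading of that idea: a single-step (FGSM-style) perturbation is generated by propagating through only a randomly sampled subset of shape-preserving residual blocks, so example generation is cheaper than a full forward-backward pass.

    # Heavily hedged sketch: craft the single-step perturbation with a random
    # subnetwork (subset of residual blocks), then train on the result.
    # Assumes shape-preserving blocks, a head that accepts block outputs,
    # and inputs normalized to [0, 1].
    import random
    import torch
    import torch.nn as nn

    def fgsm_with_sampled_blocks(blocks, head, x, y, eps=8 / 255, keep=0.5):
        x_adv = x.clone().detach().requires_grad_(True)
        h = x_adv
        for block in blocks:
            if random.random() < keep:       # randomly drop blocks
                h = block(h)
        loss = nn.functional.cross_entropy(head(h), y)
        grad, = torch.autograd.grad(loss, x_adv)
        return (x + eps * grad.sign()).clamp(0, 1).detach()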
arXiv Detail & Related papers (2023-10-24T01:36:20Z) - FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics [7.58472343957521]
We show that training dynamics are highly transferable across model sizes and pre-training methods.
We propose a novel fine-tuning approach: Fine-Tuning by transFerring Training dynamics (FTFT).
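A hedged sketch of one way to transfer training dynamics: record per-example gold-label confidence with a small reference model (dataset-cartography style), then fine-tune the larger model only on the least-confident examples. The scoring and selection rules are assumptions; FTFT's exact recipe may differ.

    # Record per-example dynamics with a cheap reference model, then reuse
    # them to pick data for the expensive model.
    import torch
    from torch.utils.data import DataLoader, Subset

    def record_mean_confidence(ref_model, dataset, optimizer, epochs=3):
        loader = DataLoader(dataset, batch_size=32, shuffle=False)  # keep order
        per_epoch = []
        for _ in range(epochs):
            confs = []
            for x, y in loader:
                logits = ref_model(x)
                probs = torch.softmax(logits.detach(), dim=-1)
                confs.append(probs.gather(1, y.unsqueeze(1)).squeeze(1))
                loss = torch.nn.functional.cross_entropy(logits, y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            per_epoch.append(torch.cat(confs))
        return torch.stack(per_epoch).mean(dim=0)    # one score per example

    def hardest_subset(dataset, mean_conf, keep_fraction=0.33):
        k = max(1, int(keep_fraction * len(mean_conf)))
        idx = torch.topk(-mean_conf, k).indices.tolist()
        return Subset(dataset, idx)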
arXiv Detail & Related papers (2023-10-10T12:53:48Z) - Knowledge Distillation as Efficient Pre-training: Faster Convergence,
Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts in 3 downstream tasks and 9 downstream datasets, while requiring 10x less data and 5x less pre-training time.
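A hedged sketch of feature-level distillation used as a substitute for pre-training: the student (plus a small projection to match dimensions) regresses the teacher's features. The MSE objective and the projector are illustrative assumptions.

    # Distill the pretrained teacher's features into a fresh student as a
    # cheap "pre-training" stage; downstream finetuning happens afterwards.
    import torch
    import torch.nn as nn

    def distill_pretrain(teacher, student, projector, loader, lr=1e-3):
        teacher.eval()
        params = list(student.parameters()) + list(projector.parameters())
        optimizer = torch.optim.Adam(params, lr=lr)
        for x, _ in loader:                  # labels are not used
            with torch.no_grad():
                t_feat = teacher(x)
            s_feat = projector(student(x))   # match the teacher's feature dim
            loss = nn.functional.mse_loss(s_feat, t_feat)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()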
arXiv Detail & Related papers (2022-03-10T06:23:41Z) - LiST: Lite Self-training Makes Efficient Few-shot Learners [91.28065455714018]
LiST improves by 35% over classic fine-tuning methods and 6% over prompt-tuning, with a 96% reduction in the number of trainable parameters, when fine-tuned with no more than 30 labeled examples from each target domain.
arXiv Detail & Related papers (2021-10-12T18:47:18Z) - Jigsaw Clustering for Unsupervised Visual Representation Learning [68.09280490213399]
We propose a new jigsaw clustering pretext task in this paper.
Our method makes use of both intra-image and inter-image information.
It is even comparable to contrastive learning methods when only half of the training batches are used.
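A simplified, hypothetical sketch of a jigsaw-style pretext objective: patches are cut from every image in a batch, shuffled across the batch, and the model is trained to predict which source image each patch came from. The paper's actual intra-/inter-image clustering losses are more involved.

    # Simplified jigsaw-style pretext step; assumes a fixed batch size so the
    # prediction head can have one output per source image, and an encoder
    # that accepts patch-sized inputs.
    import torch
    import torch.nn as nn

    def jigsaw_step(encoder, head, images, optimizer, grid=2):
        B, C, H, W = images.shape
        ph, pw = H // grid, W // grid
        # Cut each image into grid x grid patches, remembering its source.
        patches = images.unfold(2, ph, ph).unfold(3, pw, pw)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, C, ph, pw)
        source = torch.arange(B).repeat_interleave(grid * grid)
        # Shuffle patches across the whole batch (the "jigsaw" mixing).
        perm = torch.randperm(patches.size(0))
        patches, source = patches[perm], source[perm]
        loss = nn.functional.cross_entropy(head(encoder(patches)), source)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()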
arXiv Detail & Related papers (2021-04-01T08:09:26Z) - A Practical Incremental Method to Train Deep CTR Models [37.54660958085938]
We introduce a practical incremental method to train deep CTR models, which consists of three decoupled modules.
Our method achieves performance comparable to conventional batch-mode training with much better training efficiency.
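A hedged sketch of the basic incremental loop implied above: warm-start from the previous model and update only on newly arrived data, instead of retraining on the full history in batch mode. The paper's three decoupled modules are not reproduced here.

    # Incremental update: reuse the latest checkpoint and train on new data only.
    import torch

    def incremental_update(model, prev_state_dict, new_data_loader, lr=1e-4):
        model.load_state_dict(prev_state_dict)        # warm start
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for features, clicks in new_data_loader:      # only the fresh slice
            logits = model(features).squeeze(-1)
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                logits, clicks.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        return model.state_dict()                     # next warm start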
arXiv Detail & Related papers (2020-09-04T12:35:42Z) - A Novel DNN Training Framework via Data Sampling and Multi-Task
Optimization [7.001799696806368]
We propose a novel framework to train DNN models.
It generates multiple pairs of training and validation sets from the full training set via random splitting.
Among all trained models, it outputs the one with the best overall performance across the validation sets from all pairs.
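The selection scheme above can be sketched directly: draw several random train/validation splits, train one model per split, and keep the model whose average validation accuracy across all splits is best. Names and hyperparameters below are assumptions.

    # Sketch: train one model per random split and select the model with the
    # best average accuracy over all validation sets.
    import torch
    from torch.utils.data import DataLoader, random_split

    def train_and_select(make_model, dataset, num_pairs=3, val_ratio=0.2):
        n_val = int(len(dataset) * val_ratio)
        splits = [random_split(dataset, [len(dataset) - n_val, n_val])
                  for _ in range(num_pairs)]
        candidates = []
        for train_set, _ in splits:
            model = make_model()
            optimizer = torch.optim.Adam(model.parameters())
            for x, y in DataLoader(train_set, batch_size=32, shuffle=True):
                loss = torch.nn.functional.cross_entropy(model(x), y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            candidates.append(model)

        @torch.no_grad()
        def avg_val_accuracy(model):
            accs = []
            for _, val_set in splits:        # evaluate on every validation set
                correct = total = 0
                for x, y in DataLoader(val_set, batch_size=64):
                    correct += (model(x).argmax(-1) == y).sum().item()
                    total += y.numel()
                accs.append(correct / total)
            return sum(accs) / len(accs)

        return max(candidates, key=avg_val_accuracy)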
arXiv Detail & Related papers (2020-07-02T10:58:57Z) - The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit".
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
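A hedged sketch of confidence-based early exit at inference time: a small classifier is attached after each layer, and prediction stops as soon as its softmax confidence clears a threshold, so easy instances use fewer layers. The exit criterion and calibration here are assumptions.

    # Early exit for a single instance (batch size 1 is assumed); hard
    # instances fall through to the full depth of the model.
    import torch

    @torch.no_grad()
    def early_exit_predict(layers, exit_heads, x, threshold=0.9):
        h = x
        for layer, head in zip(layers, exit_heads):
            h = layer(h)
            probs = torch.softmax(head(h), dim=-1)
            confidence, prediction = probs.max(dim=-1)
            if confidence.item() >= threshold:   # easy instance: stop here
                return prediction
        return prediction                        # used the whole model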
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.