SwiftLearn: A Data-Efficient Training Method of Deep Learning Models
using Importance Sampling
- URL: http://arxiv.org/abs/2311.15134v1
- Date: Sat, 25 Nov 2023 22:51:01 GMT
- Title: SwiftLearn: A Data-Efficient Training Method of Deep Learning Models
using Importance Sampling
- Authors: Habib Hajimolahoseini, Omar Mohamed Awad, Walid Ahmed, Austin Wen,
Saina Asani, Mohammad Hassanpour, Farnoosh Javadi, Mehdi Ahmadi, Foozhan
Ataiefard, Kangling Liu, Yang Liu
- Abstract summary: We present SwiftLearn, a data-efficient approach to accelerate training of deep learning models.
This subset is selected based on an importance criterion measured over the entire dataset during the warm-up stages.
We show that almost 90% of the data can be dropped, achieving an end-to-end average speedup of 3.36x while keeping the average accuracy drop below 0.92%.
- Score: 3.8330834108666667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present SwiftLearn, a data-efficient approach to accelerate
training of deep learning models using a subset of data samples selected during
the warm-up stages of training. This subset is selected based on an importance
criterion measured over the entire dataset during the warm-up stages, aiming to
preserve model performance with fewer examples during the rest of training.
The proposed importance measure can be updated periodically during training, so
that any data sample that later shows higher importance has a chance to return
to the training loop. The model architecture is unchanged; however, since the
number of data samples controls the number of forward and backward passes, the
training time can be reduced by using fewer training samples in each epoch.
Experimental results on a variety of CV and NLP models, during both pretraining
and finetuning, show that model performance can be preserved while achieving a
significant speed-up during training. More specifically, BERT finetuning on the
GLUE benchmark shows that almost 90% of the data can be dropped, achieving an
end-to-end average speedup of 3.36x while keeping the average accuracy drop
below 0.92%.
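Since the abstract describes the mechanism (scoring all samples during warm-up, keeping only the most important ones afterwards, and periodically re-scoring so dropped samples can return), a rough code sketch may make it concrete. The snippet below is a minimal, hypothetical PyTorch illustration; the per-sample-loss importance score, the keep fraction, and the refresh interval are assumptions made for illustration, not SwiftLearn's exact criterion or schedule.

```python
# A minimal, hypothetical sketch of importance-based subset training with
# periodic re-scoring. The per-sample-loss score, keep_frac=0.1, and the
# refresh interval are illustrative assumptions, not the paper's criterion.
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset


def score_importance(model, dataset, loss_fn, batch_size=256, device="cpu"):
    """Score every sample over the full dataset (here: per-sample loss)."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    scores = []
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            scores.append(loss_fn(model(x), y).cpu())   # shape: (batch,)
    return torch.cat(scores)


def train_with_subset(model, dataset, epochs=10, warmup_epochs=2,
                      keep_frac=0.1, refresh_every=3, device="cpu"):
    loss_fn = nn.CrossEntropyLoss(reduction="none")     # per-sample losses
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    active = dataset                                     # full data in warm-up
    for epoch in range(epochs):
        # After warm-up, periodically re-score ALL samples so that dropped
        # examples can return to the loop if their importance has grown.
        if epoch >= warmup_epochs and (epoch - warmup_epochs) % refresh_every == 0:
            scores = score_importance(model, dataset, loss_fn, device=device)
            k = max(1, int(keep_frac * len(dataset)))
            active = Subset(dataset, torch.topk(scores, k).indices.tolist())
        model.train()
        for x, y in DataLoader(active, batch_size=64, shuffle=True):
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).mean().backward()
            opt.step()


if __name__ == "__main__":
    # Toy usage on random data (20 features, 2 classes).
    X, y = torch.randn(1000, 20), torch.randint(0, 2, (1000,))
    model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
    train_with_subset(model, TensorDataset(X, y))
```

Note that in a sketch like this the warm-up epochs and any full-dataset re-scoring still touch every sample, which is consistent with the reported end-to-end speedup (3.36x) being smaller than the roughly 10x reduction in per-epoch samples.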
Related papers
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, which negatively impacts training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z)
- How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training large language models (LLMs).
In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density sampling are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z)
- Efficient Online Data Mixing For Language Model Pre-Training [101.45242332613944]
Existing data selection methods suffer from slow and computationally expensive processes.
Data mixing, on the other hand, reduces the complexity of data selection by grouping data points together.
We develop an efficient algorithm for Online Data Mixing (ODM) that combines elements from both data selection and data mixing.
arXiv Detail & Related papers (2023-12-05T00:42:35Z)
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
- D4: Improving LLM Pretraining via Document De-Duplication and Diversification [38.84592304799403]
We show that careful data selection via pre-trained model embeddings can speed up training.
We also show that intelligently repeating data consistently outperforms baseline training.
arXiv Detail & Related papers (2023-08-23T17:58:14Z)
- On minimizing the training set fill distance in machine learning regression [0.552480439325792]
We study a data selection approach that aims to minimize the fill distance of the selected set.
We show that selecting training sets with farthest point sampling (FPS) can also increase model stability for the specific case of Gaussian kernel regression approaches.
arXiv Detail & Related papers (2023-07-20T16:18:33Z)
- NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks [0.0]
Finetuning large language models inflates the costs of NLU applications.
Recent works in computer vision use data pruning to reduce training time.
We propose a curriculum which periodically scores and discards unimportant examples during finetuning.
arXiv Detail & Related papers (2023-06-05T19:30:41Z)
- Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
Since this is a provable attack that reveals sensitive training data, our findings suggest potentially severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z)
- Dataset Pruning: Reducing Training Data by Examining Generalization Influence [30.30255670341501]
Do all training data contribute to the model's performance?
How can we construct the smallest subset of the entire training data as a proxy training set without significantly sacrificing the model's performance?
arXiv Detail & Related papers (2022-05-19T05:36:35Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep training on this enlarged dataset efficient, we also propose to apply a dataset distillation strategy that compresses the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)