Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning
- URL: http://arxiv.org/abs/2305.18424v1
- Date: Sun, 28 May 2023 20:38:13 GMT
- Title: Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning
- Authors: Patrik Okanovic, Roger Waleffe, Vasilis Mageirakos, Konstantinos E.
Nikolakakis, Amin Karbasi, Dionysis Kalogerias, Nezihe Merve Gürel,
Theodoros Rekatsinas
- Abstract summary: Repeated Sampling of Random Subsets (RS2) is a powerful yet overlooked random sampling strategy.
We test RS2 against thirty state-of-the-art data pruning and data distillation methods across four datasets including ImageNet.
Our results demonstrate that RS2 significantly reduces time-to-accuracy compared to existing techniques.
- Score: 28.042568086423298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Methods for carefully selecting or generating a small set of training data to
learn from, i.e., data pruning, coreset selection, and data distillation, have
been shown to be effective in reducing the ever-increasing cost of training
neural networks. Behind this success are rigorously designed strategies for
identifying informative training examples out of large datasets. However, these
strategies come with additional computational costs associated with subset
selection or data distillation before training begins, and furthermore, many
are shown to even under-perform random sampling in high data compression
regimes. As such, many data pruning, coreset selection, or distillation methods
may not reduce 'time-to-accuracy', which has become a critical efficiency
measure of training deep neural networks over large datasets. In this work, we
revisit a powerful yet overlooked random sampling strategy to address these
challenges and introduce an approach called Repeated Sampling of Random Subsets
(RSRS or RS2), where we randomly sample the subset of training data for each
epoch of model training. We test RS2 against thirty state-of-the-art data
pruning and data distillation methods across four datasets including ImageNet.
Our results demonstrate that RS2 significantly reduces time-to-accuracy
compared to existing techniques. For example, when training on ImageNet in the
high-compression regime (using less than 10% of the dataset each epoch), RS2
yields accuracy improvements up to 29% compared to competing pruning methods
while offering a runtime reduction of 7x. Beyond the above meta-study, we
provide a convergence analysis for RS2 and discuss its generalization
capability. The primary goal of our work is to establish RS2 as a competitive
baseline for future data selection or distillation techniques aimed at
efficient training.
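The mechanism behind RS2 is simple enough to state in a few lines: rather than committing to a single pruned subset before training, a fresh random subset is drawn at the start of every epoch. The sketch below illustrates this per-epoch resampling in a generic PyTorch-style training loop; the model, optimizer, batch size, and the 10% selection ratio are illustrative assumptions rather than the authors' exact setup, and the without-replacement draw within each epoch is one possible variant, not necessarily the paper's default.

```python
import torch
from torch.utils.data import DataLoader, Subset

def train_rs2(model, dataset, optimizer, loss_fn,
              epochs=10, ratio=0.10, batch_size=128, device="cpu"):
    """Repeated Sampling of Random Subsets (RS2), sketched: draw a new
    uniformly random subset of the training data at the start of every
    epoch and train only on that subset."""
    n = len(dataset)
    k = max(1, int(ratio * n))  # e.g. less than 10% of the data per epoch
    model.to(device)
    for _ in range(epochs):
        # Fresh random subset each epoch; over many epochs most of the
        # dataset is eventually visited, unlike static one-shot pruning.
        idx = torch.randperm(n)[:k].tolist()
        loader = DataLoader(Subset(dataset, idx),
                            batch_size=batch_size, shuffle=True)
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```

Because no importance scores or distilled images are computed before training begins, the per-epoch cost scales directly with the selection ratio, which is the source of the time-to-accuracy advantage highlighted in the abstract.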
Related papers
- GDeR: Safeguarding Efficiency, Balancing, and Robustness via Prototypical Graph Pruning [44.401418612374286]
We introduce GDeR, a novel soft-pruning method designed to dynamically update the set of training samples during training using trainable prototypes.
GDeR achieves or surpasses the performance of the full dataset with 30%-50% fewer training samples.
It also outperforms state-of-the-art pruning methods in imbalanced training and noisy training scenarios.
arXiv Detail & Related papers (2024-10-17T16:56:01Z) - A Study in Dataset Pruning for Image Super-Resolution [9.512648704408095]
We introduce a novel approach that reduces a dataset to a core-set of training samples, selected based on their loss values.
We achieve results comparable to or surpassing those obtained from training on the entire dataset.
arXiv Detail & Related papers (2024-03-25T18:16:34Z) - Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning [50.809769498312434]
We propose a novel dataset pruning method termed Temporal Dual-Depth Scoring (TDDS).
Our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
arXiv Detail & Related papers (2023-11-22T03:45:30Z) - Soft Random Sampling: A Theoretical and Empirical Analysis [59.719035355483875]
Soft random sampling (SRS) is a simple yet effective approach for efficient training of deep neural networks on massive data.
It selects a subset uniformly at random with replacement from the full data set in each epoch (a minimal sketch of this scheme appears after this list).
It is shown to be a powerful and competitive strategy with strong performance at real-world industrial scale.
arXiv Detail & Related papers (2023-11-21T17:03:21Z) - KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
arXiv Detail & Related papers (2023-10-16T06:19:29Z) - Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z) - Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a similar state as those trained on real data.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
arXiv Detail & Related papers (2022-03-22T17:58:59Z) - Accelerating Deep Learning with Dynamic Data Pruning [0.0]
Deep learning has become prohibitively costly, requiring access to powerful computing systems to train state-of-the-art networks.
Previous work, such as forget scores and GraNd/EL2N scores, identifies important samples within a full dataset and prunes the remaining samples, thereby reducing the iterations per epoch.
We propose two algorithms, based on reinforcement learning techniques, to dynamically prune samples and achieve even higher accuracy than the random dynamic method.
arXiv Detail & Related papers (2021-11-24T16:47:34Z) - Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep training on the enlarged dataset tractable, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
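For contrast with the RS2 sketch above, the soft random sampling (SRS) entry in this list describes drawing, in each epoch, a uniformly random subset with replacement from the full data set. A minimal sketch under that description follows; the 10% sampling rate and the surrounding loop are illustrative assumptions, not the SRS authors' implementation.

```python
import torch
from torch.utils.data import DataLoader, Subset

def srs_epoch_indices(n, rate=0.10):
    """Draw round(rate * n) indices uniformly at random WITH replacement,
    matching the soft random sampling (SRS) description above."""
    k = max(1, round(rate * n))
    return torch.randint(0, n, (k,)).tolist()

# Illustrative per-epoch usage (duplicate indices are allowed by Subset):
# for epoch in range(num_epochs):
#     idx = srs_epoch_indices(len(dataset), rate=0.10)
#     loader = DataLoader(Subset(dataset, idx), batch_size=128, shuffle=True)
#     ...train one epoch on loader...
```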