KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training
- URL: http://arxiv.org/abs/2310.10102v1
- Date: Mon, 16 Oct 2023 06:19:29 GMT
- Title: KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training
- Authors: Truong Thao Nguyen, Balazs Gerofi, Edgar Josafat Martinez-Noriega,
François Trahay, Mohamed Wahib
- Abstract summary: We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while lowering accuracy by only 0.4% compared to the baseline.
- Score: 2.8804804517897935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a method for hiding the least-important samples during
the training of deep neural networks to increase efficiency, i.e., to reduce
the cost of training. Using information about the loss and prediction
confidence during training, we adaptively find samples to exclude in a given
epoch based on their contribution to the overall learning process, without
significantly degrading accuracy. We explore the convergence properties when
accounting for the reduction in the number of SGD updates. Empirical results on
various large-scale datasets and models used directly in image classification
and segmentation show that while the with-replacement importance sampling
algorithm performs poorly on large datasets, our method can reduce total
training time by up to 22% while lowering accuracy by only 0.4% compared to the
baseline. Code is available at https://github.com/TruongThaoNguyen/kakurenbo
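The abstract describes the selection step only at a high level, so the following is a minimal sketch of the idea and not the authors' implementation (see the repository linked above for that). It assumes per-sample losses and prediction confidences have already been recorded during the previous epoch. The function name `select_hidden` and the parameters `max_fraction` and `min_confidence` are hypothetical, chosen for illustration.

```python
import numpy as np


def select_hidden(losses, confidences, max_fraction=0.3, min_confidence=0.7):
    """Return a boolean mask marking samples to hide in the next epoch.

    Illustrative assumption: a sample is hidden only if it is among the
    `max_fraction` lowest-loss samples AND the model predicted it with a
    confidence of at least `min_confidence`. Both thresholds are made up
    for this sketch and are not taken from the paper.
    """
    losses = np.asarray(losses, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    n = losses.shape[0]
    k = int(max_fraction * n)  # upper bound on the number of hidden samples
    hidden = np.zeros(n, dtype=bool)
    if k == 0:
        return hidden
    # Candidates: the k samples with the smallest recorded loss.
    candidates = np.argsort(losses, kind="stable")[:k]
    # Keep only candidates the model is already confident about.
    confident = confidences[candidates] >= min_confidence
    hidden[candidates[confident]] = True
    return hidden


# Toy statistics for six samples, as if recorded in the previous epoch.
losses = [0.05, 2.30, 0.10, 0.01, 1.20, 0.02]
confidences = [0.99, 0.30, 0.55, 0.98, 0.60, 0.97]

mask = select_hidden(losses, confidences, max_fraction=0.5, min_confidence=0.9)
train_indices = np.flatnonzero(~mask)  # indices still used for training
```

Because the mask is recomputed every epoch from fresh statistics, a sample hidden in one epoch can return in a later one, which is the adaptive behavior the abstract refers to. In a training loop, `train_indices` could feed a subset sampler so that hidden samples are skipped for that epoch.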
Related papers
- Minibatch training of neural network ensembles via trajectory sampling [0.0]
We show that a minibatch approach can also be used to train neural network ensembles (NNEs) via trajectory methods in a highly efficient manner.
We illustrate this approach by training NNEs to classify images in the MNIST dataset.
arXiv Detail & Related papers (2023-06-23T11:12:33Z)
- Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning [28.042568086423298]
Repeated Sampling of Random Subsets (RS2) is a powerful yet overlooked random sampling strategy.
We test RS2 against thirty state-of-the-art data pruning and data distillation methods across four datasets including ImageNet.
Our results demonstrate that RS2 significantly reduces time-to-accuracy compared to existing techniques.
arXiv Detail & Related papers (2023-05-28T20:38:13Z)
- Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Learning from Data with Noisy Labels Using Temporal Self-Ensemble [11.245833546360386]
Deep neural networks (DNNs) have an enormous capacity to memorize noisy labels.
Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small losses.
We propose a simple yet effective robust training scheme that operates by training only a single network.
arXiv Detail & Related papers (2022-07-21T08:16:31Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We use a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Data optimization for large batch distributed training of deep neural networks [0.19336815376402716]
Current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale.
We propose a data optimization approach that uses machine learning to implicitly smooth out the loss landscape, resulting in fewer local minima.
Our approach filters out data points that are less important to feature learning, enabling us to speed up the training of models on larger batch sizes with improved accuracy.
arXiv Detail & Related papers (2020-12-16T21:22:02Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
- RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr [60.07531696857743]
Fine-tuning a deep convolutional neural network (CNN) from a pre-trained model helps transfer knowledge learned from larger datasets to the target task.
We propose RIFLE - a strategy that deepens backpropagation in transfer learning settings.
RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning.
arXiv Detail & Related papers (2020-07-07T11:27:43Z)
- Passive Batch Injection Training Technique: Boosting Network Performance by Injecting Mini-Batches from a different Data Distribution [39.8046809855363]
This work presents a novel training technique for deep neural networks that makes use of additional data from a distribution that is different from that of the original input data.
To the best of our knowledge, this is the first work that makes use of a different data distribution to aid the training of convolutional neural networks (CNNs).
arXiv Detail & Related papers (2020-06-08T08:17:32Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.