Data optimization for large batch distributed training of deep neural
networks
- URL: http://arxiv.org/abs/2012.09272v2
- Date: Fri, 18 Dec 2020 17:33:02 GMT
- Title: Data optimization for large batch distributed training of deep neural
networks
- Authors: Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar
- Abstract summary: Current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale.
We propose a data optimization approach that utilizes machine learning to implicitly smooth out the loss landscape, resulting in fewer local minima.
Our approach filters out data points that are less important to feature learning, enabling us to speed up the training of models with larger batch sizes and improved accuracy.
- Score: 0.19336815376402716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributed training in deep learning (DL) is common practice as data and
models grow. The current practice for distributed training of deep neural
networks faces the challenges of communication bottlenecks when operating at
scale, and model accuracy deterioration with an increase in global batch size.
Present solutions focus on improving message exchange efficiency as well as
implementing techniques to tweak batch sizes and models in the training
process. The loss of training accuracy typically happens because the loss
function gets trapped in a local minimum. We observe that minimization over the
loss landscape is shaped by both the model and the training data, and we
propose a data optimization approach that uses machine learning to implicitly
smooth out the loss landscape, resulting in fewer local minima. Our approach
filters out data points that are less important to feature learning, enabling
us to speed up the training of models with larger batch sizes and improved
accuracy.
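A minimal sketch of the filtering idea described above: score each training example with a proxy model's per-sample loss and keep only the highest-scoring fraction before building large-batch loaders. The proxy-loss criterion, the `keep_fraction` threshold, and the helper names are illustrative assumptions, not the exact procedure proposed in the paper.

```python
# Hedged sketch of importance-based data filtering before large-batch training.
# The per-sample-loss score and `keep_fraction` are illustrative assumptions.
import torch
from torch.utils.data import DataLoader, Subset

def filter_dataset(proxy_model, dataset, keep_fraction=0.7, device="cpu"):
    """Keep the fraction of examples with the highest proxy loss, on the
    assumption that low-loss (easy) examples matter least for feature learning."""
    proxy_model.eval()
    loss_fn = torch.nn.CrossEntropyLoss(reduction="none")
    scores = []
    loader = DataLoader(dataset, batch_size=512, shuffle=False)
    with torch.no_grad():
        for x, y in loader:
            logits = proxy_model(x.to(device))
            scores.append(loss_fn(logits, y.to(device)).cpu())
    scores = torch.cat(scores)
    k = int(keep_fraction * len(dataset))
    keep_idx = torch.topk(scores, k).indices.tolist()
    return Subset(dataset, keep_idx)

# The filtered subset can then back a large-batch distributed loader, e.g.
# DataLoader(filtered, batch_size=4096, sampler=DistributedSampler(filtered)).
```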
Related papers
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
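A simplified sketch of the adaptive hiding idea in this entry, assuming a sample's contribution is approximated by its most recently recorded loss; the scoring rule and `hide_fraction` are illustrative, not KAKURENBO's exact schedule.

```python
# Hedged sketch: hide the apparently least-important samples for one epoch.
import torch
from torch.utils.data import DataLoader, Subset

def train_epoch_with_hiding(model, optimizer, dataset, sample_losses, hide_fraction=0.2):
    """Train one epoch while skipping the samples whose last recorded loss was
    smallest; `sample_losses` should be refreshed periodically (e.g. from the
    previous epoch's forward passes)."""
    n_hide = int(hide_fraction * len(dataset))
    order = torch.argsort(sample_losses)   # ascending: lowest-loss first
    visible = order[n_hide:].tolist()
    loader = DataLoader(Subset(dataset, visible), batch_size=256, shuffle=True)

    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```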
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Minibatch training of neural network ensembles via trajectory sampling [0.0]
We show that a minibatch approach can also be used to train neural network ensembles (NNEs) via trajectory methods in a highly efficient manner.
We illustrate this approach by training NNEs to classify images in the MNIST dataset.
arXiv Detail & Related papers (2023-06-23T11:12:33Z)
- Adversarial training with informed data selection [53.19381941131439]
Adversarial training is the most efficient solution for defending networks against such malicious attacks.
This work proposes a data selection strategy to be applied in the mini-batch training.
The simulation results show that a good compromise can be obtained regarding robustness and standard accuracy.
arXiv Detail & Related papers (2023-01-07T12:09:50Z)
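One plausible reading of informed data selection inside adversarial training, sketched below: perturb only the highest-loss samples of each mini-batch with FGSM and train on the mixed batch. The top-k selection rule, the epsilon budget, and the [0, 1] pixel range are assumptions, not the paper's exact strategy.

```python
# Hedged sketch: informed selection of which mini-batch samples to perturb.
import torch
import torch.nn.functional as F

def adversarial_step(model, optimizer, x, y, epsilon=8 / 255, select_fraction=0.5):
    model.train()
    x = x.detach().clone().requires_grad_(True)
    per_sample = F.cross_entropy(model(x), y, reduction="none")
    per_sample.sum().backward()

    # Replace only the hardest (highest-loss) examples by FGSM perturbations.
    k = max(1, int(select_fraction * x.size(0)))
    hard = torch.topk(per_sample.detach(), k).indices
    x_adv = x.detach().clone()
    x_adv[hard] = (x_adv[hard] + epsilon * x.grad[hard].sign()).clamp(0, 1)

    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```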
- Efficient Augmentation for Imbalanced Deep Learning [8.38844520504124]
We study a convolutional neural network's internal representation of imbalanced image data.
We measure the generalization gap between a model's feature embeddings in the training and test sets, showing that the gap is wider for minority classes.
This insight enables us to design an efficient three-phase CNN training framework for imbalanced data.
arXiv Detail & Related papers (2022-07-13T09:43:17Z)
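As a rough illustration of the embedding-gap measurement, the sketch below uses the distance between per-class centroids of train and test feature embeddings as a per-class gap proxy; the paper's exact metric may differ, and `encoder` is assumed to return penultimate-layer features.

```python
# Hedged sketch of a per-class train/test embedding gap (centroid shift).
import torch

@torch.no_grad()
def class_centroids(encoder, loader):
    feats, labels = [], []
    for x, y in loader:
        feats.append(encoder(x))
        labels.append(y)
    feats, labels = torch.cat(feats), torch.cat(labels)
    return {int(c): feats[labels == c].mean(dim=0) for c in labels.unique()}

@torch.no_grad()
def per_class_gap(encoder, train_loader, test_loader):
    train_c = class_centroids(encoder, train_loader)
    test_c = class_centroids(encoder, test_loader)
    # A larger centroid shift suggests a wider train/test gap for that class.
    return {c: torch.norm(train_c[c] - test_c[c]).item()
            for c in train_c if c in test_c}
```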
- Acceleration of Federated Learning with Alleviated Forgetting in Local Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
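The sketch below is not FedReg itself; it only illustrates the general idea of constraining local updates so that global knowledge is not forgotten during local training, using a FedProx-style proximal penalty toward the global weights.

```python
# Illustrative client update with a proximal term toward the global model.
# This is a FedProx-style penalty, NOT FedReg's actual mechanism.
import copy
import torch

def local_train(global_model, loader, lr=0.01, mu=0.1, epochs=1):
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    global_params = [p.detach().clone() for p in global_model.parameters()]
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            # Penalize drift from the global weights to limit local forgetting.
            prox = sum(((p - g) ** 2).sum()
                       for p, g in zip(model.parameters(), global_params))
            (loss + 0.5 * mu * prox).backward()
            opt.step()
    return model.state_dict()
```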
- Low Precision Decentralized Distributed Training with Heterogeneous Data [5.43185002439223]
We show the convergence of low-precision decentralized training, which aims to reduce the computational complexity of training and inference.
Experiments indicate that 8-bit decentralized training has minimal accuracy loss compared to its full precision counterpart even with heterogeneous data.
arXiv Detail & Related papers (2021-11-17T20:48:09Z)
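A minimal sketch of 8-bit communication in decentralized (gossip) training: parameters are uniformly quantized before being exchanged with neighbors and dequantized before averaging. The per-tensor scheme shown is an illustrative choice, not the exact quantizer evaluated in the paper.

```python
# Hedged sketch of int8 parameter exchange for decentralized training.
import torch

def quantize_int8(t):
    """Uniform per-tensor 8-bit quantization (illustrative scheme only)."""
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    q = torch.round(t / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.to(torch.float32) * scale

def gossip_average(local, neighbor_msgs):
    """Average a node's full-precision tensor with dequantized neighbor
    tensors, as one step of a decentralized (gossip) averaging round."""
    recovered = [dequantize_int8(q, s) for q, s in neighbor_msgs]
    return torch.stack([local] + recovered).mean(dim=0)
```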
- Mixed-Privacy Forgetting in Deep Networks [114.3840147070712]
We show that the influence of a subset of the training samples can be removed from the weights of a network trained on large-scale image classification tasks.
Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in mixed-privacy setting.
We show that our method allows forgetting without having to trade off the model accuracy.
arXiv Detail & Related papers (2020-12-24T19:34:56Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
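A toy sketch of the linearization idea: under squared loss and full-batch gradient descent, a linearized model's residual along each NTK eigendirection decays geometrically, so the number of steps needed to reach a target loss can be read off without running training. The paper's actual predictor for fine-tuning pre-trained networks is more involved.

```python
# Hedged sketch: step-count prediction from linearized (NTK-style) dynamics.
import numpy as np

def predict_steps_linearized(ntk_eigvals, init_residual_sq, lr, target_loss,
                             max_steps=10**6):
    """With squared loss and full-batch GD, the residual along the i-th NTK
    eigendirection shrinks by (1 - lr * lambda_i) per step; count steps until
    the summed loss drops below `target_loss`."""
    lam = np.asarray(ntk_eigvals, dtype=float)
    r2 = np.asarray(init_residual_sq, dtype=float)
    decay = (1.0 - lr * lam) ** 2
    loss = 0.5 * r2.copy()
    for t in range(max_steps):
        if loss.sum() <= target_loss:
            return t
        loss *= decay
    return max_steps
```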
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.