Better Schedules for Low Precision Training of Deep Neural Networks
- URL: http://arxiv.org/abs/2403.02243v1
- Date: Mon, 4 Mar 2024 17:33:39 GMT
- Title: Better Schedules for Low Precision Training of Deep Neural Networks
- Authors: Cameron R. Wolfe and Anastasios Kyrillidis
- Abstract summary: Cyclic precision training (CPT) dynamically adjusts precision throughout training according to a cyclic schedule.
CPT achieves particularly impressive improvements in training efficiency, while actually improving DNN performance.
- Score: 13.88763215392452
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low precision training can significantly reduce the computational overhead of
training deep neural networks (DNNs). Though many such techniques exist, cyclic
precision training (CPT), which dynamically adjusts precision throughout
training according to a cyclic schedule, achieves particularly impressive
improvements in training efficiency, while actually improving DNN performance.
Existing CPT implementations take common learning rate schedules (e.g.,
cyclical cosine schedules) and use them for low precision training without
adequate comparisons to alternative scheduling options. We define a diverse
suite of CPT schedules and analyze their performance across a variety of DNN
training regimes, some of which are unexplored in the low precision training
literature (e.g., node classification with graph neural networks). From these
experiments, we discover alternative CPT schedules that offer further
improvements in training efficiency and model performance, as well as derive a
set of best practices for choosing CPT schedules. Going further, we find that a
correlation exists between model performance and training cost, and that
changing the underlying CPT schedule can control the tradeoff between these two
variables. To explain the direct correlation between model performance and
training cost, we draw a connection between quantized training and critical
learning periods, suggesting that aggressive quantization is a form of learning
impairment that can permanently damage model performance.
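As an illustration of the kind of schedule the abstract describes, the sketch below maps a training step to a quantization bit-width under a cyclic cosine or triangular schedule. This is a minimal sketch, not the paper's implementation: the function name, the 3-to-8-bit range, and the specific schedule shapes are illustrative assumptions.

```python
import math

def cyclic_precision(step, cycle_len, min_bits=3, max_bits=8, shape="cosine"):
    """Map a training step to an integer bit-width under a cyclic schedule.

    Illustrative sketch only: the defaults and shapes are assumptions chosen
    to mirror the cyclic schedules described in the abstract.
    """
    t = (step % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    if shape == "cosine":
        # Cyclical cosine: precision rises from min_bits to max_bits over each cycle.
        frac = 0.5 * (1.0 - math.cos(math.pi * t))
    elif shape == "triangular":
        # Triangular: linear ramp up, then back down, within each cycle.
        frac = 1.0 - abs(2.0 * t - 1.0)
    else:
        raise ValueError(f"unknown schedule shape: {shape}")
    return min_bits + round(frac * (max_bits - min_bits))

# Example: bit-widths at a few steps of a 100-step cycle.
for s in (0, 25, 50, 75, 99):
    print(s, cyclic_precision(s, cycle_len=100))
```

In an actual low precision training loop, the returned bit-width would determine how activations, weights, and gradients are quantized at that iteration.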
Related papers
- Always-Sparse Training by Growing Connections with Guided Stochastic
Exploration [46.4179239171213]
We propose an efficient always-sparse training algorithm with excellent scaling to larger and sparser models.
We evaluate our method on CIFAR-10/100 and ImageNet using VGG and ViT models, and compare it against a range of sparsification methods.
arXiv Detail & Related papers (2024-01-12T21:32:04Z) - Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z) - Effective and Efficient Training for Sequential Recommendation using
Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that models enhanced with our method can achieve performance exceeding or very close to that of state-of-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z) - LDP: Learnable Dynamic Precision for Efficient Deep Neural Network
Training and Inference [24.431074439663437]
Learnable Dynamic Precision (LDP) is a framework that automatically learns a temporally and spatially dynamic precision schedule during training.
LDP consistently outperforms state-of-the-art (SOTA) low precision DNN training techniques in terms of training efficiency and achieved accuracy trade-offs.
arXiv Detail & Related papers (2022-03-15T08:01:46Z) - AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural
Networks [78.62086125399831]
We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of deep neural networks (DNNs).
AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets.
An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process.
arXiv Detail & Related papers (2021-06-23T13:23:00Z) - Exploration and Exploitation: Two Ways to Improve Chinese Spelling
Correction Models [51.744357472072416]
We propose a method that continually identifies the weak spots of a model to generate more valuable training instances.
Experimental results show that such an adversarial training method combined with the pretraining strategy can improve both the generalization and robustness of multiple CSC models.
arXiv Detail & Related papers (2021-05-31T09:17:33Z) - CPT: Efficient Deep Neural Network Training via Cyclic Precision [19.677029887330036]
Low-precision deep neural network (DNN) training has gained tremendous attention as reducing precision is one of the most effective knobs for boosting DNNs' training time/energy efficiency.
We conjecture that DNNs' precision might have a similar effect as the learning rate during DNN training, and advocate dynamic precision along the training trajectory for further boosting the time/energy efficiency of DNN training.
arXiv Detail & Related papers (2021-01-25T02:56:18Z) - FracTrain: Fractionally Squeezing Bit Savings Both Temporally and
Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Regularized Evolutionary Population-Based Training [11.624954122221562]
This paper presents an algorithm called Evolutionary Population-Based Training (EPBT) that interleaves the training of a DNN's weights with the meta-learning of loss functions.
EPBT results in faster, more accurate learning on image classification benchmarks.
arXiv Detail & Related papers (2020-02-11T06:28:13Z)