CPT: Efficient Deep Neural Network Training via Cyclic Precision
- URL: http://arxiv.org/abs/2101.09868v1
- Date: Mon, 25 Jan 2021 02:56:18 GMT
- Title: CPT: Efficient Deep Neural Network Training via Cyclic Precision
- Authors: Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra,
Yingyan Lin
- Abstract summary: Low-precision deep neural network (DNN) training has gained tremendous attention as reducing precision is one of the most effective knobs for boosting DNNs' training time/energy efficiency.
We conjecture that DNNs' precision might have a similar effect as the learning rate during DNN training, and advocate dynamic precision along the training trajectory for further boosting the time/energy efficiency of DNN training.
- Score: 19.677029887330036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-precision deep neural network (DNN) training has gained tremendous
attention as reducing precision is one of the most effective knobs for boosting
DNNs' training time/energy efficiency. In this paper, we attempt to explore
low-precision training from a new perspective as inspired by recent findings in
understanding DNN training: we conjecture that DNNs' precision might have a
similar effect as the learning rate during DNN training, and advocate dynamic
precision along the training trajectory for further boosting the time/energy
efficiency of DNN training. Specifically, we propose Cyclic Precision Training
(CPT) to cyclically vary the precision between two boundary values which can be
identified using a simple precision range test within the first few training
epochs. Extensive simulations and ablation studies on five datasets and ten
models demonstrate that CPT's effectiveness is consistent across various
models/tasks (including classification and language modeling). Furthermore,
through experiments and visualization we show that CPT helps to (1) converge to
a wider minima with a lower generalization error and (2) reduce training
variance which we believe opens up a new design knob for simultaneously
improving the optimization and efficiency of DNN training. Our codes are
available at: https://github.com/RICE-EIC/CPT.
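To make the mechanism concrete, here is a minimal sketch, under stated assumptions, of a cyclic precision schedule in the spirit of CPT: the bit-width sweeps between a lower and an upper bound with a cosine shape, analogous to cyclical learning-rate schedules. This is not the authors' released implementation; the bounds min_bits/max_bits and the cycle_length below are placeholder values, whereas in the paper the two boundary precisions are identified with a precision range test during the first few training epochs.

```python
import math

def cyclic_precision(step: int, cycle_length: int,
                     min_bits: int = 3, max_bits: int = 8) -> int:
    """Return the bit-width to use for quantization at a given training step.

    Within each cycle the precision sweeps from min_bits up to max_bits
    following a cosine shape, then restarts at the next cycle.
    Bounds and cycle length here are illustrative placeholders.
    """
    phase = (step % cycle_length) / max(cycle_length - 1, 1)
    bits = min_bits + 0.5 * (max_bits - min_bits) * (1 - math.cos(math.pi * phase))
    return int(round(bits))

# Example: an 8-step cycle between 3 and 8 bits.
if __name__ == "__main__":
    schedule = [cyclic_precision(s, cycle_length=8) for s in range(16)]
    print(schedule)  # [3, 3, 4, 5, 6, 7, 8, 8, 3, 3, 4, 5, 6, 7, 8, 8]
```

The returned bit-width would then drive whatever quantizer the training loop applies to weights, activations, and gradients at that step.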
Related papers
- CycleBNN: Cyclic Precision Training in Binary Neural Networks [13.756549063691624]
This paper works on Binary Neural Networks (BNNs).
BNNs offer significant reductions in computational overhead and memory footprint compared to full-precision networks.
However, energy-intensive training and the accompanying drop in performance remain persistent issues.
Unlike prior works, this study offers an innovative methodology integrating BNNs with cyclic precision training, introducing CycleBNN.
arXiv Detail & Related papers (2024-09-28T08:51:25Z) - Better Schedules for Low Precision Training of Deep Neural Networks [13.88763215392452]
Cyclic precision training (CPT) dynamically adjusts precision throughout training according to a cyclic schedule.
CPT achieves particularly impressive improvements in training efficiency while actually improving DNN performance.
arXiv Detail & Related papers (2024-03-04T17:33:39Z) - Enhancing Deep Neural Network Training Efficiency and Performance through Linear Prediction [0.0]
Deep neural networks (DNNs) have achieved remarkable success in various fields, including computer vision and natural language processing.
This paper proposes a method to optimize DNN training effectiveness, with the goal of improving model performance.
arXiv Detail & Related papers (2023-10-17T03:11:30Z) - SPIDE: A Purely Spike-based Method for Training Feedback Spiking Neural
Networks [56.35403810762512]
Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware.
We study spike-based implicit differentiation on the equilibrium state (SPIDE), which extends a recently proposed implicit-differentiation-based training method.
arXiv Detail & Related papers (2023-02-01T04:22:59Z) - Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z) - Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Prior BNN methods neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which outperform state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z) - LDP: Learnable Dynamic Precision for Efficient Deep Neural Network
Training and Inference [24.431074439663437]
Learnable Dynamic Precision (LDP) is a framework that automatically learns a temporally and spatially dynamic precision schedule during training.
LDP consistently outperforms state-of-the-art (SOTA) low precision DNN training techniques in terms of training efficiency and achieved accuracy trade-offs.
arXiv Detail & Related papers (2022-03-15T08:01:46Z) - AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural
Networks [78.62086125399831]
We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of deep neural networks (DNNs).
AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets.
An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process.
arXiv Detail & Related papers (2021-06-23T13:23:00Z) - S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural
Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills binary networks from real-valued networks on the final prediction distribution.
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.5~15% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
arXiv Detail & Related papers (2021-02-17T18:59:28Z) - FracTrain: Fractionally Squeezing Bit Savings Both Temporally and
Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients over the course of training (see the sketch after this list).
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% ~ +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
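Relatedly, the FracTrain entry above describes a progressive (rather than cyclic) precision schedule. Below is a minimal sketch under stated assumptions, not FracTrain's actual fractional quantization: the bit-width starts low and steps up in stages as training proceeds; the stage bit-widths (4, 6, 8) and the equal-length stages are illustrative placeholders.

```python
def progressive_precision(epoch: int, total_epochs: int,
                          stages=(4, 6, 8)) -> int:
    """Return the bit-width for the current epoch, stepping up through stages.

    Equal-length stages and the (4, 6, 8) bit-widths are assumptions made
    for illustration only.
    """
    stage_len = total_epochs / len(stages)
    idx = min(int(epoch // stage_len), len(stages) - 1)
    return stages[idx]

# Example: over 90 epochs, train at 4 bits, then 6 bits, then 8 bits.
print([progressive_precision(e, 90) for e in (0, 30, 60, 89)])  # -> [4, 6, 8, 8]
```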
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.