FracTrain: Fractionally Squeezing Bit Savings Both Temporally and
Spatially for Efficient DNN Training
- URL: http://arxiv.org/abs/2012.13113v1
- Date: Thu, 24 Dec 2020 05:24:10 GMT
- Title: FracTrain: Fractionally Squeezing Bit Savings Both Temporally and
Spatially for Efficient DNN Training
- Authors: Yonggan Fu, Haoran You, Yang Zhao, Yue Wang, Chaojian Li, Kailash
Gopalakrishnan, Zhangyang Wang, Yingyan Lin
- Abstract summary: We propose FracTrain, which integrates progressive fractional quantization, gradually increasing the precision of activations, weights, and gradients along the training trajectory, with dynamic fractional quantization, which assigns per-layer precisions in an input-adaptive manner.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12%~+1.87%) accuracy.
- Score: 81.85361544720885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous
demand for intelligent edge devices featuring on-site learning, while the
practical realization of such systems remains a challenge due to the limited
resources available at the edge and the required massive training costs for
state-of-the-art (SOTA) DNNs. As reducing precision is one of the most
effective knobs for boosting training time/energy efficiency, there has been a
growing interest in low-precision DNN training. In this paper, we explore from
an orthogonal direction: how to fractionally squeeze out more training cost
savings from the most redundant bit level, progressively along the training
trajectory and dynamically per input. Specifically, we propose FracTrain that
integrates (i) progressive fractional quantization which gradually increases
the precision of activations, weights, and gradients that will not reach the
precision of SOTA static quantized DNN training until the final training stage,
and (ii) dynamic fractional quantization which assigns precisions to both the
activations and gradients of each layer in an input-adaptive manner, for only
"fractionally" updating layer parameters. Extensive simulations and ablation
studies (six models, four datasets, and three training settings including
standard, adaptation, and fine-tuning) validate the effectiveness of FracTrain
in reducing computational cost and hardware-quantified energy/latency of DNN
training while achieving a comparable or better (-0.12%~+1.87%) accuracy. For
example, when training ResNet-74 on CIFAR-10, FracTrain achieves 77.6% and
53.5% computational cost and training latency savings, respectively, compared
with the best SOTA baseline, while achieving a comparable (-0.07%) accuracy.
Our codes are available at: https://github.com/RICE-EIC/FracTrain.
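To make the two components above concrete, here is a minimal, hypothetical PyTorch-style sketch of (i) a progressive precision schedule and (ii) an input-adaptive precision gate. The stage boundaries, bit-widths, the fake_quant helper, and the DFQLinear gate are illustrative assumptions for this summary, not the released implementation at https://github.com/RICE-EIC/FracTrain.

```python
# Hypothetical sketch of FracTrain's two ingredients (not the released code):
#   (i)  progressive fractional quantization (PFQ): precision grows along training
#   (ii) dynamic fractional quantization (DFQ): input-adaptive per-layer precision
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quant(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Symmetric uniform fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.round(x / scale).clamp(-qmax, qmax) * scale
    return x + (q - x).detach()  # quantized forward pass, identity backward pass


def pfq_bits(epoch: int, total_epochs: int,
             stages=((3, 6), (4, 8), (6, 12), (8, 16))):
    """PFQ schedule (assumed stage boundaries): return (forward_bits, backward_bits).

    Precision increases stage by stage and only matches the static
    quantized-training precision (the last tuple) in the final stage.
    """
    stage = min(len(stages) - 1, epoch * len(stages) // total_epochs)
    return stages[stage]


class DFQLinear(nn.Module):
    """Linear layer whose precision is picked input-adaptively by a tiny gate.

    For brevity the gate makes one hard low/high decision per mini-batch;
    the paper's DFQ gates activations and gradients per input and per layer.
    """

    def __init__(self, in_features, out_features, low_bits=4, high_bits=8):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.gate = nn.Linear(in_features, 1)  # lightweight precision selector
        self.low_bits, self.high_bits = low_bits, high_bits

    def forward(self, x):
        use_high = torch.sigmoid(self.gate(x.mean(dim=0))) > 0.5
        bits = self.high_bits if bool(use_high) else self.low_bits
        w = fake_quant(self.linear.weight, bits)
        a = fake_quant(x, bits)
        return F.linear(a, w, self.linear.bias)


if __name__ == "__main__":
    layer = DFQLinear(16, 4)
    for epoch in range(8):
        fw_bits, bw_bits = pfq_bits(epoch, total_epochs=8)  # temporal schedule
        y = layer(torch.randn(32, 16))                      # input-adaptive choice
        print(f"epoch {epoch}: PFQ forward/backward {fw_bits}/{bw_bits} bits, "
              f"output shape {tuple(y.shape)}")
```

In the actual method the gate is trained jointly with the network and gradient bit-widths are also scheduled; the sketch only conveys the temporal ("progressive") and spatial ("dynamic, per input and layer") dimensions the abstract refers to.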
Related papers
- SPIDE: A Purely Spike-based Method for Training Feedback Spiking Neural
Networks [56.35403810762512]
Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware.
We study spike-based implicit differentiation on the equilibrium state (SPIDE) that extends the recently proposed training method.
arXiv Detail & Related papers (2023-02-01T04:22:59Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Block-Wise Dynamic-Precision Neural Network Training Acceleration via
Online Quantization Sensitivity Analytics [8.373265629267257]
We propose DYNASTY, a block-wise dynamic-precision neural network training framework.
DYNASTY provides accurate data sensitivity information through fast online analytics, and maintains stable training convergence with an adaptive bit-width map generator.
Compared to an 8-bit quantization baseline, DYNASTY brings up to $5.1\times$ speedup and $4.7\times$ energy consumption reduction with no accuracy drop and negligible hardware overhead.
arXiv Detail & Related papers (2022-10-31T03:54:16Z) - Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z) - LDP: Learnable Dynamic Precision for Efficient Deep Neural Network
Training and Inference [24.431074439663437]
Learnable Dynamic Precision (LDP) is a framework that automatically learns a temporally and spatially dynamic precision schedule during training.
LDP consistently outperforms state-of-the-art (SOTA) low-precision DNN training techniques in terms of the trade-off between training efficiency and achieved accuracy.
arXiv Detail & Related papers (2022-03-15T08:01:46Z) - Low-Precision Training in Logarithmic Number System using Multiplicative
Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam.
arXiv Detail & Related papers (2021-06-26T00:32:17Z) - AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural
Networks [78.62086125399831]
We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of deep neural networks (DNNs).
AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets.
An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process.
arXiv Detail & Related papers (2021-06-23T13:23:00Z) - CPT: Efficient Deep Neural Network Training via Cyclic Precision [19.677029887330036]
Low-precision deep neural network (DNN) training has gained tremendous attention as reducing precision is one of the most effective knobs for boosting DNNs' training time/energy efficiency.
We conjecture that DNNs' precision might have a similar effect as the learning rate during DNN training, and advocate dynamic precision along the training trajectory for further boosting the time/energy efficiency of DNN training.
arXiv Detail & Related papers (2021-01-25T02:56:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.