FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
- URL: http://arxiv.org/abs/2012.13113v2
- Date: Sat, 04 Jan 2025 03:47:41 GMT
- Title: FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
- Authors: Yonggan Fu, Haoran You, Yang Zhao, Yue Wang, Chaojian Li, Kailash Gopalakrishnan, Zhangyang Wang, Yingyan Celine Lin
- Abstract summary: We propose FracTrain, which integrates progressive fractional quantization, gradually increasing the precision of activations, weights, and gradients over the course of training, with dynamic fractional quantization, which assigns per-layer precisions in an input-adaptive manner.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% ~ +1.87%) accuracy.
- Abstract: Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous demand for intelligent edge devices featuring on-site learning, while the practical realization of such systems remains a challenge due to the limited resources available at the edge and the required massive training costs for state-of-the-art (SOTA) DNNs. As reducing precision is one of the most effective knobs for boosting training time/energy efficiency, there has been a growing interest in low-precision DNN training. In this paper, we explore from an orthogonal direction: how to fractionally squeeze out more training cost savings from the most redundant bit level, progressively along the training trajectory and dynamically per input. Specifically, we propose FracTrain that integrates (i) progressive fractional quantization which gradually increases the precision of activations, weights, and gradients that will not reach the precision of SOTA static quantized DNN training until the final training stage, and (ii) dynamic fractional quantization which assigns precisions to both the activations and gradients of each layer in an input-adaptive manner, for only "fractionally" updating layer parameters. Extensive simulations and ablation studies (six models, four datasets, and three training settings including standard, adaptation, and fine-tuning) validate the effectiveness of FracTrain in reducing computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12%~+1.87%) accuracy. For example, when training ResNet-74 on CIFAR-10, FracTrain achieves 77.6% and 53.5% computational cost and training latency savings, respectively, compared with the best SOTA baseline, while achieving a comparable (-0.07%) accuracy. Our codes are available at: https://github.com/RICE-EIC/FracTrain.
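Below is a minimal, illustrative Python sketch of the two mechanisms the abstract describes: a progressive schedule that raises training precision in stages (only reaching the precision of static quantized training near the end), and an input-adaptive gate that picks a per-layer precision from a cheap statistic of the layer's input. The bit-widths, stage boundaries, gating statistic, threshold, and function names are all assumptions made for illustration; the authors' actual implementation is in the linked repository.

```python
import numpy as np

def fake_quantize(x, num_bits):
    """Uniform symmetric fake quantization: round to `num_bits` levels, return floats."""
    if num_bits >= 32:  # treat >= 32 bits as full precision
        return x
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def progressive_precision(epoch, total_epochs,
                          schedule=((0.25, 4), (0.5, 6), (0.75, 8), (1.0, 32))):
    """PFQ-style schedule: precision grows in stages and only reaches full
    precision in the final stage (stage boundaries/bit-widths are made up)."""
    frac = epoch / total_epochs
    for boundary, bits in schedule:
        if frac <= boundary:
            return bits
    return 32

def dynamic_precision(layer_input, low_bits=4, high_bits=8, threshold=0.5):
    """DFQ-style gate: choose a per-layer, per-input precision from a cheap
    input statistic (a placeholder for the learned gating in the paper)."""
    saliency = float(np.mean(np.abs(layer_input)))
    return high_bits if saliency > threshold else low_bits

# Toy loop showing where the two schedules would plug into training.
rng = np.random.default_rng(0)
for epoch in range(10):
    bits_from_schedule = progressive_precision(epoch, total_epochs=10)
    x = rng.standard_normal((32, 64))            # stand-in for a layer's activations
    bits_for_input = dynamic_precision(x)
    x_q = fake_quantize(x, min(bits_from_schedule, bits_for_input))  # illustrative combination
    # ... forward/backward with quantized activations, weights, and gradients ...
```

Combining the two bit-widths with `min` is just one way to wire the sketch together and is not taken from the paper.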
Related papers
- SPIDE: A Purely Spike-based Method for Training Feedback Spiking Neural Networks [56.35403810762512]
Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware.
We study spike-based implicit differentiation on the equilibrium state (SPIDE), which extends a recently proposed training method.
arXiv Detail & Related papers (2023-02-01T04:22:59Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Block-Wise Dynamic-Precision Neural Network Training Acceleration via Online Quantization Sensitivity Analytics [8.373265629267257]
We propose DYNASTY, a block-wise dynamic-precision neural network training framework.
DYNASTY provides accurate data sensitivity information through fast online analytics, and maintains stable training convergence with an adaptive bit-width map generator.
Compared to an 8-bit quantization baseline, DYNASTY brings up to $5.1\times$ speedup and $4.7\times$ energy consumption reduction with no accuracy drop and negligible hardware overhead.
arXiv Detail & Related papers (2022-10-31T03:54:16Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference [24.431074439663437]
Learnable Dynamic Precision (LDP) is a framework that automatically learns a temporally and spatially dynamic precision schedule during training.
LDP consistently outperforms state-of-the-art (SOTA) low-precision DNN training techniques in terms of training efficiency and achieved accuracy trade-offs.
arXiv Detail & Related papers (2022-03-15T08:01:46Z)
- Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam.
arXiv Detail & Related papers (2021-06-26T00:32:17Z)
- AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks [78.62086125399831]
We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of deep neural networks (DNNs).
AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets.
An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process.
arXiv Detail & Related papers (2021-06-23T13:23:00Z)
- CPT: Efficient Deep Neural Network Training via Cyclic Precision [24.218905131408288]
We conjecture that DNN precision may have an effect during training similar to that of the learning rate.
We propose Cyclic Precision Training (CPT), which cyclically varies the precision between two boundary values.
arXiv Detail & Related papers (2021-01-25T02:56:18Z)
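For intuition, a cyclic precision schedule can be written much like a cyclic learning-rate schedule, sweeping the bit-width between a lower and an upper bound over each cycle; the cosine shape and the bounds below are assumptions for this sketch, not CPT's exact formulation.

```python
import math

def cyclic_precision(step, cycle_length, min_bits=4, max_bits=8):
    """Cosine-shaped schedule that cycles the bit-width between two boundary
    values, mirroring cyclic learning rates (illustrative, not CPT's exact rule)."""
    phase = (step % cycle_length) / cycle_length  # position within the cycle, in [0, 1)
    bits = min_bits + 0.5 * (max_bits - min_bits) * (1 - math.cos(2 * math.pi * phase))
    return int(round(bits))

# Precision rises from 4 to 8 bits and falls back over each 100-step cycle.
print([cyclic_precision(s, cycle_length=100) for s in range(0, 100, 10)])
```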