Efficient DNN Training with Knowledge-Guided Layer Freezing
- URL: http://arxiv.org/abs/2201.06227v1
- Date: Mon, 17 Jan 2022 06:08:49 GMT
- Title: Efficient DNN Training with Knowledge-Guided Layer Freezing
- Authors: Yiding Wang, Decang Sun, Kai Chen, Fan Lai, Mosharaf Chowdhury
- Abstract summary: Training deep neural networks (DNNs) is time-consuming.
This paper goes one step further by skipping computing and communication through DNN layer freezing.
KGT achieves 19%-43% training speedup w.r.t. the state-of-the-art without sacrificing accuracy.
- Score: 9.934418641613105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training deep neural networks (DNNs) is time-consuming. While most existing
solutions try to overlap/schedule computation and communication for efficient
training, this paper goes one step further by skipping computing and
communication through DNN layer freezing. Our key insight is that the training
progress of internal DNN layers differs significantly, and front layers often
become well-trained much earlier than deep layers. To explore this, we first
introduce the notion of training plasticity to quantify the training progress
of internal DNN layers. Then we design KGT, a knowledge-guided DNN training
system that employs semantic knowledge from a reference model to accurately
evaluate individual layers' training plasticity and safely freeze the converged
ones, saving their corresponding backward computation and communication. Our
reference model is generated on the fly using quantization techniques and runs
forward operations asynchronously on available CPUs to minimize the overhead.
In addition, KGT caches the intermediate outputs of the frozen layers with
prefetching to further skip the forward computation. Our implementation and
testbed experiments with popular vision and language models show that KGT
achieves 19%-43% training speedup w.r.t. the state-of-the-art without
sacrificing accuracy.
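
The abstract does not include code, but the freezing decision it describes can be sketched roughly as follows. This is a minimal PyTorch sketch under stated assumptions: the paper's "training plasticity" metric is approximated here by the change in linear-CKA similarity between a layer's activations and those of a quantized CPU reference copy, and the convergence threshold and the `FreezeController` API are invented for illustration.

```python
# Minimal sketch of knowledge-guided layer freezing (not the paper's code).
# Assumption: "training plasticity" is approximated by the change in linear-CKA
# similarity between a layer's activations and those of a quantized reference
# copy; a layer is frozen once that change falls below a small threshold.
import copy
import torch
import torch.nn as nn

def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (batch, features)."""
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    hsic = (x.T @ y).norm() ** 2
    return hsic / ((x.T @ x).norm() * (y.T @ y).norm() + 1e-12)

class FreezeController:
    def __init__(self, model, layer_names, threshold=1e-3):
        self.model = model
        self.layer_names = layer_names      # front-to-back order
        self.threshold = threshold          # plasticity level treated as "converged"
        self.prev_sim = {}
        self.frozen = set()
        # Reference model: a dynamically quantized CPU copy. The paper generates
        # and refreshes it on the fly and runs it asynchronously on spare CPUs;
        # that machinery is omitted here.
        self.reference = torch.quantization.quantize_dynamic(
            copy.deepcopy(model).cpu().eval(), {nn.Linear}, dtype=torch.qint8)

    def step(self, activations, ref_activations):
        """Freeze converged front layers.

        `activations` / `ref_activations` map layer names to flattened
        (batch, features) tensors collected with forward hooks on the training
        model and on `self.reference`, respectively.
        """
        for name in self.layer_names:
            if name in self.frozen:
                continue
            ref = ref_activations[name].to(activations[name].device)
            sim = linear_cka(activations[name], ref).item()
            plasticity = abs(sim - self.prev_sim.get(name, 0.0))
            self.prev_sim[name] = sim
            if plasticity < self.threshold:
                module = dict(self.model.named_modules())[name]
                for p in module.parameters():
                    # Skipping gradients removes the layer's backward
                    # computation and its gradient communication.
                    p.requires_grad_(False)
                self.frozen.add(name)
            else:
                # Front layers converge earlier, so stop at the first
                # still-plastic layer and only freeze a contiguous prefix.
                break
```

In KGT the reference model additionally runs its forward passes asynchronously on available CPUs, and the intermediate outputs of frozen layers are cached and prefetched so their forward computation is skipped as well; neither mechanism is modeled in this sketch.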
Related papers
- Comparison between layer-to-layer network training and conventional
network training using Deep Convolutional Neural Networks [0.6853165736531939]
Convolutional neural networks (CNNs) are widely used in various applications due to their effectiveness in extracting features from data.
We propose a layer-to-layer training method and compare its performance with the conventional training method.
Our experiments show that the layer-to-layer training method outperforms the conventional training method for both models.
arXiv Detail & Related papers (2023-03-27T14:29:18Z) - SPIDE: A Purely Spike-based Method for Training Feedback Spiking Neural
Networks [56.35403810762512]
Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware.
We study spike-based implicit differentiation on the equilibrium state (SPIDE) that extends the recently proposed training method.
arXiv Detail & Related papers (2023-02-01T04:22:59Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL).
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z) - Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z) - DNN Training Acceleration via Exploring GPGPU Friendly Sparsity [16.406482603838157]
We propose Approximate Random Dropout, which replaces the conventional random dropout of neurons and synapses with regular, online-generated row-based or tile-based dropout patterns (a tile-based mask sketch appears after this list).
We then develop an SGD-based search algorithm that produces the distribution of row-based or tile-based dropout patterns to compensate for the potential accuracy loss.
We also propose a sensitivity-aware dropout method that dynamically drops input feature maps based on their sensitivity, achieving greater forward and backward training acceleration.
arXiv Detail & Related papers (2022-03-11T01:32:03Z) - FracTrain: Fractionally Squeezing Bit Savings Both Temporally and
Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z) - Going Deeper With Directly-Trained Larger Spiking Neural Networks [20.40894876501739]
Spiking neural networks (SNNs) are promising for bio-plausible coding of spatio-temporal information and event-driven signal processing.
However, the unique working mode of SNNs makes them more difficult to train than traditional networks.
We propose a threshold-dependent batch normalization (tdBN) method based on the emerging spatio-temporal backpropagation (STBP).
arXiv Detail & Related papers (2020-10-29T07:15:52Z) - Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality
Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computation load at the same accuracy (a factorized-layer sketch follows this list).
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
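
As referenced in the GPGPU-friendly sparsity entry above, structured dropout replaces per-element random masks with regular blocks. The following is an illustrative sketch only; the tile size, layout, and rescaling are assumptions, not the paper's exact patterns or search procedure.

```python
# Illustrative sketch of a tile-based dropout mask (regular, GPU-friendly),
# in contrast to element-wise random dropout. Tile size and layout are
# assumptions for illustration, not the exact patterns from the paper.
import torch

def tile_dropout_mask(height, width, tile=32, drop_prob=0.5, device="cpu"):
    """Drop whole (tile x tile) blocks so the surviving work stays dense per tile."""
    th, tw = (height + tile - 1) // tile, (width + tile - 1) // tile
    keep_tiles = (torch.rand(th, tw, device=device) >= drop_prob).float()
    # Expand each tile decision to cover its block, then crop to the layer shape.
    mask = keep_tiles.repeat_interleave(tile, dim=0).repeat_interleave(tile, dim=1)
    return mask[:height, :width]

def apply_tile_dropout(x, tile=32, drop_prob=0.5):
    """Apply the structured mask to a 2-D activation and rescale like standard dropout."""
    mask = tile_dropout_mask(x.shape[0], x.shape[1], tile, drop_prob, x.device)
    return x * mask / (1.0 - drop_prob)
```

Because whole tiles survive or drop together, the remaining work maps onto dense GPU kernels instead of irregular element-wise sparsity.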
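For the SVD-training entry above, the factorization it relies on can be written as a small module. This sketch assumes a plain linear layer, squared-Frobenius orthogonality penalties on the singular-vector factors, and an L1 penalty on the singular values; the penalty weights and training schedule are placeholders, not the paper's settings.

```python
# Sketch of an SVD-factorized linear layer with singular-vector orthogonality
# regularization and singular-value sparsification (illustrative only).
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) * 0.02)

    def forward(self, x):
        # Equivalent to x @ W.T with W = U diag(s) V^T, without forming W.
        return (x @ self.V) * self.s @ self.U.T

    def regularizer(self, ortho_weight=1e-2, sparse_weight=1e-4):
        # Push U and V toward orthonormal columns; encourage sparse singular values.
        eye = torch.eye(self.s.shape[0], device=self.s.device)
        ortho = ((self.U.T @ self.U - eye) ** 2).sum() \
              + ((self.V.T @ self.V - eye) ** 2).sum()
        return ortho_weight * ortho + sparse_weight * self.s.abs().sum()
```

Adding `layer.regularizer()` to the task loss during training and pruning near-zero entries of `s` afterward yields an explicitly low-rank layer without running SVD at every step.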
This list is automatically generated from the titles and abstracts of the papers indexed on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.