DNN Training Acceleration via Exploring GPGPU Friendly Sparsity
- URL: http://arxiv.org/abs/2203.05705v1
- Date: Fri, 11 Mar 2022 01:32:03 GMT
- Title: DNN Training Acceleration via Exploring GPGPU Friendly Sparsity
- Authors: Zhuoran Song, Yihong Xu, Han Li, Naifeng Jing, Xiaoyao Liang, Li Jiang
- Abstract summary: We propose Approximate Random Dropout, which replaces the conventional random dropout of neurons and synapses with regular, online-generated row-based or tile-based dropout patterns.
We then develop an SGD-based Search Algorithm that produces the distribution of row-based or tile-based dropout patterns to compensate for the potential accuracy loss.
We also propose a sensitivity-aware dropout method that dynamically drops input feature maps based on their sensitivity, achieving greater forward and backward training acceleration.
- Score: 16.406482603838157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The training phase of a deep neural network (DNN) consumes enormous processing time and energy. Compression techniques that exploit the sparsity of DNNs can effectively accelerate the inference phase, but they are rarely applied during training, because training involves dense matrix multiplication on general-purpose graphics processing units (GPGPUs), which favor regular and structured data layouts. In this paper, we first propose Approximate Random Dropout, which replaces the conventional random dropout of neurons and synapses with regular, online-generated row-based or tile-based dropout patterns, eliminating unnecessary computation and data access for the multilayer perceptron (MLP) and long short-term memory (LSTM). We then develop an SGD-based Search Algorithm that produces the distribution of row-based or tile-based dropout patterns to compensate for the potential accuracy loss. Moreover, aiming at convolutional neural network (CNN) training acceleration, we first explore the importance and sensitivity of input feature maps, and then propose a sensitivity-aware dropout method that dynamically drops input feature maps based on their sensitivity, achieving greater forward and backward training acceleration while preserving accuracy. To facilitate DNN programming, we build a DNN training computation framework that unifies the proposed techniques in the software stack. As a result, the GPGPU only needs to support the basic operator, matrix multiplication, and can achieve significant performance improvement regardless of the DNN model.
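As a rough illustration of the row-based dropout idea in the abstract, the sketch below shows how a regular, online-generated row mask could let a matrix-multiplication kernel touch only the surviving rows of a fully connected layer's weight matrix, so that dropped rows cost neither computation nor data access. The pattern generator, layer sizes, and dropout probability are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def row_dropout_pattern(num_rows, drop_prob, period=8, rng=None):
    """Regular row-based dropout pattern: within every `period` consecutive
    rows the same number of rows survives, and which rows survive is decided
    by a single random offset per iteration, so the mask is cheap to generate
    online and keeps a structure a GPGPU kernel can exploit.
    (Illustrative generator, not the paper's exact pattern distribution.)"""
    rng = np.random.default_rng() if rng is None else rng
    keep_per_period = max(1, round(period * (1.0 - drop_prob)))
    offset = int(rng.integers(period))                  # online randomness
    return (np.arange(num_rows) + offset) % period < keep_per_period

def mlp_layer_forward(x, W, b, keep_rows):
    """Fully connected forward pass that reads and multiplies only the kept
    rows of W (the surviving output neurons); dropped rows are never touched,
    saving both computation and data access. The backward pass would skip
    the same rows."""
    W_kept = W[keep_rows]                               # dense sub-matrix
    y = np.zeros((x.shape[0], W.shape[0]), dtype=x.dtype)
    y[:, keep_rows] = x @ W_kept.T + b[keep_rows]
    return y

# Usage: a 1024 -> 512 layer where roughly half of the output rows are
# dropped for this training iteration.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 1024))
W = rng.standard_normal((512, 1024))
b = np.zeros(512)
keep = row_dropout_pattern(512, drop_prob=0.5, rng=rng)
y = mlp_layer_forward(x, W, b, keep)
```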
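In the same spirit, the sensitivity-aware dropout for CNN training can be pictured as ranking input feature maps by a cheap sensitivity estimate and dropping the least sensitive channels for the current iteration, so that the remaining activation tensor stays dense and regular. The gradient-magnitude proxy and fixed drop ratio below are assumptions for illustration only, not the paper's criterion.

```python
import numpy as np

def sensitivity_aware_channel_mask(grads, drop_ratio=0.25):
    """Rank input feature maps (channels) by a simple sensitivity proxy,
    here the mean absolute gradient of shape (N, C, H, W) from the previous
    iteration, and drop the least sensitive channels for this iteration.
    (The proxy and the fixed drop ratio are illustrative assumptions.)"""
    sensitivity = np.abs(grads).mean(axis=(0, 2, 3))    # one score per channel
    num_drop = int(drop_ratio * sensitivity.size)
    keep = np.ones(sensitivity.size, dtype=bool)
    if num_drop > 0:
        keep[np.argsort(sensitivity)[:num_drop]] = False  # drop least sensitive
    return keep

# Usage: slice away the dropped channels before the convolution so both the
# forward and the backward pass of the layer see a smaller dense input.
rng = np.random.default_rng(0)
fmaps = rng.standard_normal((8, 64, 32, 32))            # input feature maps
prev_grads = rng.standard_normal((8, 64, 32, 32))       # gradients w.r.t. them
keep_ch = sensitivity_aware_channel_mask(prev_grads, drop_ratio=0.25)
reduced = fmaps[:, keep_ch]                             # (8, 48, 32, 32)
```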
Related papers
- Rethinking Deep Learning: Propagating Information in Neural Networks without Backpropagation and Statistical Optimization [0.0]
This study discusses the information propagation capabilities and potential practical applications of NNs as structures that mimic neural systems.
In this study, the NN architecture comprises fully connected layers that use step functions as activation functions, with 0-15 hidden layers and no weight updates.
Accuracy is calculated by comparing the average output vectors of the training data for each label with the output vectors of the test data, based on vector similarity.
arXiv Detail & Related papers (2024-08-18T09:22:24Z)
- Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN into multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Two-Timescale End-to-End Learning for Channel Acquisition and Hybrid Precoding [94.40747235081466]
We propose an end-to-end deep learning-based joint transceiver design algorithm for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems.
We develop a DNN architecture that maps the received pilots into feedback bits at the receiver, and then further maps the feedback bits into the hybrid precoder at the transmitter.
arXiv Detail & Related papers (2021-10-22T20:49:02Z)
- Spike-inspired Rank Coding for Fast and Accurate Recurrent Neural Networks [5.986408771459261]
Biological spiking neural networks (SNNs) can temporally encode information in their outputs, whereas artificial neural networks (ANNs) conventionally do not.
Here we show that temporal coding such as rank coding (RC) inspired by SNNs can also be applied to conventional ANNs such as LSTMs.
RC-training also significantly reduces time-to-insight during inference, with a minimal decrease in accuracy.
We demonstrate these in two toy problems of sequence classification and in a temporally encoded MNIST dataset, where our RC model achieves 99.19% accuracy after the first input time-step.
arXiv Detail & Related papers (2021-05-13T21:34:55Z)
- SpikeMS: Deep Spiking Neural Network for Motion Segmentation [7.491944503744111]
SpikeMS is the first deep encoder-decoder SNN architecture for the real-world, large-scale problem of motion segmentation.
We show that SpikeMS is capable of incremental predictions, i.e., predictions from smaller amounts of test data than it is trained on.
arXiv Detail & Related papers (2021-05-13T21:34:55Z) - A Meta-Learning Approach to the Optimal Power Flow Problem Under
Topology Reconfigurations [69.73803123972297]
We propose a DNN-based OPF predictor that is trained using a meta-learning (MTL) approach.
The developed OPF-predictor is validated through simulations using benchmark IEEE bus systems.
arXiv Detail & Related papers (2020-12-21T17:39:51Z)
- TaxoNN: A Light-Weight Accelerator for Deep Neural Network Training [2.5025363034899732]
We present a novel approach to add the training ability to a baseline DNN accelerator (inference only) by splitting the SGD algorithm into simple computational elements.
Based on this approach we propose TaxoNN, a light-weight accelerator for DNN training.
Our experimental results show that TaxoNN delivers, on average, a 0.97% higher misclassification rate than a full-precision implementation.
arXiv Detail & Related papers (2020-10-11T09:04:19Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a higher reduction in computation load at the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
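For the last entry above, the following is a minimal sketch of what SVD training can look like: each weight is kept in the factored form U diag(s) V^T, an orthogonality penalty keeps U and V near-orthonormal, and an L1 penalty on s encourages singular-value sparsity. The layer sizes, penalty weights, and loss used here are assumptions of this sketch, not details from the paper.

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """Linear layer whose weight is kept in factored form W = U diag(s) V^T,
    so the rank can be reduced by sparsifying s during training."""
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Compute x V, scale by s, then project back up: x V diag(s) U^T + b
        return (x @ self.V) * self.s @ self.U.t() + self.bias

    def regularizer(self, ortho_w=1e-2, sparse_w=1e-3):
        # Orthogonality penalty ||U^T U - I||_F^2 + ||V^T V - I||_F^2
        eye = torch.eye(self.s.numel(), device=self.s.device)
        ortho = ((self.U.t() @ self.U - eye) ** 2).sum() + \
                ((self.V.t() @ self.V - eye) ** 2).sum()
        # L1 penalty drives small singular values toward zero
        sparse = self.s.abs().sum()
        return ortho_w * ortho + sparse_w * sparse

# Usage: add the regularizer of every SVDLinear layer to the task loss.
layer = SVDLinear(784, 256, rank=64)
x = torch.randn(32, 784)
target = torch.randn(32, 256)
loss = nn.functional.mse_loss(layer(x), target) + layer.regularizer()
loss.backward()
```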
This list is automatically generated from the titles and abstracts of the papers on this site.