SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional
Neural Networks Training
- URL: http://arxiv.org/abs/2007.13595v1
- Date: Tue, 21 Jul 2020 11:01:36 GMT
- Title: SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional
Neural Networks Training
- Authors: Pengcheng Dai, Jianlei Yang, Xucheng Ye, Xingzhou Cheng, Junyu Luo,
Linghao Song, Yiran Chen, Weisheng Zhao
- Abstract summary: Training Convolutional Neural Networks (CNNs) usually requires a large amount of computational resources.
In this paper, SparseTrain is proposed to accelerate CNN training by fully exploiting sparsity.
We have built a simple compiler to map CNNs onto SparseTrain, and a cycle-accurate architecture simulator to evaluate its performance and efficiency.
- Score: 34.657942518465575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training Convolutional Neural Networks (CNNs) usually requires a large
amount of computational resources. In this paper, \textit{SparseTrain} is proposed to
accelerate CNN training by fully exploiting sparsity. It involves three levels of
innovation: an activation-gradient pruning algorithm, a sparse training dataflow, and
an accelerator architecture. By applying a stochastic pruning algorithm to each layer,
the sparsity of the back-propagated gradients can be increased dramatically without
degrading training accuracy or convergence rate. Moreover, to utilize both
\textit{natural sparsity} (resulting from ReLU or Pooling layers) and
\textit{artificial sparsity} (introduced by the pruning algorithm), a sparsity-aware
architecture is proposed for training acceleration. This architecture supports the
forward and backward passes of CNNs by adopting a 1-dimensional convolution dataflow.
We have built a simple compiler to map CNN topologies onto \textit{SparseTrain}, and a
cycle-accurate architecture simulator to evaluate performance and efficiency based on
a design synthesized with a $14\,nm$ FinFET technology. Evaluation results on
AlexNet/ResNet show that \textit{SparseTrain} achieves about $2.7\times$ speedup and
$2.2\times$ energy-efficiency improvement on average compared with the original
training process.
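To make the pruning step concrete, below is a minimal NumPy sketch of stochastic activation-gradient pruning. The abstract does not give the exact rule, so the unbiased threshold scheme (small gradients are either zeroed or promoted to the threshold with probabilities that preserve the expectation) and the threshold value are assumptions for illustration.

```python
import numpy as np

def stochastic_prune(grad: np.ndarray, tau: float, rng=np.random.default_rng()) -> np.ndarray:
    """Stochastically prune activation gradients with magnitude below tau.

    Elements with |g| >= tau are kept. Elements with |g| < tau are zeroed with
    probability 1 - |g|/tau and set to sign(g)*tau otherwise, so every element
    keeps its expectation (an assumed unbiased rule, not necessarily the exact
    one used by SparseTrain).
    """
    mag = np.abs(grad)
    small = mag < tau
    survive = rng.random(grad.shape) < (mag / tau)             # survival probability |g|/tau
    out = np.where(small & ~survive, 0.0, grad)                # most small gradients -> 0
    out = np.where(small & survive, np.sign(grad) * tau, out)  # survivors snap to +/- tau
    return out

# toy check: small gradients become mostly exact zeros, i.e. artificial sparsity
g = (np.random.randn(4, 1024) * 0.01).astype(np.float32)
print(f"sparsity after pruning: {(stochastic_prune(g, tau=0.02) == 0).mean():.2f}")
```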
Related papers
- (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork [60.889175951038496]
Large-scale neural networks have demonstrated remarkable performance in different domains like vision and language processing.
One of the key questions of structural pruning is how to estimate the channel significance.
We propose a novel algorithmic framework, namely PASS.
It is a tailored hyper-network that takes both visual prompts and network weight statistics as input and outputs layer-wise channel sparsity in a recurrent manner.
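A heavily simplified sketch of the interface this summary describes: a recurrent cell consumes a prompt embedding plus per-layer weight statistics and emits one channel-sparsity ratio per layer. All dimensions, the plain tanh RNN cell, and the choice of statistics are illustrative assumptions, not PASS's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_prompt, d_stat, d_hidden = 16, 4, 32                   # assumed sizes
W_in = rng.standard_normal((d_prompt + d_stat, d_hidden)) * 0.1
W_h = rng.standard_normal((d_hidden, d_hidden)) * 0.1
w_out = rng.standard_normal(d_hidden) * 0.1

def layerwise_sparsity(prompt_emb, per_layer_stats):
    """Roll a plain recurrent cell over the target network's layers and
    output a channel-sparsity ratio in (0, 1) for each layer."""
    h = np.zeros(d_hidden)
    ratios = []
    for stats in per_layer_stats:                        # one statistics vector per layer
        x = np.concatenate([prompt_emb, stats])
        h = np.tanh(x @ W_in + h @ W_h)
        ratios.append(1.0 / (1.0 + np.exp(-(h @ w_out))))  # sigmoid -> ratio
    return np.array(ratios)

prompt = rng.standard_normal(d_prompt)                    # stand-in for a visual-prompt embedding
stats = [rng.standard_normal(d_stat) for _ in range(5)]   # e.g. mean/std/quantiles of each layer's weights
print(layerwise_sparsity(prompt, stats))
```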
arXiv Detail & Related papers (2024-07-24T16:47:45Z) - SparseProp: Efficient Sparse Backpropagation for Faster Training of
Neural Networks [20.18957052535565]
We provide a new efficient version of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse.
Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types.
We show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks, and in training sparse networks from scratch.
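The core idea, running the backward pass only over the nonzero weights, can be sketched for a single fully-connected layer as below. The CSR layout, layer sizes, and batch dimensions are assumptions; SparseProp's actual kernels and layer coverage are more sophisticated.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
out_dim, in_dim, batch = 64, 128, 32

# ~95%-sparse weight matrix stored in CSR so zeros are never touched
W_dense = rng.standard_normal((out_dim, in_dim)) * (rng.random((out_dim, in_dim)) < 0.05)
W = csr_matrix(W_dense)

x = rng.standard_normal((batch, in_dim))
y = W.dot(x.T).T                               # forward: y = x @ W.T via sparse matmul

dy = rng.standard_normal((batch, out_dim))     # upstream gradient
dx = W.T.dot(dy.T).T                           # grad wrt input, again a sparse matmul

# weight gradient only at the nonzero coordinates of W
rows, cols = W.nonzero()
dW_vals = (dy[:, rows] * x[:, cols]).sum(axis=0)   # one value per stored nonzero
```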
arXiv Detail & Related papers (2023-02-09T18:54:05Z) - Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise, and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
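The "generalization of commonly used layers" claim can be seen by writing depthwise, groupwise, and pointwise convolutions as structured sparsity masks over a dense convolution weight, as in the sketch below. This only illustrates that relationship; SSC's own sparsity pattern differs.

```python
import numpy as np

# A dense conv weight has shape (C_out, C_in, k, k); the familiar efficient
# layers are just fixed structured-sparsity patterns on that tensor.
C_out, C_in, k, groups = 8, 8, 3, 2

pointwise = np.zeros((C_out, C_in, k, k), dtype=bool)   # 1x1 conv: only the centre tap
pointwise[:, :, k // 2, k // 2] = True

depthwise = np.zeros((C_out, C_in, k, k), dtype=bool)   # each output sees one input channel
for c in range(min(C_out, C_in)):
    depthwise[c, c] = True

groupwise = np.zeros((C_out, C_in, k, k), dtype=bool)   # block-diagonal over channel groups
for i in range(groups):
    groupwise[i * C_out // groups:(i + 1) * C_out // groups,
              i * C_in // groups:(i + 1) * C_in // groups] = True

for name, mask in (("pointwise", pointwise), ("depthwise", depthwise), ("groupwise", groupwise)):
    print(f"{name:9s} keeps {mask.mean():.1%} of a dense {C_out}x{C_in}x{k}x{k} filter")
```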
arXiv Detail & Related papers (2022-10-23T18:37:22Z) - Training Overparametrized Neural Networks in Sublinear Time [14.918404733024332]
Deep learning comes at a tremendous computational and energy cost.
We present a new class of binary neural networks as a small subset of search trees, where each network corresponds to a subset of search trees.
We believe this view will have further applications in the analysis of deep networks.
arXiv Detail & Related papers (2022-08-09T02:29:42Z) - FlowNAS: Neural Architecture Search for Optical Flow Estimation [65.44079917247369]
We propose a neural architecture search method named FlowNAS to automatically find a better encoder architecture for the flow estimation task.
Experimental results show that the discovered architecture with the weights inherited from the super-network achieves 4.67% F1-all error on KITTI.
arXiv Detail & Related papers (2022-07-04T09:05:25Z) - EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network
Accelerators [12.223778147172107]
Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs).
These kernels stress current compute systems due to their high memory intensity, exascale compute demands, and large energy consumption.
We propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions.
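Where the wasted work in these kernels comes from can be shown with the standard lowering of a transposed convolution to a normal one: zeros are inserted between input elements, and a dense accelerator then multiplies against those zeros. The 1-D sketch below only illustrates that equivalence; it is not EcoFlow's dataflow.

```python
import numpy as np

def zero_insert_1d(x: np.ndarray, stride: int) -> np.ndarray:
    """Insert (stride - 1) zeros between neighbouring elements, which is how a
    stride-`stride` transposed convolution is usually lowered to a plain one."""
    out = np.zeros(len(x) * stride - (stride - 1), dtype=x.dtype)
    out[::stride] = x
    return out

x = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
print(zero_insert_1d(x, stride=2))   # [1. 0. 2. 0. 3. 0. 4.]
# Roughly half of the MACs in the follow-up dense convolution hit these inserted
# zeros; a similar pattern of zero taps appears inside dilated-convolution kernels.
```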
arXiv Detail & Related papers (2022-02-04T18:48:36Z) - Speedup deep learning models on GPU by taking advantage of efficient
unstructured pruning and bit-width reduction [0.0]
This work focuses on the pruning of convolutional neural networks (CNNs) and improving their efficiency on graphics processing units (GPUs).
The Nvidia deep neural network library (cuDNN) provides among the most effective implementations of deep learning (DL) algorithms for GPUs.
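An illustrative recipe combining the two ingredients named in the summary: unstructured magnitude pruning followed by uniform bit-width reduction. The thresholding rule, target sparsity, and symmetric int8 quantization are assumptions; the paper's measurements use cuDNN GPU kernels rather than this NumPy sketch.

```python
import numpy as np

def prune_and_quantize(w: np.ndarray, sparsity: float = 0.9, bits: int = 8):
    """Zero the smallest-magnitude weights, then quantize the rest to `bits` bits
    with a single symmetric scale (illustrative, not the paper's exact scheme)."""
    thresh = np.quantile(np.abs(w), sparsity)          # global magnitude threshold
    w_pruned = np.where(np.abs(w) < thresh, 0.0, w)
    scale = max(np.abs(w_pruned).max(), 1e-12) / (2 ** (bits - 1) - 1)
    q = np.round(w_pruned / scale).astype(np.int8)
    return q, scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = prune_and_quantize(w)
print(f"sparsity {(q == 0).mean():.2f}, max reconstruction error {np.abs(q * scale - w).max():.3f}")
```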
arXiv Detail & Related papers (2021-12-28T19:36:41Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch.
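For reference, N:M sparsity keeps at most N nonzero weights in every group of M consecutive weights (e.g. the 2:4 pattern supported by recent GPUs). The one-shot magnitude projection below shows only the pattern itself; the paper's contribution is a recipe for training such networks from scratch.

```python
import numpy as np

def project_n_of_m(w: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Keep the n largest-magnitude entries in each group of m consecutive
    weights along the last axis and zero the rest (e.g. 2:4 sparsity)."""
    assert w.shape[-1] % m == 0
    groups = w.reshape(-1, m)
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]   # smallest (m - n) per group
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.random.randn(8, 16).astype(np.float32)
print((project_n_of_m(w) == 0).mean())   # exactly 0.5 for the 2:4 pattern
```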
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - When deep learning models on GPU can be accelerated by taking advantage
of unstructured sparsity [0.0]
This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs).
Modern CNN models need megabytes of coefficients and millions of MAC operations to perform convolution.
We show when it is worth using a direct sparse operation to speed up the computation of the convolution layers.
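A rough way to probe the "when is it worth it" question is to compare a dense GEMM against a sparse-times-dense product at several densities in the im2col view of a convolution layer. The sizes below and the CPU/SciPy setting are assumptions purely for illustration; the paper's measurements target cuDNN and GPU kernels.

```python
import time
import numpy as np
from scipy.sparse import random as sparse_random

# im2col view of a conv layer: W is (C_out, C_in*k*k), columns are (C_in*k*k, H*W)
M, K, N = 256, 1152, 4096                      # assumed layer sizes
cols = np.random.randn(K, N).astype(np.float32)

for density in (0.5, 0.1, 0.02):
    Ws = sparse_random(M, K, density=density, format="csr", dtype=np.float32)
    Wd = Ws.toarray()
    t0 = time.perf_counter(); _ = Wd @ cols; t_dense = time.perf_counter() - t0
    t0 = time.perf_counter(); _ = Ws @ cols; t_sparse = time.perf_counter() - t0
    print(f"density {density:4.2f}: dense {t_dense * 1e3:6.1f} ms, sparse {t_sparse * 1e3:6.1f} ms")
```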
arXiv Detail & Related papers (2020-11-12T10:13:48Z) - Structured Convolutions for Efficient Neural Network Design [65.36569572213027]
We tackle model efficiency by exploiting redundancy in the implicit structure of the building blocks of convolutional neural networks.
We show how this decomposition can be applied to 2D and 3D kernels as well as the fully-connected layers.
arXiv Detail & Related papers (2020-08-06T04:38:38Z) - FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 comprises a family of state-of-the-art compact neural networks that outperform both automatically and manually designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.