When deep learning models on GPU can be accelerated by taking advantage
of unstructured sparsity
- URL: http://arxiv.org/abs/2011.06295v2
- Date: Sat, 17 Apr 2021 11:26:46 GMT
- Title: When deep learning models on GPU can be accelerated by taking advantage
of unstructured sparsity
- Authors: Marcin Pietroń, Dominik Żurek
- Abstract summary: This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPU).
Modern CNN models need megabytes of coefficients and millions of MAC operations to perform convolution.
We show when it is worth using a direct sparse operation to speed up the computation of the convolution layers.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper focuses on improving the efficiency of sparse
convolutional neural network (CNN) layers on graphics processing units (GPU).
The Nvidia CUDA Deep Neural Network (cuDNN) library provides the most effective
implementation of deep learning (DL) algorithms for GPUs, and GPUs are among
the most efficient and commonly used accelerators for deep learning
computations. Modern CNN models need megabytes of coefficients and millions of
MAC operations to perform convolution. One of the most common techniques for
compressing CNN models is weight pruning. There are two main types of pruning:
structural (removing whole weight channels) and non-structural (removing
individual weights). The first enables much easier acceleration, but it is
difficult to reach sparsity levels and accuracy as high as those obtained with
the second type. Non-structural pruning with retraining can produce weight
matrices with $\sim90\%$ or more sparsity in some deep CNN models. This work
shows when it is worth using a direct sparse operation to speed up the
computation of the convolution layers. The VGG-16, CNN-non-static and 1x1
layers from ResNet models were used as benchmarks. In addition, we present the
impact of using reduced precision on time efficiency.
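The trade-off the abstract describes, a dense kernel that still multiplies all the zeros versus a direct sparse kernel that skips them, can be sketched with a short experiment. The snippet below is only a minimal illustration, not the authors' implementation: the layer sizes, the 90% sparsity target, and the use of PyTorch's CSR sparse tensors are assumptions made for the example.

```python
import time

import torch

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# A VGG-16-sized 3x3 convolution lowered to a GEMM:
# the weight matrix is (out_channels) x (in_channels * k * k).
out_ch, in_ch, k = 512, 512, 3
weight = torch.randn(out_ch, in_ch * k * k, device=device)

# Unstructured (non-structural) pruning: zero the ~90% smallest-magnitude weights.
sparsity = 0.90
threshold = weight.abs().flatten().kthvalue(int(sparsity * weight.numel())).values
pruned = torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))
print(f"achieved sparsity: {(pruned == 0).float().mean().item():.2%}")

# Input activations lowered with im2col: (in_channels * k * k) x (spatial positions).
x = torch.randn(in_ch * k * k, 28 * 28, device=device)

def bench(fn, reps=20):
    fn()  # warm-up
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(reps):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / reps

dense_t = bench(lambda: pruned @ x)                   # dense GEMM: the zeros are still multiplied
csr_w = pruned.to_sparse_csr()                        # store only the non-zero weights (CSR)
sparse_t = bench(lambda: torch.sparse.mm(csr_w, x))   # direct sparse GEMM: zeros are skipped
print(f"dense: {dense_t * 1e3:.2f} ms   direct sparse: {sparse_t * 1e3:.2f} ms")

if device == "cuda":
    # Reduced precision: the same dense GEMM in fp16.
    w16, x16 = pruned.half(), x.half()
    fp16_t = bench(lambda: w16 @ x16)
    print(f"fp16 dense: {fp16_t * 1e3:.2f} ms")
```

In practice the direct sparse path only pays off above some break-even sparsity level that depends on the layer shape and precision; locating that break-even point for real CNN layers is exactly what the paper measures.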
Related papers
- Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators [0.0]
Deep Neural Networks (DNNs) are being developed, trained, and utilized, putting a strain on both advanced and limited devices.
Our solution is to implement weight block sparsity, which is a structured sparsity that is friendly to hardware.
We will present performance estimates using accurate and complete code generation for AIE2 configuration sets (AMD Versal FPGAs) with Resnet50, Inception V3, and VGG16.
arXiv Detail & Related papers (2024-07-12T17:37:49Z)
- Accelerating DNN Training with Structured Data Gradient Pruning [0.5801044612920815]
Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient.
Modern accelerators such as the Nvidia A100 GPU support this type of structured sparsity, with 2 non-zeros per 4 elements along a reduction dimension (a minimal sketch of this 2:4 masking appears after this list).
Our approach can achieve a 15-25% reduction in total training time without significant impact on performance.
arXiv Detail & Related papers (2022-02-01T21:41:51Z)
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
arXiv Detail & Related papers (2022-01-16T07:22:47Z)
- Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction [0.0]
This work focuses on pruning some convolutional neural networks (CNNs) and improving their efficiency on graphics processing units (GPU).
The Nvidia CUDA Deep Neural Network (cuDNN) library provides the most effective implementation of deep learning (DL) algorithms for GPUs.
arXiv Detail & Related papers (2021-12-28T19:36:41Z) - Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z) - VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves on average a 3712x speedup with 1301.25x energy reduction on CPU, and a 35.4x speedup with 17.66x energy reduction on GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch.
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - L2PF -- Learning to Prune Faster [57.32153461504626]
We present a multi-task, try-and-learn method that discretely learns the redundant filters of the CNN and a continuous action for how long the layers have to be fine-tuned.
For ResNet20, we have achieved a compression ratio of 3.84x with minimal accuracy degradation.
Compared to the state-of-the-art pruning method, we reduced the GPU hours by 1.71x.
arXiv Detail & Related papers (2021-01-07T18:13:37Z) - SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional
Neural Networks Training [34.657942518465575]
Training Convolutional Neural Networks (CNNs) usually requires a large number of computational resources.
In this paper, SparseTrain is proposed to accelerate CNN training by fully exploiting the sparsity.
We have built a simple compiler to map CNNs onto SparseTrain, and a cycle-accurate architecture simulator to evaluate the performance and efficiency.
arXiv Detail & Related papers (2020-07-21T11:01:36Z) - RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks
on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z) - Performance Aware Convolutional Neural Network Channel Pruning for
Embedded GPUs [6.035819238203187]
We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance.
We also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3x with cuDNN and above 10x with Arm Compute Library and TVM.
arXiv Detail & Related papers (2020-02-20T12:07:44Z)
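Two of the entries above, the structured data gradient pruning paper and the N:M fine-grained sparsity paper, rely on the same hardware-friendly pattern: at most N non-zero values in every group of M consecutive weights along the reduction dimension, 2:4 in the case of Ampere-class GPUs such as the A100. The sketch below illustrates only that magnitude-based masking step in PyTorch; it is not code from either paper, and the function name and sizes are made up for the example.

```python
import torch

def prune_n_m(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Keep only the n largest-magnitude entries in every group of m weights."""
    out_features, in_features = weight.shape
    assert in_features % m == 0, "reduction dimension must be divisible by m"
    groups = weight.reshape(out_features, in_features // m, m)
    keep = groups.abs().topk(n, dim=-1).indices           # indices of the n largest |w| per group
    mask = torch.zeros_like(groups).scatter_(-1, keep, 1.0)
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
w24 = prune_n_m(w)  # 2:4 pattern -> exactly 50% sparsity
print(f"sparsity: {(w24 == 0).float().mean().item():.0%}")  # sparsity: 50%
```

Keeping exactly 2 of every 4 weights yields a fixed 50% sparsity that the GPU's sparse tensor cores can store in a compressed layout, which is why this structured pattern is easier to accelerate than an arbitrary unstructured mask.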