Speedup deep learning models on GPU by taking advantage of efficient
unstructured pruning and bit-width reduction
- URL: http://arxiv.org/abs/2112.15445v1
- Date: Tue, 28 Dec 2021 19:36:41 GMT
- Title: Speedup deep learning models on GPU by taking advantage of efficient
unstructured pruning and bit-width reduction
- Authors: Marcin Pietroń, Dominik Żurek
- Abstract summary: This work focuses on pruning selected convolutional neural networks (CNNs) and improving their efficiency on graphics processing units (GPU).
The Nvidia deep neural network (cuDNN) library provides the most effective implementations of deep learning (DL) algorithms for GPUs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work focuses on pruning selected convolutional neural networks
(CNNs) and improving their efficiency on graphics processing units (GPU) by
using a direct sparse algorithm. The Nvidia deep neural network (cuDNN) library
provides the most effective implementations of deep learning (DL) algorithms
for GPUs, and GPUs are the most commonly used accelerators for deep learning
computations. One of the most common techniques for improving the efficiency of
CNN models is weight pruning and quantization. There are two main types of
pruning: structural and non-structural. The first enables much easier
acceleration on many types of accelerators, but it is difficult to achieve
sparsity levels and accuracy as high as those obtained with the second type.
Non-structural pruning with retraining can produce weight tensors with 90% or
more sparsity in some deep CNN models. In this article, a pruning algorithm is
presented that makes it possible to achieve high sparsity levels without an
accuracy drop. In the next stage, linear and non-linear quantization are
applied for further time and memory-footprint reduction. This paper is an
extended version of a previously published paper on effective pruning
techniques; it presents real models pruned to high sparsity and reduced
precision that can achieve better performance than the cuDNN library.
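A minimal, hedged sketch of the two compression steps named above (magnitude-based unstructured pruning followed by linear quantization) is given below. It is written in plain NumPy and is not the authors' GPU implementation; the function names, the 90% sparsity target, and the 8-bit width are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero.

    Unstructured pruning: individual weights are removed regardless of their
    position in the filter, so very high sparsity is reachable, but a sparse
    storage format (e.g. CSR) is needed to turn the zeros into a speedup.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)              # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

def linear_quantize(weights: np.ndarray, bits: int = 8):
    """Uniform (linear) symmetric quantization to `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(weights).max() / qmax if weights.any() else 1.0
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale                             # dequantize with q * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64, 3, 3)).astype(np.float32)  # a conv filter bank
    w_sparse = magnitude_prune(w, sparsity=0.9)              # ~90% zeros
    q, scale = linear_quantize(w_sparse, bits=8)
    print("sparsity:", 1.0 - np.count_nonzero(w_sparse) / w_sparse.size)
    print("max dequantization error:", np.abs(q * scale - w_sparse).max())
```

The non-linear quantization variant and the direct sparse GPU kernels benchmarked against cuDNN in the paper are not reproduced here; a CPU-side illustration of the direct sparse convolution idea appears after the related-papers list below.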
Related papers
- Efficient Deep Learning with Decorrelated Backpropagation [1.9731499060686393]
We show for the first time that much more efficient training of very deep neural networks using decorrelated backpropagation is feasible.
We obtain a more than two-fold speed-up and higher test accuracy compared to backpropagation when training an 18-layer deep residual network.
arXiv Detail & Related papers (2024-05-03T17:21:13Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- Accelerating DNN Training with Structured Data Gradient Pruning [0.5801044612920815]
Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient.
Modern accelerators such as the Nvidia A100 GPU support this type of structured sparsity, with 2 non-zeros per 4 elements along a reduction dimension (a toy N:M pruning sketch appears after this list).
Our approach can achieve a 15-25% reduction in total training time without significant impact to performance.
arXiv Detail & Related papers (2022-02-01T21:41:51Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity [0.0]
This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPU).
Modern CNN models need megabytes of coefficients and millions of MAC operations to perform convolution.
We show when it is worth using a direct sparse operation to speed up the computation of the convolution layers (a sketch of the idea follows this list).
arXiv Detail & Related papers (2020-11-12T10:13:48Z)
- Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training [0.5219568203653523]
We develop a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models without first training, then pruning, and finally retraining, a dense model.
Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26× less energy and offers up to 4× speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
arXiv Detail & Related papers (2020-09-23T07:39:55Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a larger reduction in computation load at the same accuracy (a low-rank factorization sketch appears after this list).
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks [74.935141515523]
Gradient centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean.
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be easily embedded into existing gradient-based DNNs with only one line of code (see the sketch after this list).
arXiv Detail & Related papers (2020-04-03T10:25:00Z)
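The abstract above and the authors' earlier paper in this list ("When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity") both rest on a direct sparse convolution: store only the non-zero weights and multiply them against an unfolded (im2col) input. The sketch below is a CPU-side NumPy/SciPy rendering of that idea under simplifying assumptions (stride 1, no padding, CSR weight storage); it is not the GPU kernel evaluated in the paper, and the helper names are mine.

```python
import numpy as np
from scipy.sparse import csr_matrix

def im2col(x: np.ndarray, kh: int, kw: int) -> np.ndarray:
    """Unfold an input of shape (C, H, W) into a (C*kh*kw, out_h*out_w) matrix
    (stride 1, no padding), so convolution becomes a matrix product."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

def direct_sparse_conv(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Convolution with a pruned filter bank stored in CSR format.

    `weights` has shape (out_c, in_c, kh, kw) and is assumed to be mostly zero
    after unstructured pruning; only the non-zeros are stored and multiplied.
    """
    out_c, in_c, kh, kw = weights.shape
    w_csr = csr_matrix(weights.reshape(out_c, -1))   # keep only the non-zeros
    cols = im2col(x, kh, kw)
    out = np.asarray(w_csr @ cols)                   # sparse-dense matmul
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    return out.reshape(out_c, out_h, out_w)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    w = rng.normal(size=(16, 8, 3, 3))
    w[np.abs(w) < 1.0] = 0.0                         # crude stand-in for pruning
    x = rng.normal(size=(8, 32, 32))
    y_sparse = direct_sparse_conv(w, x)
    y_dense = (w.reshape(16, -1) @ im2col(x, 3, 3)).reshape(16, 30, 30)
    print(np.allclose(y_sparse, y_dense))            # same result, fewer stored multiplications
```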
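Two entries above ("Accelerating DNN Training with Structured Data Gradient Pruning" and "Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch") concern N:M fine-grained structured sparsity, such as the 2:4 pattern accelerated by A100-class GPUs. A toy magnitude-based N:M pruning step might look as follows; grouping consecutive elements of the last dimension and the function name are assumptions made for illustration.

```python
import numpy as np

def prune_n_of_m(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Keep the `n` largest-magnitude weights in every group of `m` consecutive
    weights (the last dimension is assumed to be divisible by `m`); zero the rest.

    With n=2, m=4 this mimics the 2:4 fine-grained structured sparsity pattern
    that Ampere-class GPUs such as the A100 can accelerate.
    """
    w = weights.reshape(-1, m)                       # groups of m along the reduction dim
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    mask = np.ones_like(w, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (w * mask).reshape(weights.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.normal(size=(8, 16)).astype(np.float32)  # last dim divisible by 4
    w24 = prune_n_of_m(w, n=2, m=4)
    # every group of 4 now has exactly 2 non-zeros -> 50% structured sparsity
    print(np.count_nonzero(w24.reshape(-1, 4), axis=1))
```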
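For the low-rank entry ("Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification"), the sketch below shows only the inference-time consequence, assuming the singular values have already been sparsified: a dense weight matrix is split into two thin factors so the layer needs fewer MAC operations. It does not reproduce the SVD-training procedure itself.

```python
import numpy as np

def low_rank_factorize(weight: np.ndarray, rank: int):
    """Split a dense layer W (out x in) into two thin factors of the given rank.

    After the singular values have been sparsified, y = W x can be computed as
    y = A (B x) with A (out x rank) and B (rank x in), cutting MACs when the
    rank is small compared to the layer dimensions.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]        # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    w = rng.normal(size=(256, 512))
    a, b = low_rank_factorize(w, rank=32)
    x = rng.normal(size=(512,))
    print("relative approximation error:",
          np.linalg.norm(w @ x - a @ (b @ x)) / np.linalg.norm(w @ x))
```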
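Finally, the Gradient Centralization entry describes an optimizer tweak that centralizes each gradient vector to zero mean. The one-function version below follows only the description quoted above; averaging over all dimensions except the first (the output-channel dimension) is an assumption about the convention used for multi-dimensional weights.

```python
import numpy as np

def centralize_gradient(grad: np.ndarray) -> np.ndarray:
    """Gradient centralization: remove the mean of each output unit's gradient.

    For a weight gradient of shape (out, ...), the mean over all remaining
    dimensions is subtracted, so every gradient vector has zero mean before the
    optimizer step. Bias (1-D) gradients are left untouched.
    """
    if grad.ndim <= 1:
        return grad
    axes = tuple(range(1, grad.ndim))
    return grad - grad.mean(axis=axes, keepdims=True)

if __name__ == "__main__":
    g = np.random.default_rng(3).normal(size=(64, 3, 3, 3))   # conv weight gradient
    gc = centralize_gradient(g)
    print(np.allclose(gc.reshape(64, -1).mean(axis=1), 0.0))  # each row now has zero mean
```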