Accelerating DNN Training with Structured Data Gradient Pruning
- URL: http://arxiv.org/abs/2202.00774v1
- Date: Tue, 1 Feb 2022 21:41:51 GMT
- Title: Accelerating DNN Training with Structured Data Gradient Pruning
- Authors: Bradley McDanel, Helia Dinh, John Magallanes
- Abstract summary: Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient.
Modern accelerators such as the Nvidia A100 GPU support structured N:M sparsity with 2 nonzeros per 4 elements in a reduction.
Our approach can achieve a 15-25% reduction in total training time without significant impact on performance.
- Score: 0.5801044612920815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weight pruning is a technique to make Deep Neural Network (DNN) inference
more computationally efficient by reducing the number of model parameters over
the course of training. However, most weight pruning techniques generally do
not speed up DNN training and can even require more iterations to reach model
convergence. In this work, we propose a novel Structured Data Gradient Pruning
(SDGP) method that can speed up training without impacting model convergence.
This approach enforces a specific sparsity structure, where only N out of every
M elements in a matrix can be nonzero, making it amenable to hardware
acceleration. Modern accelerators such as the Nvidia A100 GPU support this type
of structured sparsity for 2 nonzeros per 4 elements in a reduction. Assuming
hardware support for 2:4 sparsity, our approach can achieve a 15-25% reduction
in total training time without significant impact on performance. Source code
and pre-trained models are available at
https://github.com/BradMcDanel/sdgp.
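
As a concrete illustration of the N:M pattern described above, the following PyTorch sketch keeps only the 2 largest-magnitude entries in every group of 4 consecutive elements of a gradient tensor. This is a minimal sketch of the sparsity structure SDGP enforces; the exact selection criterion, any rescaling of surviving gradients, and the point in the backward pass where pruning is applied are defined by the released code at the URL above and may differ from this illustration.

import torch

def nm_prune(x, n=2, m=4):
    """Keep the n largest-magnitude entries in every group of m consecutive
    elements along the last dimension (N:M structured sparsity)."""
    shape = x.shape
    assert shape[-1] % m == 0, "last dimension must be divisible by m"
    groups = x.reshape(-1, m)                        # one row per group of m elements
    idx = groups.abs().topk(n, dim=1).indices        # positions of the n largest magnitudes
    mask = torch.zeros_like(groups).scatter_(1, idx, 1.0)
    return (groups * mask).reshape(shape)

# Example: enforce 2:4 sparsity on a data-gradient tensor.
grad = torch.randn(8, 16)
sparse_grad = nm_prune(grad)
print((sparse_grad.reshape(-1, 4) != 0).sum(dim=1))  # exactly 2 nonzeros per group

In SDGP this pattern is imposed on the data (activation) gradients during backpropagation so that the pruned gradient matrices match the 2:4 layout that Ampere-class tensor cores accelerate. As an aside, recent PyTorch releases (2.1 and later) expose that hardware path through a prototype semi-structured sparse tensor API in torch.sparse, but the paper's released code should be treated as the reference implementation.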
Related papers
- Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and
Dataflow Co-Design [15.47240906902083]
This paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design.
At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights.
At the architecture level, a sparse accelerator for DNN training, namely SAT, is developed to support both the regular dense operations and the computation-efficient N:M sparse operations.
arXiv Detail & Related papers (2023-09-22T17:26:19Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- Dynamic Sparsity Is Channel-Level Sparsity Learner [91.31071026340746]
Dynamic sparse training (DST) is a leading sparse training approach.
Channel-aware dynamic sparse (Chase) seamlessly translates the promise of unstructured dynamic sparsity to channel-level sparsity.
Our approach translates unstructured sparsity to channel-wise sparsity.
arXiv Detail & Related papers (2023-05-30T23:33:45Z)
- Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction [0.0]
This work focuses on pruning some convolutional neural networks (CNNs) and improving their efficiency on graphics processing units (GPUs).
The Nvidia deep neural network (cuDNN) library provides some of the most effective implementations of deep learning (DL) algorithms for GPUs.
arXiv Detail & Related papers (2021-12-28T19:36:41Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch; see the sketch after this list for one generic way to train with such a mask.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity [0.0]
This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs).
Modern CNN models need megabytes of coefficients and millions of MAC operations to perform convolution.
We show when it is worth using a direct sparse operation to speed up the computation of the convolution layers.
arXiv Detail & Related papers (2020-11-12T10:13:48Z)
- Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training [0.5219568203653523]
We develop a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models, without first training, then pruning, and finally retraining a dense model.
Compared to training the equivalent unpruned models on a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26x less energy and offers up to 4x speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
arXiv Detail & Related papers (2020-09-23T07:39:55Z)
- Neural Network Compression Framework for fast model inference [59.65531492759006]
We present a new framework for neural network compression with fine-tuning, called the Neural Network Compression Framework (NNCF).
It leverages recent advances of various network compression methods and implements some of them, such as sparsity, quantization, and binarization.
The framework can be used within the training samples, which are supplied with it, or as a standalone package that can be seamlessly integrated into the existing training code.
arXiv Detail & Related papers (2020-02-20T11:24:01Z)
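
The entry above on learning N:M fine-grained structured sparse networks from scratch concerns the weight-side counterpart of SDGP's data-gradient pruning: the weights themselves are kept N:M sparse throughout training. The sketch below shows one common, generic way to do this with a magnitude-based 2:4 mask and a straight-through estimator; it is an assumption-laden illustration, not that paper's exact method and not the SDGP approach.

import torch
import torch.nn as nn
import torch.nn.functional as F

def nm_mask(w, n=2, m=4):
    """Binary mask keeping the n largest-magnitude weights in each group of m."""
    groups = w.reshape(-1, m)
    idx = groups.abs().topk(n, dim=1).indices
    return torch.zeros_like(groups).scatter_(1, idx, 1.0).reshape(w.shape)

class NMSparseLinear(nn.Linear):
    """Linear layer whose weight is masked to 2:4 sparsity in the forward pass.
    The straight-through trick (w + (w * mask - w).detach()) makes the forward
    computation use the masked weights while gradients still reach every dense weight."""
    def forward(self, x):
        mask = nm_mask(self.weight)
        w_sparse = self.weight + (self.weight * mask - self.weight).detach()
        return F.linear(x, w_sparse, self.bias)

# Usage: a 2:4-sparse layer that trains like any other module
# (in_features must be divisible by 4 for this toy mask).
layer = NMSparseLinear(16, 8)
out = layer(torch.randn(4, 16))
out.sum().backward()  # gradients flow to the full dense weight tensor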
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.