Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network
Training
- URL: http://arxiv.org/abs/2009.10976v1
- Date: Wed, 23 Sep 2020 07:39:55 GMT
- Title: Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network
Training
- Authors: Dingqing Yang, Amin Ghasemazar, Xiaowei Ren, Maximilian Golub, Guy
Lemieux, Mieszko Lis
- Abstract summary: We develop a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models, without first training, pruning, and then retraining a dense model.
Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26$\times$ less energy and offers up to 4$\times$ speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
- Score: 0.5219568203653523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of DNN pruning has led to the development of energy-efficient
inference accelerators that support pruned models with sparse weight and
activation tensors. Because the memory layouts and dataflows in these
architectures are optimized for the access patterns during
$\mathit{inference}$, however, they do not efficiently support the emerging
sparse $\mathit{training}$ techniques.
In this paper, we demonstrate (a) that accelerating sparse training requires
a co-design approach where algorithms are adapted to suit the constraints of
hardware, and (b) that hardware for sparse DNN training must tackle constraints
that do not arise in inference accelerators. As proof of concept, we adapt a
sparse training algorithm to be amenable to hardware acceleration; we then
develop dataflow, data layout, and load-balancing techniques to accelerate it.
The resulting system is a sparse DNN training accelerator that produces
pruned models with the same accuracy as dense models without first training,
then pruning, and finally retraining a dense model. Compared to training the
equivalent unpruned models using a state-of-the-art DNN accelerator without
sparse training support, Procrustes consumes up to 3.26$\times$ less energy and
offers up to 4$\times$ speedup across a range of models, while pruning weights
by an order of magnitude and maintaining unpruned accuracy.
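The abstract does not spell out the adapted training algorithm, so the following is only a minimal NumPy sketch of the general prune-during-training idea: a weight budget is enforced throughout training, so there is never a dense train/prune/retrain cycle. The toy model, budget, and hyperparameters are illustrative assumptions, not the paper's algorithm, dataflow, or load-balancing scheme.

```python
# Minimal sketch (NumPy) of prune-during-training with a fixed weight budget.
# The toy least-squares model and budget are illustrative; this is not the
# paper's algorithm, dataflow, or accelerator.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 32))            # toy inputs
y = X @ rng.standard_normal((32, 1))          # toy targets from a hidden linear model

W = rng.standard_normal((32, 1)) * 0.1        # trainable weights
budget = 4                                    # keep only 4 of 32 weights (~8x pruning)
lr = 0.01

for step in range(500):
    err = X @ W - y                           # forward pass and error
    W -= lr * (X.T @ err) / len(X)            # gradient step
    # enforce the weight budget every step: keep the largest-magnitude weights
    keep = np.argsort(np.abs(W).ravel())[-budget:]
    mask = np.zeros(W.size)
    mask[keep] = 1.0
    W *= mask.reshape(W.shape)

loss = float(((X @ W - y) ** 2).mean())
print(f"nonzero weights: {int(np.count_nonzero(W))}/{W.size}, final loss: {loss:.4f}")
```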
Related papers
- Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition [6.1810913678161405]
Exploiting sparsity in deep neural networks (DNNs) is a promising way to meet the growing computation demands of modern DNNs.
Structured sparse hardware support provides limited flexibility and requires extra model fine-tuning.
This paper proposes tensor approximation via structured decomposition (TASD) to bridge the gap between sparse DNN models and hardware.
arXiv Detail & Related papers (2024-03-12T06:25:47Z)
- Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design [15.47240906902083]
This paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design.
At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights.
At the architecture level, a sparse accelerator for DNN training, namely SAT, is developed to support both the regular dense operations and the computation-efficient N:M sparse operations.
arXiv Detail & Related papers (2023-09-22T17:26:19Z)
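N:M sparsity keeps at most N nonzero weights in every group of M consecutive weights, a pattern regular enough for hardware such as sparse tensor cores to exploit. The snippet below is a hedged NumPy sketch of magnitude-based 2:4 masking; it illustrates the sparsity pattern only and is not the BDWP method or the SAT accelerator described above.

```python
# Sketch of N:M (here 2:4) magnitude-based weight masking in NumPy.
# Illustrates the sparsity pattern only; not the BDWP method from the paper.
import numpy as np

def n_m_mask(w, n=2, m=4):
    """Keep the n largest-magnitude entries in every group of m along the last axis."""
    assert w.shape[-1] % m == 0
    groups = w.reshape(-1, m)                          # one row per group of m weights
    keep = np.argsort(np.abs(groups), axis=1)[:, -n:]  # indices of the n largest per group
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return mask.reshape(w.shape)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
W_sparse = W * n_m_mask(W, n=2, m=4)
# every group of 4 consecutive weights now has exactly 2 nonzeros
print(np.count_nonzero(W_sparse), "of", W.size, "weights kept")
```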
- Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementing DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- Accelerating DNN Training with Structured Data Gradient Pruning [0.5801044612920815]
Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient.
Modern accelerators such as the Nvidia A100 GPU support this type of structured sparsity, with 2 nonzeros per group of 4 elements along a reduction dimension.
Our approach can achieve a 15-25% reduction in total training time without significant impact on performance.
arXiv Detail & Related papers (2022-02-01T21:41:51Z)
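In structured data gradient pruning, the 2:4 structure is imposed on gradients rather than weights, so the backward-pass matrix products operate on a structured-sparse tensor. The sketch below applies a magnitude-based 2:4 mask to the output gradient of a single linear layer; the shapes and the choice of which tensor to prune are illustrative assumptions, not the authors' implementation.

```python
# Sketch: prune the output gradient to a 2:4 pattern before the backward matmuls,
# so dW and dX are computed from a structured-sparse tensor. Illustrative only.
import numpy as np

def mask_2_4(g):
    groups = g.reshape(-1, 4)                          # groups of 4 gradient entries
    keep = np.argsort(np.abs(groups), axis=1)[:, -2:]  # keep the 2 largest per group
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return mask.reshape(g.shape)

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 128))    # layer input
W = rng.standard_normal((128, 256))   # layer weights
dY = rng.standard_normal((64, 256))   # gradient w.r.t. the layer output

dY_sparse = dY * mask_2_4(dY)         # structured gradient pruning
dW = X.T @ dY_sparse                  # weight gradient from the pruned tensor
dX = dY_sparse @ W.T                  # input gradient from the pruned tensor
print("kept", np.count_nonzero(dY_sparse), "of", dY.size, "gradient entries")
```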
- AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks [78.62086125399831]
We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of deep neural networks (DNNs).
AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets.
An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process.
arXiv Detail & Related papers (2021-06-23T13:23:00Z)
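The AC/DC schedule alternates between phases that enforce a sparsity mask and phases that train densely, which is how a dense and a sparse model end up co-trained. Below is a minimal NumPy sketch of such an alternating schedule on a toy least-squares problem; the phase length, sparsity level, and model are illustrative assumptions, not the authors' hyperparameters.

```python
# Sketch of an alternating compressed/decompressed training schedule (NumPy).
# Phase length, sparsity level, and the toy model are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))
y = X @ rng.standard_normal((64, 1))
W = rng.standard_normal((64, 1)) * 0.1

phase_len, sparsity, lr = 50, 0.9, 0.01
for step in range(400):
    compressed = (step // phase_len) % 2 == 1     # dense phase, then sparse phase, ...
    err = X @ W - y
    W -= lr * (X.T @ err) / len(X)
    if compressed:
        # enforce magnitude-based sparsity during compressed phases only
        k = int(W.size * (1 - sparsity))
        keep = np.argsort(np.abs(W).ravel())[-k:]
        mask = np.zeros(W.size)
        mask[keep] = 1.0
        W *= mask.reshape(W.shape)

# training ends inside a compressed phase, so W here is the sparse model
print("nonzeros in sparse model:", int(np.count_nonzero(W)), "of", W.size)
```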
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology that does not require pre-training a dense model.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization to gradually increase the precision of activations, weights, and gradients during training.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better accuracy (-0.12% to +1.87%).
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
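FracTrain's core idea is a precision schedule: training starts at low bit-width and the precision grows as training progresses. The sketch below shows only that scheduling idea with a simple uniform fake-quantizer in NumPy; the bit-widths, schedule, and quantizer are illustrative assumptions rather than the paper's quantization mechanisms.

```python
# Sketch of a progressive precision schedule with a uniform fake-quantizer.
# Bit-widths, schedule, and the toy tensor are illustrative assumptions only.
import numpy as np

def fake_quant(x, bits):
    """Uniformly quantize x to `bits` bits over its own dynamic range."""
    levels = 2 ** bits - 1
    scale = (np.abs(x).max() + 1e-8) / (levels / 2)
    return np.round(x / scale) * scale

def bits_at(step, total_steps, schedule=(3, 4, 6, 8)):
    """Increase precision as training progresses."""
    idx = min(step * len(schedule) // total_steps, len(schedule) - 1)
    return schedule[idx]

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
for step in (0, 250, 500, 750, 999):
    b = bits_at(step, 1000)
    print(f"step {step}: {b}-bit weights, max quant error "
          f"{np.abs(W - fake_quant(W, b)).max():.4f}")
```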
- TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems [3.1887081453726136]
Crossbar-based computations face a major challenge due to a variety of device- and circuit-level non-idealities.
We propose TxSim, a fast and customizable modeling framework to functionally evaluate DNN training on crossbar-based hardware.
arXiv Detail & Related papers (2020-02-25T19:29:43Z)