Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask
- URL: http://arxiv.org/abs/2209.07617v1
- Date: Thu, 15 Sep 2022 21:30:55 GMT
- Title: Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask
- Authors: Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani
Agrawal, Utku Evci, Tushar Krishna
- Abstract summary: We study and evaluate various training recipes for N:M sparsity in terms of the trade-off between model accuracy and compute cost.
We propose two new decay-based pruning methods, namely "pruning mask decay" and "sparse structure decay".
Our evaluations indicate that these proposed methods consistently deliver state-of-the-art (SOTA) model accuracy, comparable to unstructured sparsity.
- Score: 8.02992650002693
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparsity has become one of the promising methods to compress and accelerate
Deep Neural Networks (DNNs). Among different categories of sparsity, structured
sparsity has gained more attention due to its efficient execution on modern
accelerators. Particularly, N:M sparsity is attractive because there are
already hardware accelerator architectures that can leverage certain forms of
N:M structured sparsity to yield higher compute-efficiency. In this work, we
focus on N:M sparsity and extensively study and evaluate various training
recipes for N:M sparsity in terms of the trade-off between model accuracy and
compute cost (FLOPs). Building upon this study, we propose two new decay-based
pruning methods, namely "pruning mask decay" and "sparse structure decay". Our
evaluations indicate that these proposed methods consistently deliver
state-of-the-art (SOTA) model accuracy, comparable to unstructured sparsity, on
a Transformer-based model for a translation task. The increase in the accuracy
of the sparse model using the new training recipes comes at the cost of
marginal increase in the total training compute (FLOPs).
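As a concrete illustration of the decay idea, the sketch below builds a binary N:M mask and then softens it so that pruned positions are annealed from dense toward zero during training. It is a minimal PyTorch-style sketch of the "pruning mask decay" ingredient only; the 2:4 pattern, the linear schedule, and the names `nm_mask` and `decayed_mask` are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def nm_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Binary N:M mask: keep the n largest-magnitude weights in every group of m.

    Assumes the number of weight elements is divisible by m.
    """
    groups = weight.abs().reshape(-1, m)          # (num_groups, m)
    topk = groups.topk(n, dim=-1).indices         # indices of kept weights per group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, topk, 1.0)
    return mask.reshape(weight.shape)

def decayed_mask(mask: torch.Tensor, step: int, decay_steps: int) -> torch.Tensor:
    """Soft mask in the spirit of 'pruning mask decay' (illustrative linear schedule).

    Pruned positions start near 1 (dense) and are annealed toward 0, so training
    transitions gradually from dense to N:M sparse.
    """
    beta = max(0.0, 1.0 - step / decay_steps)     # 1 -> 0 over decay_steps
    return mask + (1.0 - mask) * beta

# Usage: apply the soft mask during the decay phase, then switch to the hard mask.
w = torch.randn(8, 16)
hard = nm_mask(w, n=2, m=4)
soft = decayed_mask(hard, step=100, decay_steps=1000)
w_sparse = w * soft
```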
Related papers
- BMRS: Bayesian Model Reduction for Structured Pruning [9.508747319738847]
We propose BMRS, a fully end-to-end and theoretically grounded Bayesian method for structured pruning of neural networks.
arXiv Detail & Related papers (2024-06-03T14:08:04Z)
- Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers [15.27677493050638]
N:M structured sparsity has garnered significant interest as a result of relatively modest overhead and improved efficiency.
While there have been efforts to develop training recipes for N:M structured sparsity, they primarily focus on low-sparsity regions.
The performance of models trained with these approaches tends to decline in high-sparsity regions.
arXiv Detail & Related papers (2024-02-07T10:55:59Z)
- Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
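To make the general idea concrete (this is not the paper's specific method), the sketch below induces a semi-structured N:M pattern on activations by keeping only the top-N magnitudes in every group of M channel values; the 2:4 pattern and the grouping along the channel dimension are assumptions chosen for the example.

```python
import torch

def semi_structured_activation(x: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Zero out all but the n largest-magnitude values in each group of m channel values.

    x is assumed to be (batch, channels, H, W) with channels divisible by m.
    """
    b, c, h, w = x.shape
    groups = x.permute(0, 2, 3, 1).reshape(-1, m)      # group along the channel axis
    topk = groups.abs().topk(n, dim=-1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, topk, 1.0)
    return (groups * mask).reshape(b, h, w, c).permute(0, 3, 1, 2)

# Example: sparsify a ReLU feature map.
feat = torch.relu(torch.randn(1, 8, 4, 4))
print(semi_structured_activation(feat).count_nonzero())
```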
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- Maestro: Uncovering Low-Rank Structures via Trainable Decomposition [15.254107731735553]
Deep Neural Networks (DNNs) have been a large driver for AI breakthroughs in recent years.
They have been getting increasingly large as they become more accurate and safe.
This means that their training becomes increasingly costly and time-consuming.
We propose Maestro, a framework for trainable low-rank layers.
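For context, a trainable low-rank layer factorises a dense weight into two thin matrices learned jointly. The sketch below is a generic version of that idea, not Maestro's specific decomposition (which additionally learns how much rank to retain); the rank argument and the initialisation are assumptions.

```python
import torch

class LowRankLinear(torch.nn.Module):
    """y = x @ (U @ V).T + b, with U (out x r) and V (r x in) trained jointly.

    Parameter count drops from out*in to r*(out + in) when r is small.
    """

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.U = torch.nn.Parameter(torch.randn(out_features, rank) / rank**0.5)
        self.V = torch.nn.Parameter(torch.randn(rank, in_features) / in_features**0.5)
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(x, self.U @ self.V, self.bias)

# Usage: swap a 512x512 dense layer (~262k params) for a rank-32 factorisation (~33k params).
layer = LowRankLinear(512, 512, rank=32)
out = layer(torch.randn(4, 512))
```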
arXiv Detail & Related papers (2023-08-28T23:08:15Z)
- Spatial Re-parameterization for N:M Sparsity [92.72334929464013]
N:M sparsity exhibits a fixed sparsity rate within the spatial domains, whereas unstructured sparsity displays a substantial divergence in sparsity across them.
SpRe achieves the notable feat of matching the performance of N:M sparsity methods to that of state-of-the-art unstructured sparsity methods.
arXiv Detail & Related papers (2023-06-09T01:11:50Z)
- Training Structured Neural Networks Through Manifold Identification and Variance Reduction [8.528384027684194]
This paper proposes an algorithm (RMDA) for training neural networks (NNs) with a regularization term for promoting desired structures.
RMDA incurs no computation additional to momentum SGD, and achieves variance reduction without requiring the objective function to be of the finite-sum form.
arXiv Detail & Related papers (2021-12-05T16:23:53Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
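A typical ADMM formulation for quantization alternates a projection of the (dual-shifted) full-precision weights onto a quantization grid with a dual-variable correction, interleaved with gradient steps on the task loss. The sketch below shows that projection and dual update in isolation as a generic illustration; the symmetric uniform grid, the 4-bit default, and the variable names are assumptions, not the paper's exact formulation.

```python
import torch

def quantize_uniform(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Project onto a symmetric uniform grid with 2**num_bits levels (illustrative choice)."""
    levels = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / levels
    return torch.round(w / scale).clamp(-levels, levels) * scale

def admm_step(w: torch.Tensor, dual: torch.Tensor, num_bits: int = 4):
    """One ADMM iteration: auxiliary-variable update (projection) and scaled dual update.

    The weight update (gradient steps on the task loss plus the augmented penalty
    rho/2 * ||W - G + dual||^2) would run between calls to this function.
    """
    g = quantize_uniform(w + dual, num_bits)   # quantized auxiliary variable
    dual = dual + w - g                        # scaled dual ascent
    return g, dual

# Usage sketch:
w = torch.randn(64, 64)
dual = torch.zeros_like(w)
g, dual = admm_step(w, dual)
```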
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
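"Dynamic exploration of sparsity masks" typically means periodically dropping the weakest active weights and growing new connections elsewhere while keeping the sparsity budget fixed. The sketch below is a generic magnitude-drop / random-grow step in that spirit; it is not MEST's exact Elastic Mutation rule, and the mutation fraction and growth criterion are assumptions.

```python
import torch

def mutate_mask(weight: torch.Tensor, mask: torch.Tensor, frac: float = 0.1) -> torch.Tensor:
    """Drop the smallest-magnitude active weights and regrow the same number elsewhere."""
    flat_mask = mask.flatten().clone()
    active = flat_mask.bool()
    k = max(1, int(frac * active.sum().item()))

    # Drop: deactivate the k smallest-magnitude weights among the active ones.
    magnitudes = weight.flatten().abs().masked_fill(~active, float("inf"))
    flat_mask[magnitudes.topk(k, largest=False).indices] = 0.0

    # Grow: activate k positions chosen at random among the previously inactive ones.
    inactive_idx = (~active).nonzero(as_tuple=True)[0]
    flat_mask[inactive_idx[torch.randperm(inactive_idx.numel())[:k]]] = 1.0
    return flat_mask.reshape(mask.shape)

# Usage: keep roughly half the weights, mutating 10% of the active set per update.
w = torch.randn(16, 16)
mask = (torch.rand_like(w) < 0.5).float()
mask = mutate_mask(w, mask, frac=0.1)
```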
arXiv Detail & Related papers (2021-10-26T21:15:17Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
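The reparameterisation behind Powerpropagation writes each effective weight as w = v * |v|^(alpha - 1) with alpha > 1, so gradients on v are scaled by the weight's magnitude and small weights are driven toward zero. Below is a minimal sketch of such a layer; the choice alpha = 2 and the initialisation are illustrative, not the paper's recommended settings.

```python
import torch

class PowerpropLinear(torch.nn.Module):
    """Linear layer with a Powerpropagation-style reparameterisation.

    The effective weight is w = v * |v|**(alpha - 1), which concentrates the
    weight distribution around zero and makes magnitude pruning safer.
    """

    def __init__(self, in_features: int, out_features: int, alpha: float = 2.0):
        super().__init__()
        self.v = torch.nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = torch.nn.Parameter(torch.zeros(out_features))
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.v * self.v.abs().pow(self.alpha - 1.0)   # effective weight
        return torch.nn.functional.linear(x, w, self.bias)
```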
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Compressing Deep Neural Network (DNN) models is essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods rarely deliver real inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered by a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)