Spatial Re-parameterization for N:M Sparsity
- URL: http://arxiv.org/abs/2306.05612v2
- Date: Thu, 14 Nov 2024 15:22:27 GMT
- Title: Spatial Re-parameterization for N:M Sparsity
- Authors: Yuxin Zhang, Mingliang Xu, Yonghong Tian, Rongrong Ji
- Abstract summary: N:M sparsity exhibits a fixed sparsity rate within the spatial domains.
unstructured sparsity displays a substantial divergence in sparsity across the spatial domains.
SpRe matches the performance of N:M sparsity methods to that of state-of-the-art unstructured sparsity methods.
- Score: 92.72334929464013
- License:
- Abstract: This paper presents a Spatial Re-parameterization (SpRe) method for N:M sparsity in CNNs. SpRe stems from an observation regarding the restricted variety of spatial sparsity in N:M sparsity compared with unstructured sparsity. In particular, N:M sparsity exhibits a fixed sparsity rate within the spatial domains due to its distinctive pattern that mandates N non-zero components among M successive weights in the input channel dimension of convolution filters. On the contrary, we observe that unstructured sparsity displays a substantial divergence in sparsity across the spatial domains, which we experimentally verify to be crucial for its robust performance retention compared with N:M sparsity. Therefore, SpRe employs the spatial-sparsity distribution of unstructured sparsity to assign an extra branch in conjunction with the original N:M branch at training time, which allows the N:M sparse network to sustain a similar distribution of spatial sparsity to unstructured sparsity. During inference, the extra branch can be re-parameterized into the main N:M branch, without distorting the sparse pattern or adding computation costs. SpRe matches the performance of N:M sparsity methods to that of state-of-the-art unstructured sparsity methods across various benchmarks. Code and models are anonymously available at \url{https://github.com/zyxxmu/SpRe}.
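To make the abstract's two key ideas concrete, the sketch below illustrates (a) why the N:M pattern fixes the sparsity rate at every spatial position while unstructured pruning does not, and (b) the convolution linearity that lets a parallel training-time branch be folded into the main kernel at inference. This is a minimal PyTorch illustration under common conventions (magnitude-based mask selection, groups of M consecutive input channels at each spatial position), not the authors' released implementation; the helper names are placeholders.

```python
# Minimal sketch (not the released SpRe code): assumes PyTorch, magnitude-based
# mask selection, and the common convention of grouping M consecutive weights
# along the input-channel axis at every spatial position of a conv filter.
import torch
import torch.nn.functional as F

def nm_mask(weight, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m consecutive
    input channels; weight has shape [C_out, C_in, kH, kW], C_in % m == 0."""
    c_out, c_in, kh, kw = weight.shape
    grouped = weight.abs().reshape(c_out, c_in // m, m, kh, kw)
    idx = grouped.topk(n, dim=2).indices
    mask = torch.zeros_like(grouped).scatter_(2, idx, 1.0)
    return mask.reshape(c_out, c_in, kh, kw)

def unstructured_mask(weight, sparsity=0.5):
    """Global magnitude pruning at the same overall sparsity rate."""
    keep = int(weight.numel() * (1.0 - sparsity))
    threshold = weight.abs().flatten().topk(keep).values.min()
    return (weight.abs() >= threshold).float()

def spatial_sparsity(mask):
    """Sparsity rate at each spatial position, averaged over all channels."""
    return 1.0 - mask.mean(dim=(0, 1))

w = torch.randn(64, 64, 3, 3)
print(spatial_sparsity(nm_mask(w, 2, 4)))           # 0.5 at every position
print(spatial_sparsity(unstructured_mask(w, 0.5)))  # varies across positions

# Re-parameterization rests on the linearity of convolution: two parallel
# branches applied to the same input fold exactly into a single kernel.
# How SpRe constructs the extra branch so that the fold preserves the N:M
# pattern is the paper's contribution and is not reproduced here.
x = torch.randn(1, 64, 8, 8)
w_main = w * nm_mask(w, 2, 4)           # N:M sparse main branch
w_extra = torch.randn_like(w) * 0.01    # training-time auxiliary branch
two_branch = F.conv2d(x, w_main, padding=1) + F.conv2d(x, w_extra, padding=1)
merged = F.conv2d(x, w_main + w_extra, padding=1)
assert torch.allclose(two_branch, merged, atol=1e-4)
```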
Related papers
- Toward Efficient Permutation for Hierarchical N:M Sparsity on GPUs [1.3124513975412255]
N:M sparsity pruning is a powerful technique for compressing deep neural networks.
We introduce a channel permutation method designed specifically for HiNM sparsity, named gyro-permutation.
arXiv Detail & Related papers (2024-07-30T01:40:50Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Optimal Scaling for Locally Balanced Proposals in Discrete Spaces [65.14092237705476]
We show that the efficiency of Metropolis-Hastings (M-H) algorithms in discrete spaces can be characterized by an acceptance rate that is independent of the target distribution.
Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces.
arXiv Detail & Related papers (2022-09-16T22:09:53Z)
- Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask [8.02992650002693]
We study and evaluate various training recipes for N:M sparsity in terms of the trade-off between model accuracy and compute cost.
We propose two new decay-based pruning methods, namely "pruning mask decay" and "sparse structure decay".
Our evaluations indicate that these proposed methods consistently deliver state-of-the-art (SOTA) model accuracy, comparable to unstructured sparsity.
arXiv Detail & Related papers (2022-09-15T21:30:55Z)
- Training Structured Neural Networks Through Manifold Identification and Variance Reduction [8.528384027684194]
This paper proposes an algorithm (RMDA) for training neural networks (NNs) with a regularization term for promoting desired structures.
RMDA does not incur computation additional to momentum, and achieves variance reduction without requiring the objective function to be of the finite-sum form.
arXiv Detail & Related papers (2021-12-05T16:23:53Z)
- Deep Stable neural networks: large-width asymptotics and convergence rates [3.0108936184913295]
We show that as the width goes to infinity jointly over the NN's layers, a suitably rescaled deep Stable NN converges weakly to a Stable SP.
Because of the non-triangular NN's structure, this is a non-standard problem, to which we propose a novel and self-contained inductive approach.
arXiv Detail & Related papers (2021-08-02T12:18:00Z)
- Learning Generative Prior with Latent Space Sparsity Constraints [25.213673771175692]
It has been argued that the distribution of natural images does not lie in a single manifold but rather in a union of several submanifolds.
We propose a sparsity-driven latent space sampling (SDLSS) framework and develop a proximal meta-learning (PML) algorithm to enforce sparsity in the latent space.
The results demonstrate that for a higher degree of compression, the SDLSS method is more efficient than the state-of-the-art method.
arXiv Detail & Related papers (2021-05-25T14:12:04Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks [82.61546580149427]
Generative Adversarial Networks (GANs) have achieved a great success in unsupervised learning.
This paper provides approximation and statistical guarantees of GANs for the estimation of data distributions with densities in a Hölder space.
arXiv Detail & Related papers (2020-02-10T16:47:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.