Enhanced Sparsification via Stimulative Training
- URL: http://arxiv.org/abs/2403.06417v1
- Date: Mon, 11 Mar 2024 04:05:17 GMT
- Title: Enhanced Sparsification via Stimulative Training
- Authors: Shengji Tang, Weihao Lin, Hancheng Ye, Peng Ye, Chong Yu, Baopu Li,
Tao Chen
- Abstract summary: Existing methods commonly set sparsity-inducing penalty terms to suppress the importance of dropped weights.
We propose a structured pruning framework, named STP, based on an enhanced sparsification paradigm that maintains the magnitude of dropped weights and enhances the expressivity of kept weights via self-distillation.
To reduce the huge capacity gap of distillation, we propose a subnet mutating expansion technique.
- Score: 36.0559905521154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparsification-based pruning has been an important category in model
compression. Existing methods commonly set sparsity-inducing penalty terms to
suppress the importance of dropped weights, which is regarded as the suppressed
sparsification paradigm. However, this paradigm inactivates the dropped parts
of the network, causing capacity damage before pruning and thereby leading to
performance degradation. To alleviate this issue, we first study and reveal the
relative sparsity effect in emerging stimulative training and then propose a
structured pruning framework, named STP, based on an enhanced sparsification
paradigm which maintains the magnitude of dropped weights and enhances the
expressivity of kept weights by self-distillation. Besides, to find an optimal
architecture for the pruned network, we propose a multi-dimension architecture
space and a knowledge distillation-guided exploration strategy. To reduce the
huge capacity gap of distillation, we propose a subnet mutating expansion
technique. Extensive experiments on various benchmarks indicate the
effectiveness of STP. Specifically, without fine-tuning, our method
consistently achieves superior performance at different budgets, especially
under extremely aggressive pruning scenarios, e.g., retaining 95.11% of the Top-1
accuracy (72.43% vs. 76.15%) while reducing FLOPs by 85% for ResNet-50 on ImageNet.
Code will be released soon.
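For context on the two paradigms contrasted in the abstract, below is a minimal PyTorch-style sketch: a suppressed-paradigm loss that penalizes to-be-dropped weights versus an enhanced-paradigm loss that leaves them untouched and instead distills the full network into the kept sub-network. The function names, penalty form, temperature, and weighting are illustrative assumptions, not the (unreleased) STP implementation.

```python
import torch.nn.functional as F

def suppressed_sparsification_loss(task_loss, dropped_weights, penalty=1e-4):
    # Suppressed paradigm: a sparsity-inducing penalty (group-lasso style)
    # pushes the to-be-dropped weights toward zero before pruning.
    reg = sum(w.norm(p=2) for w in dropped_weights)
    return task_loss + penalty * reg

def enhanced_sparsification_loss(task_loss, student_logits, teacher_logits,
                                 T=4.0, alpha=0.5):
    # Enhanced paradigm: dropped weights keep their magnitude; the kept
    # sub-network is instead strengthened by distilling the full network's
    # predictions into it (self-distillation).
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return (1 - alpha) * task_loss + alpha * kd
```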
Related papers
- UniPTS: A Unified Framework for Proficient Post-Training Sparsity [67.16547529992928]
Post-training Sparsity (PTS) is a newly emerged avenue that pursues efficient network sparsity with only limited data available.
In this paper, we attempt to reconcile the performance disparity between PTS and conventional sparsity by transposing three cardinal factors that profoundly affect the performance of conventional sparsity into the context of PTS.
Our framework, termed UniPTS, is validated to be much superior to existing PTS methods across extensive benchmarks.
arXiv Detail & Related papers (2024-05-29T06:53:18Z)
- Effective Layer Pruning Through Similarity Metric Perspective [0.0]
Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks.
Pruning structures from these models is a straightforward approach to reducing network complexity.
Layer pruning often hurts the network predictive ability (i.e., accuracy) at high compression rates.
This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods.
arXiv Detail & Related papers (2024-05-27T11:54:51Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
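A minimal sketch of the soft-shrinkage idea described above, assuming a simple magnitude criterion and a fixed shrinkage factor; the actual ISS-P percentage schedule and shrinkage amount are defined in the paper.

```python
import torch

def soft_shrink_(weight: torch.Tensor, prune_pct: float = 0.1, shrink: float = 0.99):
    # Softly shrink (rather than hard-zero) the lowest-magnitude fraction of
    # entries in `weight`, in place; illustrative shrinkage factor only.
    flat = weight.abs().flatten()
    k = max(1, int(prune_pct * flat.numel()))
    threshold = flat.kthvalue(k).values      # k-th smallest magnitude
    mask = weight.abs() <= threshold
    with torch.no_grad():
        weight[mask] *= shrink               # gently push unimportant weights down
    return weight
```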
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Gradient-based Intra-attention Pruning on Pre-trained Language Models [21.444503777215637]
We propose a structured pruning method, GRAIN (Gradient-based Intra-attention pruning).
GRAIN inspects and prunes intra-attention structures, which greatly expands the structure search space and enables more flexible models.
Experiments on GLUE, SQuAD, and CoNLL 2003 show that GRAIN notably outperforms other methods, especially in the high sparsity regime.
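As an illustration of gradient-based importance for fine-grained attention structures, here is a generic first-order (|weight x gradient|) score over per-head dimensions; the exact intra-attention structures and scoring used by GRAIN are defined in the paper.

```python
import torch

def intra_attention_importance(weight: torch.Tensor, grad: torch.Tensor, num_heads: int):
    # Generic first-order (Taylor) importance for fine-grained attention
    # structures: score each per-head output dimension of a projection
    # matrix (shape: d_model x d_model) by |w * dL/dw|, summed over inputs.
    d_model = weight.shape[0]
    head_dim = d_model // num_heads
    per_dim = (weight * grad).abs().sum(dim=1)   # one score per output dimension
    return per_dim.view(num_heads, head_dim)     # grouped by attention head
```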
arXiv Detail & Related papers (2022-12-15T06:52:31Z)
- Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints [81.46143788046892]
We focus on the task of controlling the level of sparsity when performing sparse learning.
Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor.
We propose a constrained formulation where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion.
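A minimal sketch of the constrained formulation, assuming differentiable keep-gates and a simple dual (multiplier) update; the paper's gate parameterization and update rule may differ.

```python
import torch

def constrained_sparsity_step(task_loss, gates, lmbda, target_sparsity, lr_lambda=0.01):
    # `gates` are differentiable keep-probabilities in [0, 1]; `lmbda` is a
    # non-negative Lagrange multiplier (a plain float here). The multiplier is
    # increased while the network is denser than the target, removing the need
    # to hand-tune a fixed penalty factor.
    density = gates.mean()                                      # fraction of units kept
    violation = density - (1.0 - target_sparsity)               # > 0 when too dense
    lagrangian = task_loss + lmbda * violation                  # minimized w.r.t. gates
    new_lmbda = max(0.0, lmbda + lr_lambda * float(violation))  # dual ascent step
    return lagrangian, new_lmbda
```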
arXiv Detail & Related papers (2022-08-08T21:24:20Z)
- Attentive Fine-Grained Structured Sparsity for Image Restoration [63.35887911506264]
N:M structured pruning has emerged as an effective and practical pruning approach for making models efficient under an accuracy constraint.
We propose a novel pruning method that determines the pruning ratio for N:M structured sparsity at each layer.
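For reference, a minimal sketch of building an N:M sparsity mask (keep the N largest-magnitude weights in every group of M consecutive weights); the layer-wise ratio selection that the paper contributes is not shown.

```python
import torch

def n_m_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    # Within every group of m consecutive weights along the input dimension,
    # keep only the n largest-magnitude entries (assumes in_features % m == 0).
    out_f, in_f = weight.shape
    groups = weight.abs().reshape(out_f, in_f // m, m)
    keep = groups.topk(n, dim=-1).indices          # survivors per group of m
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, keep, 1.0)                   # 1.0 marks kept positions
    return mask.reshape(out_f, in_f)

# e.g. a 2:4-sparse weight: pruned = weight * n_m_mask(weight, n=2, m=4)
```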
arXiv Detail & Related papers (2022-04-26T12:44:55Z)
- Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm [7.662952656290564]
Various pruning approaches have been proposed to reduce the footprint requirements of Transformer-based language models.
We show for the first time that reducing the risk of overfitting can improve the effectiveness of pruning under the pretrain-and-finetune paradigm.
arXiv Detail & Related papers (2021-10-15T16:42:56Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
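A minimal sketch of the Powerpropagation reparameterisation w = v * |v|^(alpha - 1) in a linear layer; the choice of alpha and the layer type here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerpropLinear(nn.Module):
    # Linear layer with the Powerpropagation reparameterisation: the effective
    # weight is v * |v|**(alpha - 1), so gradient updates are scaled by the
    # parameter's own magnitude, concentrating mass near zero (alpha > 1).
    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.v = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.v)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.alpha = alpha

    def forward(self, x):
        w = self.v * self.v.abs().pow(self.alpha - 1.0)
        return F.linear(x, w, self.bias)
```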
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Toward Compact Deep Neural Networks via Energy-Aware Pruning [2.578242050187029]
We propose a novel energy-aware pruning method that quantifies the importance of each filter in the network using the nuclear norm (NN).
We achieve competitive results, reducing FLOPs by 40.4%/49.8% and parameters by 45.9%/52.9% while reaching 94.13%/94.61% Top-1 accuracy with ResNet-56/110 on CIFAR-10.
arXiv Detail & Related papers (2021-03-19T15:33:16Z)
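A minimal sketch of scoring filters by the nuclear norm of their output feature maps, assuming a single calibration batch; the complete energy-aware criterion follows the paper.

```python
import torch

def filter_importance(feature_maps: torch.Tensor) -> torch.Tensor:
    # Score each output channel by the nuclear norm (sum of singular values)
    # of its H x W feature map, averaged over the batch.
    # feature_maps: (batch, channels, height, width) activations of one layer.
    per_map = torch.linalg.matrix_norm(feature_maps, ord="nuc")  # (batch, channels)
    return per_map.mean(dim=0)                                   # (channels,)
```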
This list is automatically generated from the titles and abstracts of the papers on this site.