Emerging Paradigms of Neural Network Pruning
- URL: http://arxiv.org/abs/2103.06460v1
- Date: Thu, 11 Mar 2021 05:01:52 GMT
- Title: Emerging Paradigms of Neural Network Pruning
- Authors: Huan Wang, Can Qin, Yulun Zhang, Yun Fu
- Abstract summary: Pruning is adopted as a post-processing solution to this problem; it aims to remove unnecessary parameters from a neural network with little loss in performance.
Recent works challenge the belief that such sparse networks cannot be trained from scratch, by discovering random sparse networks that can be trained to match the performance of their dense counterparts.
This survey seeks to bridge the gap by proposing a general pruning framework in which the emerging pruning paradigms can be accommodated alongside the traditional one.
- Score: 82.9322109208353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over-parameterization of neural networks benefits optimization and
generalization, yet brings costs in practice. Pruning is adopted as a
post-processing solution to this problem; it aims to remove unnecessary
parameters from a neural network with little loss in performance. It has been
broadly believed that the resulting sparse neural network cannot be trained
from scratch to comparable accuracy. However, several recent works (e.g.,
[Frankle and Carbin, 2019a]) challenge this belief by discovering random
sparse networks that can be trained to match the performance of their dense
counterparts. This new pruning paradigm has since inspired further methods of
pruning at initialization. In spite of this encouraging progress, how to
coordinate these new pruning paradigms with traditional pruning has not yet
been explored. This survey seeks to bridge the gap by proposing a general
pruning framework in which the emerging pruning paradigms can be accommodated
alongside the traditional one. With it, we systematically reflect on the major
differences and new insights brought by these new paradigms, discussing
representative works at length. Finally, we summarize the open questions as
worthy future directions.
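To make the traditional paradigm concrete, here is a minimal sketch of one-shot global magnitude pruning in PyTorch. The prune-then-finetune recipe and the 90% sparsity level are illustrative assumptions for this sketch, not details taken from the survey:

```python
import torch
import torch.nn as nn

def global_magnitude_prune(model: nn.Module, sparsity: float = 0.9):
    """Zero out the smallest-magnitude weights across the whole model
    (one-shot, unstructured pruning). Fine-tuning would normally follow."""
    weights = [p for p in model.parameters() if p.dim() > 1]  # skip biases
    scores = torch.cat([w.detach().abs().flatten() for w in weights])
    k = max(1, int(sparsity * scores.numel()))
    threshold = scores.kthvalue(k).values        # k-th smallest magnitude
    masks = []
    for w in weights:
        mask = (w.detach().abs() > threshold).float()
        w.data.mul_(mask)                        # remove pruned weights in place
        masks.append(mask)
    return masks

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = global_magnitude_prune(model, sparsity=0.9)
print(sum(int(m.sum()) for m in masks), "weights kept")
```

After pruning, the masks are typically reapplied after every optimizer step so that fine-tuning cannot revive removed weights; the lottery-ticket line of work instead rewinds the surviving weights to their initial values and retrains the sparse network from scratch.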
Related papers
- Convergence Guarantees of Overparametrized Wide Deep Inverse Prior [1.5362025549031046]
The Deep Inverse Prior is an unsupervised approach that transforms a random input into an object whose image under the forward model matches the observation.
We provide overparametrization bounds under which such a network, trained via continuous-time gradient descent, converges exponentially fast with high probability.
This work is thus a first step towards a theoretical understanding of overparametrized DIP networks, and more broadly it contributes to the theoretical understanding of neural networks in inverse problem settings.
arXiv Detail & Related papers (2023-03-20T16:49:40Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- Neural Network Pruning Through Constrained Reinforcement Learning [3.2880869992413246]
We propose a general methodology for pruning neural networks so that they respect pre-defined computational budgets.
We demonstrate the effectiveness of our approach by comparison with state-of-the-art methods on standard image classification datasets (a sketch of the budget check follows this entry).
arXiv Detail & Related papers (2021-10-16T11:57:38Z)
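The reinforcement-learning controller itself is beyond a short sketch; what follows is only a hypothetical illustration of the budget constraint such a method must respect, checking controller-proposed per-layer keep ratios against a parameter budget (all names and numbers here are assumptions, not the paper's API):

```python
import torch.nn as nn

def respects_budget(model: nn.Module, keep_ratios: list, budget: int) -> bool:
    """Check whether per-layer keep ratios (e.g., proposed by a controller)
    satisfy a pre-defined parameter budget."""
    layers = [m for m in model.modules() if isinstance(m, nn.Linear)]
    assert len(layers) == len(keep_ratios)
    kept = sum(int(r * l.weight.numel()) for r, l in zip(keep_ratios, layers))
    return kept <= budget

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
print(respects_budget(model, [0.2, 0.5], budget=60_000))  # True for this toy net
```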
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: the pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
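A minimal sketch of the rising-penalty idea, assuming a linear growth schedule and externally supplied masks marking the weights scheduled for removal (both are illustrative choices, not the paper's exact recipe):

```python
import torch
import torch.nn as nn

def growing_l2_penalty(model: nn.Module, prune_masks: dict,
                       step: int, delta: float = 1e-4) -> torch.Tensor:
    """L2 penalty whose factor rises linearly with the training step,
    pushing the weights selected for removal toward zero."""
    lam = delta * step                      # rising penalty factor (assumed schedule)
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        if name in prune_masks:             # mask == 1 marks weights to remove
            penalty = penalty + lam * (p * prune_masks[name]).pow(2).sum()
    return penalty

# Inside a training loop (criterion is the task loss):
#   loss = criterion(model(x), y) + growing_l2_penalty(model, masks, step)
#   loss.backward(); optimizer.step()
```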
- All You Need is a Good Functional Prior for Bayesian Deep Learning [15.10662960548448]
We argue that the difficulty of specifying meaningful priors over network weights is a hugely limiting aspect of Bayesian deep learning.
We propose a novel and robust framework to match the weight-space prior of a neural network with a desired functional prior.
We provide extensive experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling offers systematically large performance improvements.
arXiv Detail & Related papers (2020-11-25T15:36:16Z)
- Progressive Skeletonization: Trimming more fat from a network at initialization [76.11947969140608]
We propose an objective to find a skeletonized network with maximum connection sensitivity.
We then propose two approximate procedures to maximize our objective.
Our approach provides remarkably improved performance at higher pruning levels (a simplified single-shot version of connection-sensitivity scoring is sketched after this entry).
arXiv Detail & Related papers (2020-06-16T11:32:47Z)
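A hypothetical single-shot version of connection-sensitivity pruning at initialization, in the spirit of this entry (the paper maximizes the objective progressively instead; the batch size and keep ratio here are arbitrary assumptions):

```python
import torch
import torch.nn as nn

def sensitivity_prune(model: nn.Module, x, y, keep_ratio: float = 0.1):
    """Score each weight by |w * dL/dw| on one batch at initialization and
    keep the top fraction (single-shot approximation)."""
    loss = nn.functional.cross_entropy(model(x), y)
    weights = [p for p in model.parameters() if p.dim() > 1]
    grads = torch.autograd.grad(loss, weights)
    scores = [(w * g).abs() for w, g in zip(weights, grads)]
    flat = torch.cat([s.flatten() for s in scores])
    k = max(1, int((1 - keep_ratio) * flat.numel()))
    threshold = flat.kthvalue(k).values
    masks = [(s > threshold).float() for s in scores]
    for w, m in zip(weights, masks):
        w.data.mul_(m)                      # skeletonize before training begins
    return masks

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = sensitivity_prune(model, torch.randn(32, 784),
                          torch.randint(0, 10, (32,)))
```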
- A Framework for Neural Network Pruning Using Gibbs Distributions [34.0576955010317]
Gibbs pruning is a novel framework for expressing and designing neural network pruning methods.
It can train and prune a network simultaneously, in such a way that the learned weights and the pruning mask are well adapted to each other.
We achieve a new state-of-the-art result for pruning ResNet-56 on the CIFAR-10 dataset.
arXiv Detail & Related papers (2020-06-08T23:04:53Z)
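The Gibbs-distribution machinery does not fit in a short sketch; below is only the simpler simultaneous train-and-prune pattern it refines, with a magnitude mask refreshed at every step (an assumption standing in for the paper's method):

```python
import torch
import torch.nn as nn

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Recompute a magnitude-based mask for one layer."""
    k = max(1, int(sparsity * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

model = nn.Linear(784, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
for step in range(100):
    mask = magnitude_mask(model.weight.detach(), sparsity=0.8)
    opt.zero_grad()
    out = nn.functional.linear(x, model.weight * mask, model.bias)
    nn.functional.cross_entropy(out, y).backward()
    opt.step()  # weights and mask co-adapt over training; note that with
                # plain SGD masked weights get zero gradient, whereas the
                # paper's stochastic masks let weights re-enter
```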
- Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks that bring machine learning applications to devices with limited computational resources.
For deep networks, existing pruning-at-initialization procedures remain unsatisfactory: the resulting pruned networks can be difficult to train and, for instance, nothing prevents one layer from being fully pruned (a simple guard against this failure mode is sketched after this entry).
arXiv Detail & Related papers (2020-02-19T17:09:50Z)
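A tiny illustration of the failure mode this entry mentions, plus a naive guard that keeps at least one weight per layer (the guard is an assumption for illustration, not the paper's construction):

```python
import torch

def fully_pruned_layers(masks):
    """Indices of layers whose mask removes every weight; such a layer
    disconnects the network, so training cannot recover."""
    return [i for i, m in enumerate(masks) if int(m.sum()) == 0]

def enforce_min_keep(masks, weights, min_keep=1):
    """Re-enable the largest-magnitude weights in any emptied layer."""
    for m, w in zip(masks, weights):
        if int(m.sum()) < min_keep:
            idx = w.abs().flatten().topk(min_keep).indices
            m.view(-1)[idx] = 1.0
    return masks

w = torch.randn(4, 4)
masks = enforce_min_keep([torch.zeros(4, 4)], [w])  # emptied layer is repaired
print(fully_pruned_layers(masks))                   # -> []
```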
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
However, the use of gradient-based optimization combined with nonconvexity renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random initializations.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.