Emerging Paradigms of Neural Network Pruning
- URL: http://arxiv.org/abs/2103.06460v1
- Date: Thu, 11 Mar 2021 05:01:52 GMT
- Title: Emerging Paradigms of Neural Network Pruning
- Authors: Huan Wang, Can Qin, Yulun Zhang, Yun Fu
- Abstract summary: Pruning is adopted as a post-processing solution to this problem; it aims to remove unnecessary parameters from a neural network with little loss in performance.
Recent works challenge the belief that such sparse networks cannot be trained from scratch, by discovering random sparse networks that can be trained to match the performance of their dense counterparts.
This survey seeks to bridge the gap by proposing a general pruning framework in which the emerging pruning paradigms can be accommodated alongside the traditional one.
- Score: 82.9322109208353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over-parameterization of neural networks benefits optimization and
generalization, yet brings costs in practice. Pruning is adopted as a
post-processing solution to this problem; it aims to remove unnecessary
parameters from a neural network with little loss in performance. It has been
broadly believed that the resulting sparse neural network cannot be trained
from scratch to comparable accuracy. However, several recent works (e.g.,
[Frankle and Carbin, 2019a]) challenge this belief by discovering random
sparse networks that can be trained to match the performance of their dense
counterparts. This new pruning paradigm has since inspired further methods of
pruning at initialization. In spite of this encouraging progress, how to
coordinate these new pruning paradigms with traditional pruning has not yet
been explored. This survey seeks to bridge the gap by proposing a general
pruning framework in which the emerging pruning paradigms can be accommodated
alongside the traditional one. With it, we systematically reflect on the major
differences and new insights brought by these new paradigms, discussing
representative works at length. Finally, we summarize the open questions as
worthy future directions.
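To make the traditional paradigm concrete, here is a minimal sketch of one-shot global magnitude pruning in PyTorch. The prune-then-finetune recipe and the 90% sparsity level are illustrative assumptions for this sketch, not details taken from the survey:

```python
import torch
import torch.nn as nn

def global_magnitude_prune(model: nn.Module, sparsity: float = 0.9):
    """Zero out the smallest-magnitude weights across the whole model
    (one-shot, unstructured pruning). Fine-tuning would normally follow."""
    weights = [p for p in model.parameters() if p.dim() > 1]  # skip biases
    scores = torch.cat([w.detach().abs().flatten() for w in weights])
    k = max(1, int(sparsity * scores.numel()))
    threshold = scores.kthvalue(k).values        # k-th smallest magnitude
    masks = []
    for w in weights:
        mask = (w.detach().abs() > threshold).float()
        w.data.mul_(mask)                        # remove pruned weights in place
        masks.append(mask)
    return masks

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = global_magnitude_prune(model, sparsity=0.9)
print(sum(int(m.sum()) for m in masks), "weights kept")
```

After pruning, the masks are typically reapplied after every optimizer step so that fine-tuning cannot revive removed weights; the lottery-ticket line of work instead rewinds the surviving weights to their initial values and retrains the sparse network from scratch.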
Related papers
- Convergence Guarantees of Overparametrized Wide Deep Inverse Prior [1.5362025549031046]
The Deep Inverse Prior is an unsupervised approach that transforms a random input into an object whose image under the forward model matches the observation.
We provide overparametrization bounds under which such a network, trained via continuous-time gradient descent, converges exponentially fast with high probability.
This work is thus a first step towards a theoretical understanding of overparametrized DIP networks, and more broadly it contributes to the theoretical understanding of neural networks in inverse problem settings.
arXiv Detail & Related papers (2023-03-20T16:49:40Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- Neural Network Pruning Through Constrained Reinforcement Learning [3.2880869992413246]
We propose a general methodology for pruning neural networks so that they respect pre-defined computational budgets.
We demonstrate the effectiveness of our approach by comparison with state-of-the-art methods on standard image classification datasets (a sketch of the budget check follows this entry).
arXiv Detail & Related papers (2021-10-16T11:57:38Z)
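The reinforcement-learning controller itself is beyond a short sketch; what follows is only a hypothetical illustration of the budget constraint such a method must respect, checking controller-proposed per-layer keep ratios against a parameter budget (all names and numbers here are assumptions, not the paper's API):

```python
import torch.nn as nn

def respects_budget(model: nn.Module, keep_ratios: list, budget: int) -> bool:
    """Check whether per-layer keep ratios (e.g., proposed by a controller)
    satisfy a pre-defined parameter budget."""
    layers = [m for m in model.modules() if isinstance(m, nn.Linear)]
    assert len(layers) == len(keep_ratios)
    kept = sum(int(r * l.weight.numel()) for r, l in zip(keep_ratios, layers))
    return kept <= budget

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
print(respects_budget(model, [0.2, 0.5], budget=60_000))  # True for this toy net
```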
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: the pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
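A minimal sketch of the rising-penalty idea, assuming a linear growth schedule and externally supplied masks marking the weights scheduled for removal (both are illustrative choices, not the paper's exact recipe):

```python
import torch
import torch.nn as nn

def growing_l2_penalty(model: nn.Module, prune_masks: dict,
                       step: int, delta: float = 1e-4) -> torch.Tensor:
    """L2 penalty whose factor rises linearly with the training step,
    pushing the weights selected for removal toward zero."""
    lam = delta * step                      # rising penalty factor (assumed schedule)
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        if name in prune_masks:             # mask == 1 marks weights to remove
            penalty = penalty + lam * (p * prune_masks[name]).pow(2).sum()
    return penalty

# Inside a training loop (criterion is the task loss):
#   loss = criterion(model(x), y) + growing_l2_penalty(model, masks, step)
#   loss.backward(); optimizer.step()
```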
- All You Need is a Good Functional Prior for Bayesian Deep Learning [15.10662960548448]
We argue that the difficulty of specifying meaningful priors over network weights is a hugely limiting aspect of Bayesian deep learning.
We propose a novel and robust framework to match the weight-space prior of a neural network with a desired functional prior.
We provide extensive experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling offers systematically large performance improvements.
arXiv Detail & Related papers (2020-11-25T15:36:16Z)
- Progressive Skeletonization: Trimming more fat from a network at initialization [76.11947969140608]
We propose an objective to find a skeletonized network with maximum connection sensitivity.
We then propose two approximate procedures to maximize our objective.
Our approach provides remarkably improved performance at higher pruning levels (a simplified single-shot version of connection-sensitivity scoring is sketched after this entry).
arXiv Detail & Related papers (2020-06-16T11:32:47Z)
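A hypothetical single-shot version of connection-sensitivity pruning at initialization, in the spirit of this entry (the paper maximizes the objective progressively instead; the batch size and keep ratio here are arbitrary assumptions):

```python
import torch
import torch.nn as nn

def sensitivity_prune(model: nn.Module, x, y, keep_ratio: float = 0.1):
    """Score each weight by |w * dL/dw| on one batch at initialization and
    keep the top fraction (single-shot approximation)."""
    loss = nn.functional.cross_entropy(model(x), y)
    weights = [p for p in model.parameters() if p.dim() > 1]
    grads = torch.autograd.grad(loss, weights)
    scores = [(w * g).abs() for w, g in zip(weights, grads)]
    flat = torch.cat([s.flatten() for s in scores])
    k = max(1, int((1 - keep_ratio) * flat.numel()))
    threshold = flat.kthvalue(k).values
    masks = [(s > threshold).float() for s in scores]
    for w, m in zip(weights, masks):
        w.data.mul_(m)                      # skeletonize before training begins
    return masks

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = sensitivity_prune(model, torch.randn(32, 784),
                          torch.randint(0, 10, (32,)))
```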
- A Framework for Neural Network Pruning Using Gibbs Distributions [34.0576955010317]
Gibbs pruning is a novel framework for expressing and designing neural network pruning methods.
It can train and prune a network simultaneously, in such a way that the learned weights and the pruning mask are well adapted to each other.
We achieve a new state-of-the-art result for pruning ResNet-56 on the CIFAR-10 dataset.
arXiv Detail & Related papers (2020-06-08T23:04:53Z)
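The Gibbs-distribution machinery does not fit in a short sketch; below is only the simpler simultaneous train-and-prune pattern it refines, with a magnitude mask refreshed at every step (an assumption standing in for the paper's method):

```python
import torch
import torch.nn as nn

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Recompute a magnitude-based mask for one layer."""
    k = max(1, int(sparsity * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

model = nn.Linear(784, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
for step in range(100):
    mask = magnitude_mask(model.weight.detach(), sparsity=0.8)
    opt.zero_grad()
    out = nn.functional.linear(x, model.weight * mask, model.bias)
    nn.functional.cross_entropy(out, y).backward()
    opt.step()  # weights and mask co-adapt over training; note that with
                # plain SGD masked weights get zero gradient, whereas the
                # paper's stochastic masks let weights re-enter
```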
- Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks that bring machine learning applications to devices with limited computational resources.
For deep networks, existing pruning-at-initialization procedures remain unsatisfactory: the resulting pruned networks can be difficult to train and, for instance, nothing prevents one layer from being fully pruned (a simple guard against this failure mode is sketched after this entry).
arXiv Detail & Related papers (2020-02-19T17:09:50Z)
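A tiny illustration of the failure mode this entry mentions, plus a naive guard that keeps at least one weight per layer (the guard is an assumption for illustration, not the paper's construction):

```python
import torch

def fully_pruned_layers(masks):
    """Indices of layers whose mask removes every weight; such a layer
    disconnects the network, so training cannot recover."""
    return [i for i, m in enumerate(masks) if int(m.sum()) == 0]

def enforce_min_keep(masks, weights, min_keep=1):
    """Re-enable the largest-magnitude weights in any emptied layer."""
    for m, w in zip(masks, weights):
        if int(m.sum()) < min_keep:
            idx = w.abs().flatten().topk(min_keep).indices
            m.view(-1)[idx] = 1.0
    return masks

w = torch.randn(4, 4)
masks = enforce_min_keep([torch.zeros(4, 4)], [w])  # emptied layer is repaired
print(fully_pruned_layers(masks))                   # -> []
```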
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
However, the use of gradient-based optimization combined with nonconvexity renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random initializations.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.