An Operator Theoretic Perspective on Pruning Deep Neural Networks
- URL: http://arxiv.org/abs/2110.14856v1
- Date: Thu, 28 Oct 2021 02:33:50 GMT
- Title: An Operator Theoretic Perspective on Pruning Deep Neural Networks
- Authors: William T. Redman, Maria Fonoberova, Ryan Mohr, Ioannis G. Kevrekidis,
Igor Mezic
- Abstract summary: We make use of recent advances in dynamical systems theory to define a new class of theoretically motivated pruning algorithms.
We show that these algorithms can be equivalent to magnitude and gradient based pruning, unifying these seemingly disparate methods.
- Score: 2.624902795082451
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The discovery of sparse subnetworks that are able to perform as well as full
models has found broad applied and theoretical interest. While many pruning
methods have been developed to this end, the naïve approach of removing
parameters based on their magnitude has been found to be as robust as more
complex, state-of-the-art algorithms. The lack of theory behind magnitude
pruning's success, especially pre-convergence, and its relation to other
pruning methods, such as gradient based pruning, are outstanding open questions
in the field that are in need of being addressed. We make use of recent
advances in dynamical systems theory, namely Koopman operator theory, to define
a new class of theoretically motivated pruning algorithms. We show that these
algorithms can be equivalent to magnitude and gradient based pruning, unifying
these seemingly disparate methods, and that they can be used to shed light on
magnitude pruning's performance during early training.
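The comparison in the abstract is easier to see in code. The following is a minimal, hedged sketch rather than the authors' implementation: it assumes a short trajectory of flattened weight snapshots has been logged during training, and it contrasts a plain magnitude score, a first-order gradient-based score, and a DMD/Koopman-style score fit to that trajectory. The function names (magnitude_score, gradient_score, koopman_style_score, prune_mask) and the crude DMD fit are illustrative assumptions, not the algorithm defined in the paper.

```python
import numpy as np

def magnitude_score(w):
    """Saliency = |w|: the naive magnitude-pruning criterion."""
    return np.abs(w)

def gradient_score(w, grad):
    """A common gradient-based criterion: first-order saliency |w * dL/dw|."""
    return np.abs(w * grad)

def koopman_style_score(weight_traj, horizon=10):
    """Hedged sketch of a Koopman/DMD-flavoured criterion (illustration only).

    weight_traj: array of shape (T, P) holding P flattened parameters observed
    at T consecutive training steps. We fit a linear operator A that best maps
    each snapshot to the next (a crude dynamic mode decomposition), push the
    last snapshot forward, and use the magnitude of the extrapolated weights
    as the saliency. This illustrates the idea of pruning from the operator
    governing training dynamics; it is NOT the paper's algorithm.
    """
    X = weight_traj[:-1].T            # shape (P, T-1): snapshots at time t
    Y = weight_traj[1:].T             # shape (P, T-1): snapshots at time t+1
    A = Y @ np.linalg.pinv(X)         # least-squares fit of Y ~= A X
    w = weight_traj[-1]
    for _ in range(horizon):          # extrapolate the dynamics forward
        w = A @ w
    return np.abs(w)

def prune_mask(scores, sparsity=0.9):
    """Boolean mask keeping the top (1 - sparsity) fraction of parameters."""
    k = max(1, int(round((1.0 - sparsity) * scores.size)))
    keep = np.argsort(scores)[-k:]
    mask = np.zeros(scores.shape, dtype=bool)
    mask[keep] = True
    return mask

# Toy usage: random snapshots standing in for logged training trajectories.
rng = np.random.default_rng(0)
traj = rng.normal(size=(20, 1000)) * np.linspace(1.0, 0.1, 20)[:, None]
w_final, grad = traj[-1], rng.normal(size=1000)
for name, s in [("magnitude", magnitude_score(w_final)),
                ("gradient", gradient_score(w_final, grad)),
                ("koopman-style", koopman_style_score(traj))]:
    print(f"{name:14s} keeps {prune_mask(s).sum()} of {s.size} weights")
```

The interesting regime in the paper is when the operator-theoretic criterion collapses onto magnitude or gradient pruning; the toy above only makes that comparison concrete and says nothing about when the equivalence holds.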
Related papers
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP) for optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require generating additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed on parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- A Robust Hypothesis Test for Tree Ensemble Pruning [2.4923006485141284]
We develop and present a novel, theoretically justified hypothesis test of split quality for gradient-boosted tree ensembles.
We show that using this method instead of the common penalty terms leads to a significant reduction in out-of-sample loss.
We also present several innovative extensions to the method, opening the door for a wide variety of novel tree pruning algorithms.
arXiv Detail & Related papers (2023-01-24T16:31:49Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- The rise of the lottery heroes: why zero-shot pruning is hard [3.1473798197405944]
Recent advances in deep learning optimization have shown that only a subset of a model's parameters is really necessary to train it successfully.
Finding these trainable sub-networks is a typically costly process.
This inhibits practical applications: can the learned sub-graph structures in deep learning models be found at training time?
arXiv Detail & Related papers (2022-02-24T22:49:36Z)
- Do Proportionate Algorithms Exploit Sparsity? [13.303007365477862]
This paper addresses the unexplored drawbacks and limitations of using proportionate-type algorithms.
Our findings include a theoretical justification for the poor performance of these algorithms in several sparse scenarios, as well as when dealing with non-stationary and compressible systems.
arXiv Detail & Related papers (2021-08-16T00:56:00Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the 'complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Emerging Paradigms of Neural Network Pruning [82.9322109208353]
Pruning is adopted as a post-processing solution to this problem, aiming to remove unnecessary parameters from a neural network with little compromise in performance.
Recent works challenge this belief by discovering random sparse networks that can be trained to match the performance of their dense counterparts.
This survey seeks to bridge the gap by proposing a general pruning framework so that the emerging pruning paradigms can be accommodated alongside the traditional one.
arXiv Detail & Related papers (2021-03-11T05:01:52Z)
- Phase Retrieval using Expectation Consistent Signal Recovery Algorithm based on Hypernetwork [73.94896986868146]
Phase retrieval is an important component in modern computational imaging systems.
Recent advances in deep learning have opened up new possibilities for robust and fast phase retrieval (PR).
We develop a novel framework for deep unfolding to overcome the existing limitations.
arXiv Detail & Related papers (2021-01-12T08:36:23Z)
- Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks that can run machine learning applications on devices with limited computational resources.
For deep NNs, such procedures remain unsatisfactory, as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned.
arXiv Detail & Related papers (2020-02-19T17:09:50Z)