Why is Pruning at Initialization Immune to Reinitializing and Shuffling?
- URL: http://arxiv.org/abs/2107.01808v1
- Date: Mon, 5 Jul 2021 06:04:56 GMT
- Title: Why is Pruning at Initialization Immune to Reinitializing and Shuffling?
- Authors: Sahib Singh, Rosanne Liu
- Abstract summary: Recent studies assessing the efficacy of neural network pruning methods uncovered a surprising finding.
Under each of the pruning-at-initialization methods, the distribution of unpruned weights changed minimally with randomization operations.
- Score: 10.196185472801236
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies assessing the efficacy of neural network pruning methods
uncovered a surprising finding: when conducting ablation studies on existing
pruning-at-initialization methods, namely SNIP, GraSP, SynFlow, and magnitude
pruning, the performance of these methods remains unchanged and sometimes even
improves when randomly shuffling the mask positions within each layer (Layerwise
Shuffling) or sampling new initial weight values (Reinit), while keeping
pruning masks the same. We attempt to understand the reason behind such network
immunity to weight/mask modifications by studying layer-wise statistics
before and after randomization operations. We found that under each of the
pruning-at-initialization methods, the distribution of unpruned weights changed
minimally with randomization operations.
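As a concrete illustration of the two randomization operations the abstract refers to, below is a minimal PyTorch sketch; the per-layer dict representation, the toy magnitude-based masks, and the Kaiming reinitializer are assumptions made for illustration, not the authors' actual setup.

```python
import torch

def layerwise_shuffle(masks):
    """Layerwise Shuffling: randomly permute mask positions within each layer,
    preserving that layer's sparsity level."""
    shuffled = {}
    for name, m in masks.items():
        flat = m.flatten()
        perm = torch.randperm(flat.numel())
        shuffled[name] = flat[perm].reshape(m.shape)
    return shuffled

def reinit(weights, init_fn=torch.nn.init.kaiming_normal_):
    """Reinit: sample new initial weight values while keeping the pruning masks fixed."""
    new_weights = {}
    for name, w in weights.items():
        fresh = torch.empty_like(w)
        init_fn(fresh)  # the choice of initializer here is illustrative
        new_weights[name] = fresh
    return new_weights

# Toy example: two fully connected layers with ~50% magnitude-pruning masks per layer.
weights = {"fc1": torch.randn(256, 784), "fc2": torch.randn(10, 256)}
masks = {n: (w.abs() >= w.abs().median()).float() for n, w in weights.items()}
shuffled_net = {n: weights[n] * m for n, m in layerwise_shuffle(masks).items()}
reinit_net = {n: w * masks[n] for n, w in reinit(weights).items()}
```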
Related papers
- Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure using residual connections around nonlinear network sections that allow the flow of information through the network once a nonlinear section is pruned.
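A minimal, hypothetical sketch of such a prunable residual wrapper follows (the module name and the zero-out behaviour are assumptions for illustration, not the authors' implementation): the identity skip path keeps information flowing once the nonlinear section is pruned.

```python
import torch.nn as nn

class PrunableResidualSection(nn.Module):
    """Wraps a nonlinear section with an identity skip connection so that
    information still flows through the network after the section is pruned."""
    def __init__(self, section: nn.Module):
        super().__init__()
        self.section = section
        self.pruned = False  # set True to eliminate the nonlinear section

    def forward(self, x):
        return x if self.pruned else x + self.section(x)
```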
arXiv Detail & Related papers (2024-06-06T23:19:57Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP) for optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
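The block-local training idea summarized above can be sketched roughly as follows (PyTorch, linear blocks, and a cross-entropy local objective are all assumptions for illustration, not the CaFo implementation): each block has its own label head and optimizer, and the features it passes to the next block are detached so no gradient crosses block boundaries.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedBlock(nn.Module):
    """One cascaded block: a feature extractor plus its own label head."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, num_classes)  # per-block label logits

    def forward(self, x):
        h = self.features(x)
        return h, self.head(h)

def train_blocks_independently(blocks, x, y, lr=1e-3, steps=10):
    """Train each block with its own local loss; detach between blocks so that
    no gradient (and hence no backpropagation) flows across blocks."""
    inp = x
    for block in blocks:
        opt = torch.optim.Adam(block.parameters(), lr=lr)
        for _ in range(steps):
            h, logits = block(inp)
            loss = F.cross_entropy(logits, y)  # local objective for this block
            opt.zero_grad()
            loss.backward()
            opt.step()
        inp = h.detach()  # features passed forward without a gradient path
    return blocks

# Usage: two blocks trained independently on a toy batch.
blocks = [CascadedBlock(20, 64, 10), CascadedBlock(64, 64, 10)]
train_blocks_independently(blocks, torch.randn(32, 20), torch.randint(0, 10, (32,)))
```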
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Theoretical Characterization of How Neural Network Pruning Affects its
Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - GFlowOut: Dropout with Generative Flow Networks [76.59535235717631]
Monte Carlo Dropout has been widely used as a relatively cheap way to perform approximate inference.
Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference.
GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks.
arXiv Detail & Related papers (2022-10-24T03:00:01Z) - What to Prune and What Not to Prune at Initialization [0.0]
Post-training dropout-based approaches achieve high sparsity.
Initialization pruning is more efficacious when it comes to scaling the computational cost of the network.
The goal is to achieve higher sparsity while preserving performance.
arXiv Detail & Related papers (2022-09-06T03:48:10Z) - Weighting and Pruning based Ensemble Deep Random Vector Functional Link
Network for Tabular Data Classification [3.1905745371064484]
We propose novel variants of the Ensemble Deep Random Vector Functional Link (edRVFL) network.
Weighting edRVFL (WedRVFL) uses weighting methods to give training samples different weights in different layers according to how confidently they were classified in the previous layer, thereby increasing the ensemble's diversity and accuracy.
A pruning-based edRVFL (PedRVFL) has also been proposed. We prune some inferior neurons based on their importance for classification before generating the next hidden layer.
arXiv Detail & Related papers (2022-01-15T09:34:50Z) - Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded
learning [16.526326919313924]
We study an approach to learning pruning masks by optimizing the expected loss of stochastic pruning masks.
We analyze the training dynamics of the induced adaptive predictor in the setting of linear regression.
We show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.
arXiv Detail & Related papers (2021-10-22T14:25:22Z) - Cascade Weight Shedding in Deep Neural Networks: Benefits and Pitfalls
for Network Pruning [73.79377854107514]
We show that cascade weight shedding, when present, can significantly improve the performance of an otherwise sub-optimal scheme such as random pruning.
We demonstrate cascade weight shedding's potential for improving GMP's accuracy and reducing its computational complexity.
We shed light on weight and learning-rate rewinding methods of re-training, showing their possible connections to cascade weight shedding and a reason for their advantage over fine-tuning.
arXiv Detail & Related papers (2021-03-19T04:41:40Z) - Pruning Neural Networks at Initialization: Why are We Missing the Mark? [43.7335598007065]
We assess proposals for pruning neural networks at an early stage.
We show that, unlike pruning after training, randomly shuffling the weights preserves or improves accuracy.
This property suggests broader challenges with the underlying pruning heuristics, the desire to prune at an early stage, or both.
arXiv Detail & Related papers (2020-09-18T01:13:38Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based training combined with nonconvexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are initialized with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.