Why is Pruning at Initialization Immune to Reinitializing and Shuffling?
- URL: http://arxiv.org/abs/2107.01808v1
- Date: Mon, 5 Jul 2021 06:04:56 GMT
- Title: Why is Pruning at Initialization Immune to Reinitializing and Shuffling?
- Authors: Sahib Singh, Rosanne Liu
- Abstract summary: Recent studies assessing the efficacy of neural network pruning methods uncovered a surprising finding.
Under each of the pruning-at-initialization methods, the distribution of unpruned weights changed minimally with randomization operations.
- Score: 10.196185472801236
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies assessing the efficacy of neural network pruning methods
uncovered a surprising finding: when conducting ablation studies on existing
pruning-at-initialization methods, namely SNIP, GraSP, SynFlow, and magnitude
pruning, the performance of these methods remains unchanged and sometimes even
improves when randomly shuffling the mask positions within each layer (Layerwise
Shuffling) or sampling new initial weight values (Reinit), while keeping
pruning masks the same. We attempt to understand the reason behind such network
immunity to weight/mask modifications by studying layer-wise statistics
before and after randomization operations. We found that under each of the
pruning-at-initialization methods, the distribution of unpruned weights changed
minimally with randomization operations.
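As a concrete illustration of the two randomization operations the abstract refers to, below is a minimal PyTorch sketch; the per-layer dict representation, the toy magnitude-based masks, and the Kaiming reinitializer are assumptions made for illustration, not the authors' actual setup.

```python
import torch

def layerwise_shuffle(masks):
    """Layerwise Shuffling: randomly permute mask positions within each layer,
    preserving that layer's sparsity level."""
    shuffled = {}
    for name, m in masks.items():
        flat = m.flatten()
        perm = torch.randperm(flat.numel())
        shuffled[name] = flat[perm].reshape(m.shape)
    return shuffled

def reinit(weights, init_fn=torch.nn.init.kaiming_normal_):
    """Reinit: sample new initial weight values while keeping the pruning masks fixed."""
    new_weights = {}
    for name, w in weights.items():
        fresh = torch.empty_like(w)
        init_fn(fresh)  # the choice of initializer here is illustrative
        new_weights[name] = fresh
    return new_weights

# Toy example: two fully connected layers with ~50% magnitude-pruning masks per layer.
weights = {"fc1": torch.randn(256, 784), "fc2": torch.randn(10, 256)}
masks = {n: (w.abs() >= w.abs().median()).float() for n, w in weights.items()}
shuffled_net = {n: weights[n] * m for n, m in layerwise_shuffle(masks).items()}
reinit_net = {n: w * masks[n] for n, w in reinit(weights).items()}
```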
Related papers
- Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure using residual connections around nonlinear network sections that allow the flow of information through the network once a nonlinear section is pruned.
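A minimal, hypothetical sketch of such a prunable residual wrapper follows (the module name and the zero-out behaviour are assumptions for illustration, not the authors' implementation): the identity skip path keeps information flowing once the nonlinear section is pruned.

```python
import torch.nn as nn

class PrunableResidualSection(nn.Module):
    """Wraps a nonlinear section with an identity skip connection so that
    information still flows through the network after the section is pruned."""
    def __init__(self, section: nn.Module):
        super().__init__()
        self.section = section
        self.pruned = False  # set True to eliminate the nonlinear section

    def forward(self, x):
        return x if self.pruned else x + self.section(x)
```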
arXiv Detail & Related papers (2024-06-06T23:19:57Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP) for optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
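The block-local training idea summarized above can be sketched roughly as follows (PyTorch, linear blocks, and a cross-entropy local objective are all assumptions for illustration, not the CaFo implementation): each block has its own label head and optimizer, and the features it passes to the next block are detached so no gradient crosses block boundaries.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedBlock(nn.Module):
    """One cascaded block: a feature extractor plus its own label head."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, num_classes)  # per-block label logits

    def forward(self, x):
        h = self.features(x)
        return h, self.head(h)

def train_blocks_independently(blocks, x, y, lr=1e-3, steps=10):
    """Train each block with its own local loss; detach between blocks so that
    no gradient (and hence no backpropagation) flows across blocks."""
    inp = x
    for block in blocks:
        opt = torch.optim.Adam(block.parameters(), lr=lr)
        for _ in range(steps):
            h, logits = block(inp)
            loss = F.cross_entropy(logits, y)  # local objective for this block
            opt.zero_grad()
            loss.backward()
            opt.step()
        inp = h.detach()  # features passed forward without a gradient path
    return blocks

# Usage: two blocks trained independently on a toy batch.
blocks = [CascadedBlock(20, 64, 10), CascadedBlock(64, 64, 10)]
train_blocks_independently(blocks, torch.randn(32, 20), torch.randint(0, 10, (32,)))
```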
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Theoretical Characterization of How Neural Network Pruning Affects its
Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - GFlowOut: Dropout with Generative Flow Networks [76.59535235717631]
Monte Carlo Dropout has been widely used as a relatively cheap way to perform approximate inference.
Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference.
GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks.
arXiv Detail & Related papers (2022-10-24T03:00:01Z) - What to Prune and What Not to Prune at Initialization [0.0]
Post-training dropout-based approaches achieve high sparsity.
Initialization pruning is more efficacious when it comes to scaling the computational cost of the network.
The goal is to achieve higher sparsity while preserving performance.
arXiv Detail & Related papers (2022-09-06T03:48:10Z) - Weighting and Pruning based Ensemble Deep Random Vector Functional Link
Network for Tabular Data Classification [3.1905745371064484]
We propose novel variants of the Ensemble Deep Random Vector Functional Link (edRVFL) network.
Weighting edRVFL (WedRVFL) uses weighting methods to give training samples different weights in different layers according to how confidently they were classified in the previous layer, thereby increasing the ensemble's diversity and accuracy.
A pruning-based edRVFL (PedRVFL) has also been proposed. We prune some inferior neurons based on their importance for classification before generating the next hidden layer.
arXiv Detail & Related papers (2022-01-15T09:34:50Z) - Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded
learning [16.526326919313924]
We study an approach to learning pruning masks by optimizing the expected loss of stochastic pruning masks.
We analyze the training dynamics of the induced adaptive predictor in the setting of linear regression.
We show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.
arXiv Detail & Related papers (2021-10-22T14:25:22Z) - Cascade Weight Shedding in Deep Neural Networks: Benefits and Pitfalls
for Network Pruning [73.79377854107514]
We show that cascade weight shedding, when present, can significantly improve the performance of an otherwise sub-optimal scheme such as random pruning.
We demonstrate cascade weight shedding's potential for improving GMP's accuracy and reducing its computational complexity.
We shed light on weight and learning-rate rewinding methods of re-training, showing their possible connections to cascade weight shedding and a reason for their advantage over fine-tuning.
arXiv Detail & Related papers (2021-03-19T04:41:40Z) - Pruning Neural Networks at Initialization: Why are We Missing the Mark? [43.7335598007065]
We assess proposals for pruning neural networks at an early stage.
We show that, unlike pruning after training, randomly shuffling the weights preserves or improves accuracy.
This property suggests broader challenges with the underlying pruning heuristics, the desire to prune at an early stage, or both.
arXiv Detail & Related papers (2020-09-18T01:13:38Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based training combined with nonconvexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are initialized with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.