Pruning Neural Networks at Initialization: Why are We Missing the Mark?
- URL: http://arxiv.org/abs/2009.08576v2
- Date: Sun, 21 Mar 2021 21:38:32 GMT
- Title: Pruning Neural Networks at Initialization: Why are We Missing the Mark?
- Authors: Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
- Abstract summary: We assess proposals for pruning neural networks at an early stage.
We show that, unlike pruning after training, randomly shuffling the weights these methods prune within each layer preserves or improves accuracy.
This property suggests broader challenges with the underlying pruning heuristics, the desire to prune at an early stage, or both.
- Score: 43.7335598007065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has explored the possibility of pruning neural networks at
initialization. We assess proposals for doing so: SNIP (Lee et al., 2019),
GraSP (Wang et al., 2020), SynFlow (Tanaka et al., 2020), and magnitude
pruning. Although these methods surpass the trivial baseline of random pruning,
they remain below the accuracy of magnitude pruning after training, and we
endeavor to understand why. We show that, unlike pruning after training,
randomly shuffling the weights these methods prune within each layer or
sampling new initial values preserves or improves accuracy. As such, the
per-weight pruning decisions made by these methods can be replaced by a
per-layer choice of the fraction of weights to prune. This property suggests
broader challenges with the underlying pruning heuristics, the desire to prune
at initialization, or both.
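To make the ablations described in the abstract concrete, below is a minimal NumPy sketch (not the authors' code; the layer shape, the Kaiming-style reinitializer, and the example mask are illustrative assumptions). It shows the three operations the paper studies on a single layer: shuffling the layer's mask, resampling new initial values for the surviving weights, and replacing the mask with a random one at the same per-layer keep fraction.

```python
import numpy as np

rng = np.random.default_rng(0)

def shuffle_mask_within_layer(mask, rng):
    """Ablation 1: permute which weights are kept inside a layer,
    holding that layer's sparsity fixed."""
    flat = mask.ravel().copy()
    rng.shuffle(flat)
    return flat.reshape(mask.shape)

def reinit_kept_weights(weights, mask, rng):
    """Ablation 2: resample new initial values for the surviving weights
    (a Kaiming-style normal init stands in for the layer's initializer)."""
    fan_in = int(np.prod(weights.shape[1:]))
    fresh = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=weights.shape)
    return fresh * mask

def random_mask_at_fraction(shape, keep_fraction, rng):
    """Per-layer baseline implied by the paper: a uniformly random mask
    with the same fraction of surviving weights."""
    n = int(np.prod(shape))
    flat = np.zeros(n, dtype=bool)
    flat[rng.choice(n, size=int(round(keep_fraction * n)), replace=False)] = True
    return flat.reshape(shape)

# Hypothetical layer: weights plus a mask a method such as SNIP might have produced.
w = rng.normal(0.0, 0.1, size=(64, 128))
mask = rng.random(w.shape) > 0.8          # keep roughly 20% of the weights

shuffled = shuffle_mask_within_layer(mask, rng)
reinit = reinit_kept_weights(w, mask, rng)
random_m = random_mask_at_fraction(w.shape, mask.mean(), rng)
assert np.isclose(shuffled.mean(), mask.mean())  # sparsity unchanged by shuffling
```

The paper's finding is that training the network with `shuffled`, `reinit`, or `random_m` matches or exceeds training with the original `mask`, which is why only the per-layer keep fraction appears to matter.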
Related papers
- Learning effective pruning at initialization from iterative pruning [15.842658282636876]
We present an end-to-end neural network-based PaI method to reduce training costs.
Our approach outperforms existing methods in high-sparsity settings.
Because this is the first neural-network-based PaI method, we conduct extensive experiments to validate the factors influencing this approach.
arXiv Detail & Related papers (2024-08-27T03:17:52Z)
- Class-Aware Pruning for Efficient Neural Networks [5.918784236241883]
Pruning has been introduced to reduce the computational cost of executing deep neural networks (DNNs).
In this paper, we propose a class-aware pruning technique to compress DNNs.
Experimental results confirm that this class-aware pruning technique can significantly reduce the number of weights and FLOPs.
arXiv Detail & Related papers (2023-12-10T13:07:54Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- What to Prune and What Not to Prune at Initialization [0.0]
Post-training dropout-based approaches achieve high sparsity.
Initialization pruning is more efficacious when it comes to scaling the computational cost of the network.
The goal is to achieve higher sparsity while preserving performance.
arXiv Detail & Related papers (2022-09-06T03:48:10Z)
- Why is Pruning at Initialization Immune to Reinitializing and Shuffling? [10.196185472801236]
Recent studies assessing the efficacy of methods for pruning neural networks at initialization uncovered a surprising finding.
Under each of the pruning-at-initialization methods, the distribution of unpruned weights changed minimally with randomization operations.
arXiv Detail & Related papers (2021-07-05T06:04:56Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant, GraNet-ST.
Perhaps most impressively, the latter for the first time boosts sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- Cascade Weight Shedding in Deep Neural Networks: Benefits and Pitfalls for Network Pruning [73.79377854107514]
We show that cascade weight shedding, when present, can significantly improve the performance of an otherwise sub-optimal scheme such as random pruning.
We demonstrate cascade weight shedding's potential for improving GMP's accuracy and reducing its computational complexity.
We shed light on the weight and learning-rate rewinding methods of re-training, showing their possible connections to cascade weight shedding and the reason for their advantage over fine-tuning.
arXiv Detail & Related papers (2021-03-19T04:41:40Z)
- Emerging Paradigms of Neural Network Pruning [82.9322109208353]
Pruning is adopted as a post-processing solution that aims to remove unnecessary parameters from a neural network with little loss in performance.
Recent works challenge this belief by discovering random sparse networks which can be trained to match the performance of their dense counterparts.
This survey seeks to bridge the gap by proposing a general pruning framework so that the emerging pruning paradigms can be accommodated well with the traditional one.
arXiv Detail & Related papers (2021-03-11T05:01:52Z)
- Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot [55.37967301483917]
Conventional wisdom about pruning algorithms suggests that pruning methods exploit information from training data to find good subnetworks.
In this paper, we conduct sanity checks for the above beliefs on several recent unstructured pruning methods.
We propose a series of simple data-independent prune ratios for each layer, and randomly prune each layer accordingly to get a subnetwork (a minimal sketch of this baseline follows the list).
arXiv Detail & Related papers (2020-09-22T17:36:17Z)
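The last entry above replaces data-dependent pruning scores with fixed per-layer prune ratios. The short sketch below is illustrative only: the layer shapes and the keep-ratio values are placeholders, not the ratios proposed in that paper. It simply shows how a data-independent random subnetwork can be drawn once per-layer ratios are chosen.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical layer shapes and per-layer keep ratios (placeholders, not the paper's values).
layer_shapes = {"conv1": (16, 3, 3, 3), "conv2": (32, 16, 3, 3), "fc": (10, 512)}
keep_ratios = {"conv1": 0.5, "conv2": 0.2, "fc": 0.1}

def random_subnetwork(shapes, ratios, rng):
    """Draw one random binary mask per layer at the given data-independent keep ratio."""
    masks = {}
    for name, shape in shapes.items():
        n = int(np.prod(shape))
        flat = np.zeros(n, dtype=bool)
        flat[rng.choice(n, size=int(round(ratios[name] * n)), replace=False)] = True
        masks[name] = flat.reshape(shape)
    return masks

masks = random_subnetwork(layer_shapes, keep_ratios, rng)
print({name: round(m.mean(), 3) for name, m in masks.items()})  # realized keep fractions
```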