Pruning via Iterative Ranking of Sensitivity Statistics
- URL: http://arxiv.org/abs/2006.00896v2
- Date: Sun, 14 Jun 2020 16:41:20 GMT
- Title: Pruning via Iterative Ranking of Sensitivity Statistics
- Authors: Stijn Verdenius, Maarten Stol, Patrick Forré
- Abstract summary: We show that by applying the sensitivity criterion iteratively in smaller steps - still before training - we can improve its performance without a difficult implementation.
We then demonstrate how it can be applied for both structured and unstructured pruning, before and/or during training, thereby achieving state-of-the-art sparsity-performance trade-offs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the introduction of SNIP [arXiv:1810.02340v2], it has been demonstrated
that modern neural networks can effectively be pruned before training. Yet, its
sensitivity criterion has since been criticized for not propagating training
signal properly or even disconnecting layers. As a remedy, GraSP
[arXiv:2002.07376v1] was introduced, compromising on simplicity. However, in
this work we show that by applying the sensitivity criterion iteratively in
smaller steps - still before training - we can improve its performance without
a difficult implementation. As such, we introduce 'SNIP-it'. We then demonstrate
how it can be applied for both structured and unstructured pruning, before
and/or during training, thereby achieving state-of-the-art
sparsity-performance trade-offs, while already providing the computational
benefits of pruning in the training process from the start. Furthermore, we
evaluate our methods on robustness to overfitting, disconnection and
adversarial attacks.
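As an illustration of the core idea, the sketch below applies the SNIP connection-sensitivity criterion (the magnitude of gradient times weight) repeatedly in smaller steps before any training. It assumes PyTorch; the toy MLP, random data, and geometric keep-ratio schedule are illustrative assumptions, not the authors' released SNIP-it code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def snip_scores(model, x, y):
    """Connection sensitivity |dL/dw * w| for every weight matrix."""
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()
    return {n: (p.grad * p).abs().detach()
            for n, p in model.named_parameters() if p.dim() > 1}

def snip_iteratively(model, x, y, target_sparsity=0.9, steps=5):
    """Prune to the target sparsity in several smaller steps, re-scoring each time."""
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}
    for step in range(1, steps + 1):
        keep = (1.0 - target_sparsity) ** (step / steps)   # geometric keep-ratio schedule
        scores = snip_scores(model, x, y)
        flat = torch.cat([(scores[n] * masks[n]).flatten() for n in masks])
        threshold = torch.topk(flat, max(1, int(keep * flat.numel()))).values.min()
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in masks:
                    masks[n] = (scores[n] * masks[n] >= threshold).float()
                    p.mul_(masks[n])   # zero out pruned connections before training starts
    return masks

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
x, y = torch.randn(128, 1, 28, 28), torch.randint(0, 10, (128,))
masks = snip_iteratively(model, x, y)
print({n: round(float(m.mean()), 3) for n, m in masks.items()})   # remaining fraction per layer
```

Normal training would then proceed with the masks re-applied after every optimizer step so that pruned weights stay at zero.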
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training [58.47622737624532]
We study the influence of pruning criteria on Dynamic Sparse Training (DST) performance.
We find that most of the studied methods yield similar results.
The best performance is predominantly given by the simplest technique: magnitude-based pruning.
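As a concrete reference point for that simplest baseline, here is a minimal sketch of one-shot global magnitude pruning (illustrative only, not code from the paper):

```python
import torch

def magnitude_prune(model: torch.nn.Module, sparsity: float = 0.9) -> None:
    """Zero out the globally smallest-magnitude weights across all weight matrices."""
    weights = [p for p in model.parameters() if p.dim() > 1]
    flat = torch.cat([w.detach().abs().flatten() for w in weights])
    threshold = flat.quantile(sparsity)            # global magnitude cut-off
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() >= threshold).float())

model = torch.nn.Sequential(torch.nn.Linear(100, 50), torch.nn.ReLU(), torch.nn.Linear(50, 10))
magnitude_prune(model, sparsity=0.8)
```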
arXiv Detail & Related papers (2023-06-21T12:43:55Z)
- Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients [36.078414964088196]
Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network.
Current methods are insufficient to enable this optimization and lead to a large degradation in model performance.
We propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune.
Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
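The sketch below illustrates the meta-gradient idea on a toy linear model with plain PyTorch autograd: unroll a few SGD steps, then differentiate the final loss with respect to a per-weight mask. The model, data, and thresholding rule are illustrative assumptions, not the ProsPr implementation.

```python
import torch

torch.manual_seed(0)
x, y = torch.randn(64, 20), torch.randn(64, 1)

w0 = torch.randn(20, 1, requires_grad=True)      # initial weights
mask = torch.ones_like(w0, requires_grad=True)   # differentiable pruning mask
lr, k_steps = 0.1, 3

w = w0 * mask                                    # masked initialization
for _ in range(k_steps):                         # unroll a few SGD steps, keeping the graph
    loss = ((x @ w - y) ** 2).mean()
    (g,) = torch.autograd.grad(loss, w, create_graph=True)
    w = w - lr * g

final_loss = ((x @ w - y) ** 2).mean()
(meta_grad,) = torch.autograd.grad(final_loss, mask)   # meta-gradient w.r.t. the mask

scores = meta_grad.abs().flatten()
keep = scores >= scores.quantile(0.5)            # e.g. keep the top half of the weights
print(f"kept {int(keep.sum())} of {keep.numel()} weights")
```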
arXiv Detail & Related papers (2022-02-16T15:18:55Z)
- Back to Basics: Efficient Network Compression via IMP [22.586474627159287]
Iterative Magnitude Pruning (IMP) is one of the most established approaches for network pruning.
It is often argued that IMP reaches suboptimal states because it does not incorporate sparsification into the training phase.
We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches.
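For reference, a minimal sketch of the basic IMP loop (per-layer magnitude pruning with a placeholder model, data, and retraining schedule; the paper's SLR retraining variant is not reproduced here):

```python
import torch

def train(model, data, targets, masks, epochs=10, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        torch.nn.functional.mse_loss(model(data), targets).backward()
        opt.step()
        with torch.no_grad():                     # keep pruned weights at zero
            for p, m in zip(model.parameters(), masks):
                p.mul_(m)

def iterative_magnitude_pruning(model, data, targets, rounds=5, prune_frac=0.2):
    masks = [torch.ones_like(p) for p in model.parameters()]
    for _ in range(rounds):
        train(model, data, targets, masks)
        with torch.no_grad():
            for p, m in zip(model.parameters(), masks):
                if p.dim() > 1:                   # prune a fraction of each layer's remaining weights
                    threshold = p.abs()[m.bool()].quantile(prune_frac)
                    m.mul_((p.abs() >= threshold).float())
                    p.mul_(m)
    return masks

model = torch.nn.Sequential(torch.nn.Linear(20, 50), torch.nn.ReLU(), torch.nn.Linear(50, 1))
x, y = torch.randn(256, 20), torch.randn(256, 1)
masks = iterative_magnitude_pruning(model, x, y)
```

Common IMP variants additionally rewind the surviving weights (or the learning-rate schedule) between rounds rather than simply continuing training, as in the sketch above.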
arXiv Detail & Related papers (2021-11-01T11:23:44Z)
- When to Prune? A Policy towards Early Structural Pruning [27.91996628143805]
We propose a policy that prunes as early as possible during training without hurting performance.
Our method yields a $1.4\%$ top-1 accuracy boost over state-of-the-art pruning counterparts and cuts GPU training cost by $2.4\times$.
arXiv Detail & Related papers (2021-10-22T18:39:22Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant, GraNet-ST.
Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
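For context, the sketch below shows the cubic sparsity ramp commonly used in gradual magnitude pruning schedules; it is an illustrative baseline only, and GraNet's zero-cost neuroregeneration step (regrowing connections elsewhere after each pruning step) is not shown.

```python
def gradual_sparsity(t: int, t_start: int, t_end: int,
                     s_init: float = 0.0, s_final: float = 0.9) -> float:
    """Target sparsity at training step t, ramping from s_init to s_final along a cubic curve."""
    if t <= t_start:
        return s_init
    if t >= t_end:
        return s_final
    progress = (t - t_start) / (t_end - t_start)
    return s_final + (s_init - s_final) * (1.0 - progress) ** 3

# sparsity targets at a few checkpoints of a 100-step ramp
print([round(gradual_sparsity(t, 0, 100), 3) for t in (0, 25, 50, 75, 100)])
```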
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability [94.4070247697549]
Full-batch gradient descent on neural network training objectives operates in a regime we call the Edge of Stability.
In this regime, the maximum eigenvalue of the training loss Hessian hovers just above the numerical value $2/\text{(step size)}$, and the training loss behaves non-monotonically over short timescales, yet consistently decreases over long timescales.
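As an illustration of how that quantity can be measured, the sketch below estimates the largest Hessian eigenvalue of a toy training loss with power iteration on Hessian-vector products and compares it to $2/\text{(step size)}$; the model, data, and step size are placeholder assumptions, not the paper's setup.

```python
import torch

torch.manual_seed(0)
x, y = torch.randn(256, 10), torch.randn(256, 1)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
params = list(model.parameters())

def loss_fn():
    return torch.nn.functional.mse_loss(model(x), y)

def top_hessian_eigenvalue(iters=50):
    """Power iteration on Hessian-vector products of the training loss."""
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(gv, params)        # Hessian-vector product
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / norm for h in hv]                  # normalized eigenvector estimate
    grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
    gv = sum((g * vi).sum() for g, vi in zip(grads, v))
    hv = torch.autograd.grad(gv, params)
    return sum((h * vi).sum() for h, vi in zip(hv, v)).item()   # Rayleigh quotient

step_size = 0.01
print("sharpness:", top_hessian_eigenvalue(), "vs 2 / step size:", 2 / step_size)
```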
arXiv Detail & Related papers (2021-02-26T22:08:19Z)
- Single Shot Structured Pruning Before Training [34.34435316622998]
Our work develops a methodology to remove entire channels and hidden units with the explicit aim of speeding up training and inference.
We introduce a compute-aware scoring mechanism which enables pruning in units of sensitivity per FLOP removed, allowing even greater speed ups.
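A schematic illustration of scoring structured units by sensitivity per FLOP removed; the per-channel numbers below are placeholders, not the paper's scoring function.

```python
# hypothetical per-channel sensitivities and FLOP savings for one convolutional layer
sensitivity   = [0.90, 0.05, 0.40, 0.02, 0.60]    # e.g. summed |grad * weight| per channel
flops_removed = [1.2e6, 1.2e6, 0.8e6, 0.8e6, 2.0e6]

# a low sensitivity-per-FLOP score means a channel costs much compute for little signal
scores = [s / f for s, f in zip(sensitivity, flops_removed)]
prune_order = sorted(range(len(scores)), key=lambda i: scores[i])
print("channels pruned first:", prune_order)
```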
arXiv Detail & Related papers (2020-07-01T11:27:37Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle that we call Feature Purification: one of the causes of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
- Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks that make it possible to run machine learning applications on devices with limited computational resources.
For deep NNs, existing pruning-at-initialization procedures remain unsatisfactory, as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned.
arXiv Detail & Related papers (2020-02-19T17:09:50Z)