When to Prune? A Policy towards Early Structural Pruning
- URL: http://arxiv.org/abs/2110.12007v1
- Date: Fri, 22 Oct 2021 18:39:22 GMT
- Title: When to Prune? A Policy towards Early Structural Pruning
- Authors: Maying Shen, Pavlo Molchanov, Hongxu Yin, Jose M. Alvarez
- Abstract summary: We propose a policy that prunes as early as possible during training without hurting performance.
Our method yields a $1.4\%$ top-1 accuracy boost over state-of-the-art pruning counterparts and cuts GPU training cost by $2.4\times$.
- Score: 27.91996628143805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pruning enables appealing reductions in network memory footprint and time
complexity. Conventional post-training pruning techniques lean towards
efficient inference while overlooking the heavy computation for training.
Recent exploration of pre-training pruning at initialization hints at training
cost reduction via pruning, but suffers noticeable performance degradation. We
attempt to combine the benefits of both directions and propose a policy that
prunes as early as possible during training without hurting performance.
Instead of pruning at initialization, our method exploits initial dense
training for a few epochs to quickly guide the architecture, while constantly
evaluating dominant sub-networks via neuron importance ranking. This unveils
dominant sub-networks whose structures become stable, allowing conventional
pruning to be pushed earlier into training. To detect this point early, we further
introduce an Early Pruning Indicator (EPI) that relies on sub-network
architectural similarity and quickly triggers pruning when the sub-network's
architecture stabilizes. Through extensive experiments on ImageNet, we show
that EPI enables quick identification of early training epochs suitable for
pruning, offering the same efficacy as an otherwise "oracle" grid search that
scans through epochs and requires orders of magnitude more compute. Our method
yields a $1.4\%$ top-1 accuracy boost over state-of-the-art pruning
counterparts and cuts GPU training cost by $2.4\times$, hence offering a new
efficiency-accuracy boundary for network pruning during training.
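To make the trigger concrete, below is a minimal sketch of an EPI-style early-pruning check. It assumes sub-network similarity is measured as the mean per-layer overlap (Jaccard) of the top-ranked channel indices between consecutive epochs, and that pruning fires once similarity stays above a threshold for a few epochs; the paper's exact EPI formula, importance criterion, and threshold may differ. The names (`subnetwork_similarity`, `should_prune`, `keep_ratio`, `patience`) are illustrative, not from the paper.

```python
# Hedged sketch of an early-pruning trigger in the spirit of EPI.
# ASSUMPTION: sub-network similarity = mean per-layer Jaccard overlap of the
# kept channel indices; the paper's exact EPI formula may differ.
import numpy as np

def kept_channels(importance, keep_ratio):
    """Indices of the top-`keep_ratio` channels in one layer."""
    k = max(1, int(round(keep_ratio * importance.size)))
    return set(np.argsort(importance)[-k:].tolist())

def subnetwork_similarity(scores_a, scores_b, keep_ratio=0.5):
    """Mean per-layer Jaccard overlap between two epochs' sub-networks."""
    sims = []
    for imp_a, imp_b in zip(scores_a, scores_b):
        a, b = kept_channels(imp_a, keep_ratio), kept_channels(imp_b, keep_ratio)
        sims.append(len(a & b) / len(a | b))
    return float(np.mean(sims))

def should_prune(similarity_history, threshold=0.95, patience=3):
    """Trigger pruning once similarity stays above `threshold` for `patience` epochs."""
    recent = similarity_history[-patience:]
    return len(recent) == patience and all(s >= threshold for s in recent)

# Toy usage: per-layer channel importance scores from two consecutive epochs.
rng = np.random.default_rng(0)
epoch_t = [rng.random(64), rng.random(128)]
epoch_t1 = [s + 0.01 * rng.random(s.size) for s in epoch_t]  # nearly unchanged ranking
sim = subnetwork_similarity(epoch_t, epoch_t1, keep_ratio=0.5)
print(f"similarity: {sim:.3f}, trigger: {should_prune([sim] * 3)}")
```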
Related papers
- DRIVE: Dual Gradient-Based Rapid Iterative Pruning [2.209921757303168]
Modern deep neural networks (DNNs) consist of millions of parameters, necessitating high-performance computing during training and inference.
Traditional pruning methods that are applied post-training focus on streamlining inference, but there are recent efforts to leverage sparsity early on by pruning before training.
We present Dual Gradient-Based Rapid Iterative Pruning (DRIVE), which leverages dense training for the initial epochs to counteract the randomness inherent at initialization.
arXiv Detail & Related papers (2024-04-01T20:44:28Z) - Prospect Pruning: Finding Trainable Weights at Initialization using
Meta-Gradients [36.078414964088196]
Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network.
Current methods are insufficient to enable this optimization and lead to a large degradation in model performance.
We propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune.
Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
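The summary above leaves the scoring step abstract; the sketch below illustrates one way meta-gradient pruning scores can be computed, by unrolling a few SGD steps through a per-weight mask on a toy linear model and ranking weights by |d(final loss)/d(mask)|. It is a hedged illustration of the general idea, not ProsPr's published procedure; all names and hyper-parameters are placeholders.

```python
# Hedged sketch of meta-gradient pruning scores in the spirit of ProsPr.
# ASSUMPTION: weights are scored by |d(final loss)/d(mask)| after unrolling a
# few SGD steps through a per-weight mask; the published method's details
# (batching, mask placement, normalisation) may differ.
import torch

torch.manual_seed(0)
x, y = torch.randn(32, 10), torch.randn(32, 1)
w0 = torch.randn(10, 1) * 0.1               # a tiny linear model's weights
mask = torch.ones_like(w0, requires_grad=True)

lr, inner_steps = 0.1, 3
w = w0.clone().requires_grad_(True)
for _ in range(inner_steps):
    loss = torch.nn.functional.mse_loss(x @ (w * mask), y)
    (g,) = torch.autograd.grad(loss, w, create_graph=True)
    w = w - lr * g                          # unrolled update keeps the graph alive

final_loss = torch.nn.functional.mse_loss(x @ (w * mask), y)
(meta_grad,) = torch.autograd.grad(final_loss, mask)
scores = meta_grad.abs()                    # higher score = more useful to keep
thresh = scores.flatten().kthvalue(scores.numel() // 2).values
print((scores >= thresh).float().mean())    # roughly half the weights selected to keep
```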
arXiv Detail & Related papers (2022-02-16T15:18:55Z) - The Unreasonable Effectiveness of Random Pruning: Return of the Most
Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but has been deemed uncompetitive against both post-training pruning and sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and that the benefits of sparsity might extend beyond carefully designed pruning.
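As a point of reference, the baseline itself is trivial to implement: draw one fixed random mask per weight tensor at the desired sparsity and keep it fixed during sparse training from scratch. The sketch below assumes a uniform layer-wise sparsity; the layer-wise ratios studied in the paper may differ.

```python
# Hedged sketch of the random-pruning baseline: a fixed random mask per layer
# at a uniform layer-wise sparsity, applied before (sparse) training.
import torch
import torch.nn as nn

def random_masks(model, sparsity=0.9, seed=0):
    """One fixed random binary mask per weight tensor."""
    g = torch.Generator().manual_seed(seed)
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune weight matrices / conv kernels only
            masks[name] = (torch.rand(p.shape, generator=g) >= sparsity).float()
    return masks

def apply_masks(model, masks):
    """Zero out pruned weights; call after every optimizer step to keep them zero."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
masks = random_masks(model, sparsity=0.9)
apply_masks(model, masks)
remaining = sum(int(m.sum()) for m in masks.values())
total = sum(m.numel() for m in masks.values())
print(f"kept {remaining}/{total} weights ({remaining / total:.1%})")
```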
arXiv Detail & Related papers (2022-02-05T21:19:41Z) - Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant (GraNet-ST).
Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
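The sketch below illustrates the two ingredients named above in simplified form: a cubic gradual-sparsity schedule (following Zhu & Gupta's GMP) and a prune-and-regrow step that magnitude-prunes slightly past the scheduled density and regrows the same number of connections where the gradient magnitude is largest. It is a hedged approximation; GraNet's exact regeneration rule, schedule, and hyper-parameters may differ.

```python
# Hedged sketch of gradual magnitude pruning with a prune-and-regrow step, in
# the spirit of GraNet. The cubic sparsity schedule follows Zhu & Gupta (2017);
# GraNet's exact regeneration rule and hyper-parameters may differ.
import torch

def sparsity_at(step, s_init=0.0, s_final=0.9, t_start=0, t_end=10_000):
    """Cubic schedule: sparsity grows from s_init to s_final over [t_start, t_end]."""
    t = min(max(step, t_start), t_end)
    frac = 1.0 - (t - t_start) / (t_end - t_start)
    return s_final + (s_init - s_final) * frac ** 3

def prune_and_regrow(weight, grad, mask, target_sparsity, regrow_frac=0.1):
    """Magnitude-prune past the target density, then regrow the same number of
    connections where the gradient magnitude is largest."""
    n = weight.numel()
    n_keep = int(round((1 - target_sparsity) * n))
    n_extra = int(regrow_frac * n_keep)
    score = (weight * mask).abs().flatten()
    keep_idx = torch.topk(score, max(n_keep - n_extra, 1)).indices
    new_mask = torch.zeros(n)
    new_mask[keep_idx] = 1.0
    regrow_score = grad.abs().flatten() * (1 - new_mask)  # only currently-pruned weights
    regrow_idx = torch.topk(regrow_score, n_extra).indices
    new_mask[regrow_idx] = 1.0
    return new_mask.view_as(weight)

w, g = torch.randn(256, 256), torch.randn(256, 256)
mask = torch.ones_like(w)
mask = prune_and_regrow(w, g, mask, sparsity_at(5_000))
print(1 - mask.mean().item())  # achieved sparsity, close to the scheduled value
```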
arXiv Detail & Related papers (2021-06-19T02:09:25Z) - Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
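A minimal sketch of the "rising penalty" idea follows: filters flagged as unimportant accumulate a growing per-filter L2 penalty factor, driving them toward zero before removal. The importance criterion (filter L2 norm), growth rate, and cap are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of pruning via a growing L2 penalty: unimportant filters get a
# penalty factor that is ramped up over training, pushing them toward zero
# before they are finally removed. The published method's importance criterion
# and penalty schedule may differ.
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)

# ASSUMPTION: importance = L2 norm of each output filter; flag the bottom half.
with torch.no_grad():
    filter_norms = conv.weight.flatten(1).norm(dim=1)
unimportant = filter_norms < filter_norms.median()

penalty = torch.zeros(conv.out_channels)  # per-filter penalty factor lambda
delta, ceiling = 1e-4, 1.0                # growth per step and its cap

def regularized_loss(task_loss):
    """Task loss plus the growing per-filter L2 penalty."""
    global penalty
    penalty = torch.clamp(penalty + delta * unimportant.float(), max=ceiling)
    reg = (penalty * conv.weight.flatten(1).pow(2).sum(dim=1)).sum()
    return task_loss + reg

x = torch.randn(4, 16, 8, 8)
loss = regularized_loss(conv(x).pow(2).mean())  # dummy task loss for illustration
loss.backward()
print(penalty.max().item())                     # the penalty has started to grow
```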
arXiv Detail & Related papers (2020-12-16T20:16:28Z) - Single Shot Structured Pruning Before Training [34.34435316622998]
Our work develops a methodology to remove entire channels and hidden units with the explicit aim of speeding up training and inference.
We introduce a compute-aware scoring mechanism which enables pruning in units of sensitivity per FLOP removed, allowing even greater speed-ups.
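The sketch below shows one way a sensitivity-per-FLOP score could be formed for the output channels of a convolution: an (assumed) weight-norm sensitivity divided by the MACs saved when the channel is removed, with the cheapest channels pruned first. The sensitivity definition and FLOP accounting are placeholders and may differ from the paper's.

```python
# Hedged sketch of compute-aware channel scoring: each channel's sensitivity is
# divided by the FLOPs that removing it would save, and the channels with the
# lowest sensitivity-per-FLOP are pruned first.
import torch
import torch.nn as nn

def conv_flops_per_out_channel(conv, out_hw):
    """Approximate MACs saved by removing one output channel of `conv`."""
    k_h, k_w = conv.kernel_size
    return conv.in_channels * k_h * k_w * out_hw[0] * out_hw[1]

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
out_hw = (32, 32)

# ASSUMPTION: sensitivity = squared L2 norm of each filter's weights.
sensitivity = conv.weight.flatten(1).pow(2).sum(dim=1)
flops_saved = torch.full_like(sensitivity, conv_flops_per_out_channel(conv, out_hw))
score = sensitivity / flops_saved           # sensitivity lost per FLOP removed

n_prune = 8                                 # prune the 8 cheapest channels
prune_idx = torch.topk(score, n_prune, largest=False).indices
keep_mask = torch.ones(conv.out_channels, dtype=torch.bool)
keep_mask[prune_idx] = False
print(f"pruned channels: {sorted(prune_idx.tolist())}")
print(f"MACs saved per image: {int(flops_saved[prune_idx].sum())}")
```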
arXiv Detail & Related papers (2020-07-01T11:27:37Z) - Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive
Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z) - Progressive Skeletonization: Trimming more fat from a network at
initialization [76.11947969140608]
We propose an objective to find a skeletonized network with maximum connection sensitivity.
We then propose two approximate procedures to maximize our objective.
Our approach provides remarkably improved performance on higher pruning levels.
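Concretely, a progressive (iterative) connection-sensitivity pruner applied before training can look like the sketch below: saliency |w * dL/dw| is recomputed on the current skeleton over several rounds while the kept density decays exponentially toward the target. This is a hedged sketch in the spirit of iterative SNIP/FORCE, not the paper's exact objective or schedule.

```python
# Hedged sketch of progressive skeletonization before training: connection
# sensitivity |w * grad| is recomputed over several rounds, each round pruning
# the network further toward the target density. Exact objective may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(128, 100), torch.randint(0, 10, (128,))

def connection_sensitivity(model, mask):
    """|w * grad| for every masked weight, computed on the current skeleton."""
    model.zero_grad()
    with torch.no_grad():                       # apply the mask before scoring
        for name, p in model.named_parameters():
            if name in mask:
                p.mul_(mask[name])
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return {name: (p * p.grad).abs() for name, p in model.named_parameters() if name in mask}

final_density, rounds = 0.05, 5                 # keep 5% of the weights, over 5 rounds
mask = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}
for r in range(1, rounds + 1):
    density = final_density ** (r / rounds)     # exponential keep schedule
    scores = connection_sensitivity(model, mask)
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(density * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    mask = {n: (s >= threshold).float() for n, s in scores.items()}

kept = sum(int(m.sum()) for m in mask.values()) / sum(m.numel() for m in mask.values())
print(f"kept fraction: {kept:.3f}")
```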
arXiv Detail & Related papers (2020-06-16T11:32:47Z) - Pruning via Iterative Ranking of Sensitivity Statistics [0.0]
We show that by applying the sensitivity criterion iteratively in smaller steps - still before training - we can improve its performance without complicating the implementation.
We then demonstrate how it can be applied for both structured and unstructured pruning, before and/or during training, thereby achieving state-of-the-art sparsity-performance trade-offs.
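For the structured case mentioned above, one illustrative instantiation is sketched below: per-node saliency is the summed |w * grad| of a node's incoming weights, and the least sensitive hidden nodes are removed over a few small steps before training. The criterion, step count, and drop size are assumptions, not the paper's exact settings.

```python
# Hedged sketch of iteratively applying a sensitivity criterion in a structured
# way: per-node saliency = summed |w * grad| of incoming weights, with the
# least sensitive hidden nodes removed over a few small steps before training.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer = nn.Linear(100, 64)
head = nn.Linear(64, 10)
x, y = torch.randn(256, 100), torch.randint(0, 10, (256,))

node_mask = torch.ones(64)
steps, drop_per_step = 4, 8                 # remove 8 of 64 hidden nodes per step
for _ in range(steps):
    layer.zero_grad()
    head.zero_grad()
    hidden = F.relu(layer(x)) * node_mask   # masked hidden nodes
    loss = F.cross_entropy(head(hidden), y)
    loss.backward()
    saliency = (layer.weight * layer.weight.grad).abs().sum(dim=1).detach()
    saliency[node_mask == 0] = float("inf") # never re-rank already-removed nodes
    drop = torch.topk(saliency, drop_per_step, largest=False).indices
    node_mask[drop] = 0.0

print(f"remaining hidden nodes: {int(node_mask.sum())} / {node_mask.numel()}")
```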
arXiv Detail & Related papers (2020-06-01T12:48:53Z) - Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks that bring machine learning applications to devices with limited computational resources.
For deep NNs, pruning-at-initialization procedures remain unsatisfactory: the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned.
arXiv Detail & Related papers (2020-02-19T17:09:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.