Winning the Lottery Ahead of Time: Efficient Early Network Pruning
- URL: http://arxiv.org/abs/2206.10451v1
- Date: Tue, 21 Jun 2022 14:59:53 GMT
- Title: Winning the Lottery Ahead of Time: Efficient Early Network Pruning
- Authors: John Rachwan, Daniel Zügner, Bertrand Charpentier, Simon Geisler, Morgane Ayle, Stephan Günnemann
- Abstract summary: Pruning, the task of sparsifying deep neural networks, received increasing attention recently.
We propose Early Compression via Gradient Flow Preservation (EarlyCroP), which efficiently extracts state-of-the-art sparse models before or early in training.
EarlyCroP leads to accuracy comparable to dense training while outperforming pruning baselines.
- Score: 28.832060124537843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pruning, the task of sparsifying deep neural networks, received increasing
attention recently. Although state-of-the-art pruning methods extract highly
sparse models, they neglect two main challenges: (1) the process of finding
these sparse models is often very expensive; (2) unstructured pruning does not
provide benefits in terms of GPU memory, training time, or carbon emissions. We
propose Early Compression via Gradient Flow Preservation (EarlyCroP), which
efficiently extracts state-of-the-art sparse models before or early in training,
addressing challenge (1), and can be applied in a structured manner, addressing
challenge (2). This enables us to train sparse networks on commodity GPUs whose
dense versions would be too large, thereby saving costs and reducing hardware
requirements. We empirically show that EarlyCroP outperforms a rich set of
baselines for many tasks (incl. classification, regression) and domains (incl.
computer vision, natural language processing, and reinforcement learning).
EarlyCroP leads to accuracy comparable to dense training while outperforming
pruning baselines.
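As a concrete illustration of the gradient-flow-preservation idea in the method's name, the following is only a minimal sketch of a GraSP-style pruning score applied once on a mini-batch; the model, data, `keep_ratio`, and the global unstructured thresholding are illustrative assumptions, and the paper's actual criterion, its before/early-training timing, and the structured variant are specified in the full text.

```python
# Hedged sketch: score weights by their contribution to gradient flow
# (a GraSP-style criterion) and keep the highest-scoring fraction.
# Model, data, and keep_ratio are illustrative, not the paper's setup.
import torch
import torch.nn as nn

def gradient_flow_scores(model, inputs, targets, loss_fn):
    """Approximate each weight's contribution to gradient flow via theta * (H g)."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Differentiable <g, stop_grad(g)>: its gradient w.r.t. the weights is H*g.
    gnorm = sum((g * g.detach()).sum() for g in grads)
    hess_grad = torch.autograd.grad(gnorm, params)
    # Large theta * (H g) means removing the weight would hurt gradient flow most.
    return [p.detach() * hg for p, hg in zip(params, hess_grad)]

def prune_by_score(model, scores, keep_ratio=0.1):
    """Keep the top keep_ratio fraction of weights globally, zero the rest."""
    flat = torch.cat([s.flatten() for s in scores])
    threshold = torch.topk(flat, max(1, int(keep_ratio * flat.numel()))).values.min()
    with torch.no_grad():
        params = [p for p in model.parameters() if p.requires_grad]
        for p, s in zip(params, scores):
            p.mul_((s >= threshold).float())

# Usage sketch: prune a small MLP on a single mini-batch before training.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
prune_by_score(model, gradient_flow_scores(model, x, y, nn.CrossEntropyLoss()))
```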
Related papers
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to the magnitude scale on-the-fly.
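As a rough illustration of that shrinkage step (a sketch only; the selection rule, `keep_ratio`, and `shrink_rate` are assumptions rather than the paper's exact ISS-P procedure), the snippet below re-selects a magnitude-based sparse structure each step and gently shrinks the weights that fall outside it instead of zeroing them:

```python
# Hedged sketch of a soft-shrinkage update: weights outside the current
# keep set are scaled down slightly rather than hard-pruned, so they can
# recover later. Hyperparameters are illustrative assumptions.
import torch

def soft_shrink_step(params, keep_ratio=0.5, shrink_rate=0.01):
    flat = torch.cat([p.detach().abs().flatten() for p in params])
    threshold = torch.topk(flat, max(1, int(keep_ratio * flat.numel()))).values.min()
    with torch.no_grad():
        for p in params:
            unimportant = p.abs() < threshold
            # Multiplying by (1 - rate) shrinks each weight by an amount
            # proportional to its current magnitude, on-the-fly.
            p[unimportant] *= (1.0 - shrink_rate)

# Usage sketch: call after every optimizer step during training.
weights = [torch.randn(64, 64, requires_grad=True), torch.randn(10, 64, requires_grad=True)]
for _ in range(100):
    soft_shrink_step(weights)
```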
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- What to Prune and What Not to Prune at Initialization [0.0]
Post-training dropout based approaches achieve high sparsity.
Initialization pruning is more efficacious when it comes to scaling the computation cost of the network.
The goal is to achieve higher sparsity while preserving performance.
arXiv Detail & Related papers (2022-09-06T03:48:10Z)
- Pruning Early Exit Networks [14.048989759890475]
We combine two approaches that try to reduce the computational cost while keeping the model performance high: pruning and early exit networks.
We evaluate two approaches of pruning early exit networks: (1) pruning the entire network at once, (2) pruning the base network and additional linear classifiers in an ordered fashion.
arXiv Detail & Related papers (2022-07-08T01:57:52Z)
- The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but has been deemed uncompetitive compared with post-training pruning or sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning.
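For concreteness, here is a minimal sketch of such a random-pruning baseline: a random binary mask per weight tensor is fixed at initialization and the surviving weights are then trained from scratch. The density value and the choice to mask every multi-dimensional parameter are illustrative assumptions.

```python
# Hedged sketch of random pruning at initialization followed by fixed-mask training.
import torch
import torch.nn as nn

def random_masks(model, density=0.2):
    """Sample one fixed random binary mask per weight matrix/filter at init."""
    return {name: (torch.rand_like(p) < density).float()
            for name, p in model.named_parameters() if p.dim() > 1}

def apply_masks(model, masks):
    """Zero masked weights; re-apply after every optimizer step to keep the mask fixed."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
masks = random_masks(model, density=0.2)
apply_masks(model, masks)  # then train normally, calling apply_masks after each step
```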
arXiv Detail & Related papers (2022-02-05T21:19:41Z)
- When to Prune? A Policy towards Early Structural Pruning [27.91996628143805]
We propose a policy that prunes as early as possible during training without hurting performance.
Our method yields a $1.4\%$ top-1 accuracy boost over state-of-the-art pruning counterparts and cuts down training cost on GPU by $2.4\times$.
arXiv Detail & Related papers (2021-10-22T18:39:22Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant (GraNet-ST).
Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
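Since the entry above builds on gradual magnitude pruning (GMP), the sketch below shows only that plain GMP backbone: a cubic sparsity schedule combined with global magnitude thresholding. The zero-cost neuroregeneration step that distinguishes GraNet is not shown, and the schedule boundaries and target sparsity are illustrative assumptions.

```python
# Hedged sketch of plain gradual magnitude pruning (GMP): ramp the target
# sparsity with a cubic schedule and zero the smallest-magnitude weights.
import torch

def target_sparsity(step, start_step, end_step, final_sparsity, init_sparsity=0.0):
    """Cubic ramp from init_sparsity to final_sparsity between start and end steps."""
    if step <= start_step:
        return init_sparsity
    if step >= end_step:
        return final_sparsity
    frac = (step - start_step) / (end_step - start_step)
    return final_sparsity + (init_sparsity - final_sparsity) * (1.0 - frac) ** 3

def magnitude_prune(params, sparsity):
    """Zero the globally smallest-magnitude weights until `sparsity` is reached."""
    flat = torch.cat([p.detach().abs().flatten() for p in params])
    k = int(sparsity * flat.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(flat, k).values
    with torch.no_grad():
        for p in params:
            p[p.abs() <= threshold] = 0.0

# Usage sketch inside a training loop (step boundaries and 90% sparsity are illustrative):
# magnitude_prune(weights, target_sparsity(step, start_step=1000, end_step=20000, final_sparsity=0.9))
```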
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
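As a rough illustration of the rising-penalty idea (a sketch under assumptions: which weights are penalized, the growth step, and the ceiling are all illustrative, not the paper's algorithm), the snippet below applies an L2 penalty to weights earmarked for removal and raises its factor during training so they decay toward zero before being pruned:

```python
# Hedged sketch of an L2 regularization term whose penalty factor grows over
# training, pushing the selected weights toward zero before pruning them.
import torch
import torch.nn as nn

class GrowingL2:
    def __init__(self, params_to_prune, growth=1e-4, ceiling=1.0):
        self.params = params_to_prune  # weights earmarked for eventual removal
        self.penalty = 0.0             # current penalty factor
        self.growth = growth           # how much the factor rises per step
        self.ceiling = ceiling         # cap so the penalty does not grow forever

    def regularization_loss(self):
        """Add this term to the task loss at every iteration."""
        return self.penalty * sum((p ** 2).sum() for p in self.params)

    def step(self):
        """Raise the penalty factor; call once per optimizer step."""
        self.penalty = min(self.penalty + self.growth, self.ceiling)

# Usage sketch with a placeholder task loss on one layer marked for pruning.
layer = nn.Linear(256, 256)
reg = GrowingL2([layer.weight], growth=1e-4)
loss = layer(torch.randn(32, 256)).pow(2).mean() + reg.regularization_loss()
loss.backward()
reg.step()
```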
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
- Pruning Convolutional Filters using Batch Bridgeout [14.677724755838556]
State-of-the-art computer vision models are rapidly increasing in capacity, with the number of parameters far exceeding the number required to fit the training set.
This overparameterization results in better optimization and generalization performance.
In order to reduce inference costs, convolutional filters in trained neural networks could be pruned to reduce the run-time memory and computational requirements during inference.
We propose the use of Batch Bridgeout, a sparsity inducing regularization scheme, to train neural networks so that they could be pruned efficiently with minimal degradation in performance.
arXiv Detail & Related papers (2020-09-23T01:51:47Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)