Prospect Pruning: Finding Trainable Weights at Initialization using
Meta-Gradients
- URL: http://arxiv.org/abs/2202.08132v1
- Date: Wed, 16 Feb 2022 15:18:55 GMT
- Title: Prospect Pruning: Finding Trainable Weights at Initialization using
Meta-Gradients
- Authors: Milad Alizadeh, Shyam A. Tailor, Luisa M Zintgraf, Joost van
Amersfoort, Sebastian Farquhar, Nicholas Donald Lane, Yarin Gal
- Abstract summary: Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network.
Current methods are insufficient to enable this optimization and lead to a large degradation in model performance.
We propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune.
Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
- Score: 36.078414964088196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pruning neural networks at initialization would enable us to find sparse
models that retain the accuracy of the original network while consuming fewer
computational resources for training and inference. However, current methods
are insufficient to enable this optimization and lead to a large degradation in
model performance. In this paper, we identify a fundamental limitation in the
formulation of current methods, namely that their saliency criteria look at a
single step at the start of training without taking into account the
trainability of the network. While pruning iteratively and gradually has been
shown to improve pruning performance, explicit consideration of the training
stage that will immediately follow pruning has so far been absent from the
computation of the saliency criterion. To overcome the short-sightedness of
existing methods, we propose Prospect Pruning (ProsPr), which uses
meta-gradients through the first few steps of optimization to determine which
weights to prune. ProsPr combines an estimate of the higher-order effects of
pruning on the loss and the optimization trajectory to identify the trainable
sub-network. Our method achieves state-of-the-art pruning performance on a
variety of vision classification tasks, with less data and in a single shot
compared to existing pruning-at-initialization methods.
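To make the idea concrete, here is a minimal, illustrative sketch (not the authors' released code) of meta-gradient pruning saliency in the spirit of ProsPr: a per-weight mask initialized to ones is applied to the weights, a few differentiable SGD steps are unrolled, and the magnitude of the gradient of the post-training loss with respect to the mask ranks the weights. The helpers loss_fn, params, and batches, the plain-SGD inner loop, and the learning rate are all assumptions made for illustration.

import jax
import jax.numpy as jnp

def apply_mask(params, mask):
    # Elementwise product of every weight tensor with its pruning mask.
    return jax.tree_util.tree_map(lambda w, m: w * m, params, mask)

def loss_after_k_steps(mask, params, batches, loss_fn, lr=0.1):
    # Unroll a few plain SGD steps on the masked network; the whole
    # trajectory stays differentiable with respect to the mask.
    p = apply_mask(params, mask)
    for batch in batches[:-1]:
        grads = jax.grad(loss_fn)(p, batch)
        p = jax.tree_util.tree_map(lambda w, g: w - lr * g, p, grads)
    # Evaluate on the last batch after the short training trajectory.
    return loss_fn(p, batches[-1])

def prune_mask(params, batches, loss_fn, sparsity=0.9):
    ones = jax.tree_util.tree_map(jnp.ones_like, params)
    # Meta-gradient of the post-training loss with respect to the mask.
    meta_grad = jax.grad(loss_after_k_steps)(ones, params, batches, loss_fn)
    scores = jnp.concatenate([jnp.abs(g).ravel()
                              for g in jax.tree_util.tree_leaves(meta_grad)])
    # Keep the (1 - sparsity) fraction of weights with the largest scores.
    threshold = jnp.quantile(scores, sparsity)
    return jax.tree_util.tree_map(
        lambda g: (jnp.abs(g) >= threshold).astype(jnp.float32), meta_grad)

Under these assumptions, a typical call would be prune_mask(params, batches, loss_fn, sparsity=0.95), followed by ordinary training of the masked network.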
Related papers
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability of the VLMs' zero-shot generalization; the method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - A Unified Framework for Soft Threshold Pruning [27.853698217792456]
We reformulate soft threshold pruning as an implicit optimization problem solved using the Iterative Shrinkage-Thresholding Algorithm (ISTA); a minimal sketch of the soft-thresholding update appears after this list.
We derive an optimal threshold scheduler through an in-depth study of threshold scheduling based on our framework.
In principle, the derived pruning algorithm could sparsify any mathematical model trained via SGD.
arXiv Detail & Related papers (2023-02-25T08:16:14Z) - Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network of which partial layers are iteratively exploited for refining its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z) - Back to Basics: Efficient Network Compression via IMP [22.586474627159287]
Iterative Magnitude Pruning (IMP) is one of the most established approaches for network pruning.
It is often argued that IMP reaches suboptimal states because it does not incorporate sparsification into the training phase.
We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches.
arXiv Detail & Related papers (2021-11-01T11:23:44Z) - When to Prune? A Policy towards Early Structural Pruning [27.91996628143805]
We propose a policy that prunes as early as possible during training without hurting performance.
Our method yields a $1.4\%$ top-1 accuracy boost over state-of-the-art pruning counterparts and cuts GPU training cost by $2.4\times$.
arXiv Detail & Related papers (2021-10-22T18:39:22Z) - Initialization and Regularization of Factorized Neural Layers [23.875225732697142]
We show how to initialize and regularize factorized layers in deep nets.
We show how these schemes lead to improved performance on both translation and unsupervised pre-training.
arXiv Detail & Related papers (2021-05-03T17:28:07Z) - Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive
Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z) - Progressive Skeletonization: Trimming more fat from a network at
initialization [76.11947969140608]
We propose an objective to find a skeletonized network with maximum connection sensitivity.
We then propose two approximate procedures to maximize our objective.
Our approach provides remarkably improved performance on higher pruning levels.
arXiv Detail & Related papers (2020-06-16T11:32:47Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks that make it possible to run machine learning applications on devices with limited computational resources.
For Deep NNs, such procedures remain unsatisfactory as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned.
arXiv Detail & Related papers (2020-02-19T17:09:50Z)