Progressive Skeletonization: Trimming more fat from a network at
initialization
- URL: http://arxiv.org/abs/2006.09081v5
- Date: Fri, 19 Mar 2021 13:06:16 GMT
- Title: Progressive Skeletonization: Trimming more fat from a network at
initialization
- Authors: Pau de Jorge, Amartya Sanyal, Harkirat S. Behl, Philip H.S. Torr,
Gregory Rogez, Puneet K. Dokania
- Abstract summary: We propose an objective to find a skeletonized network with maximum connection sensitivity.
We then propose two approximate procedures to maximize our objective.
Our approach provides remarkably improved performance at higher pruning levels.
- Score: 76.11947969140608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that skeletonization (pruning parameters) of
networks \textit{at initialization} provides all the practical benefits of
sparsity both at inference and training time, while only marginally degrading
their performance. However, we observe that beyond a certain level of sparsity
(approx $95\%$), these approaches fail to preserve the network performance, and
to our surprise, in many cases perform even worse than trivial random pruning.
To this end, we propose an objective to find a skeletonized network with
maximum {\em foresight connection sensitivity} (FORCE) whereby the
trainability, in terms of connection sensitivity, of a pruned network is taken
into consideration. We then propose two approximate procedures to maximize our
objective: (1) Iterative SNIP, which allows parameters that were unimportant at
earlier stages of skeletonization to become important at later stages; and (2)
FORCE, an iterative process that enables exploration by letting already pruned
parameters resurrect at later stages of skeletonization. Empirical analyses on
a large suite of experiments show that our approach, while performing at least
as well as other recent approaches at moderate pruning levels, delivers
remarkably improved performance at higher pruning levels (it can remove up to
$99.5\%$ of the parameters while keeping the networks trainable). Code can be
found at https://github.com/naver/force.
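For intuition, the sketch below shows one way such a progressive schedule could look in PyTorch: each round re-scores every weight by the SNIP-style connection sensitivity $|\theta_i \cdot \nabla_i L|$ evaluated at the currently pruned parameters, keeps a shrinking global top-k, and (as in FORCE) lets previously pruned weights return. The single mini-batch, exponential sparsity schedule, and cross-entropy loss are assumptions for illustration; the authors' reference implementation is in the repository linked above.

```python
# Minimal sketch of progressive skeletonization at initialization.
# Assumes: a classification model, one representative batch (x, y),
# an exponential sparsity schedule, and cross-entropy loss. This is an
# illustration of the idea, not the reference implementation.
import torch
import torch.nn.functional as F


def saliency_at_pruned_point(model, init_weights, masks, x, y):
    """Score |theta_i * grad_i L(theta ⊙ c)| for every weight.

    Gradients are taken at the pruned parameters, but all original
    weights are scored, so already-pruned weights may "resurrect"
    (the FORCE variant); Iterative SNIP would instead restrict the
    ranking to currently kept weights.
    """
    weights = [p for p in model.parameters() if p.dim() > 1]
    with torch.no_grad():
        for p, w0, m in zip(weights, init_weights, masks):
            p.copy_(w0 * m)                       # theta_bar = theta ⊙ c
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, weights)
    with torch.no_grad():
        for p, w0 in zip(weights, init_weights):
            p.copy_(w0)                           # restore the initialization
    return [(w0 * g).abs() for w0, g in zip(init_weights, grads)]


def progressive_skeletonization(model, x, y, final_density=0.005, rounds=20):
    weights = [p for p in model.parameters() if p.dim() > 1]
    init_weights = [p.detach().clone() for p in weights]
    masks = [torch.ones_like(p) for p in weights]
    n_total = sum(p.numel() for p in weights)

    for t in range(1, rounds + 1):
        density_t = final_density ** (t / rounds)     # sparsity rises gradually
        k = max(1, int(density_t * n_total))
        scores = saliency_at_pruned_point(model, init_weights, masks, x, y)
        flat = torch.cat([s.flatten() for s in scores])
        threshold = torch.topk(flat, k).values.min()  # global keep-threshold
        masks = [(s >= threshold).float() for s in scores]

    with torch.no_grad():                             # bake the final mask in
        for p, w0, m in zip(weights, init_weights, masks):
            p.copy_(w0 * m)
    return masks
```

With `rounds=1` this reduces to one-shot SNIP; restricting the top-k ranking to currently unmasked weights would correspond to the Iterative SNIP variant.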
Related papers
- FGGP: Fixed-Rate Gradient-First Gradual Pruning [2.0940682212182975]
We introduce a gradient-first magnitude-next strategy for choosing the parameters to prune, and show that a fixed-rate subselection criterion between these steps works better.
Our proposed fixed-rate gradient-first gradual pruning (FGGP) approach outperforms its state-of-the-art alternatives in most of the above experimental settings.
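As a rough, hypothetical illustration of a gradient-first, magnitude-next selection with a fixed-rate subselection in between, the sketch below forms a candidate pool from a fixed fraction of weights with the smallest gradient magnitudes and then prunes the smallest-magnitude weights within that pool. The pool rate and the direction of both rankings are assumptions, not the paper's exact criterion.

```python
# Hypothetical sketch of a two-stage, gradient-first / magnitude-next
# pruning step. The fixed subselection rate and the "smallest gradient,
# then smallest magnitude" ordering are illustrative assumptions.
import torch


def gradient_first_magnitude_next(weight, grad, n_prune, pool_rate=0.4):
    """Return a 0/1 mask that prunes `n_prune` entries of `weight`."""
    flat_w, flat_g = weight.flatten().abs(), grad.flatten().abs()

    # Stage 1 (gradient-first): fixed-rate candidate pool with the
    # smallest gradient magnitudes.
    pool_size = max(n_prune, int(pool_rate * flat_w.numel()))
    pool_idx = torch.topk(flat_g, pool_size, largest=False).indices

    # Stage 2 (magnitude-next): within the pool, prune the smallest weights.
    prune_local = torch.topk(flat_w[pool_idx], n_prune, largest=False).indices
    prune_idx = pool_idx[prune_local]

    mask = torch.ones_like(flat_w)
    mask[prune_idx] = 0.0
    return mask.view_as(weight)
```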
arXiv Detail & Related papers (2024-11-08T12:02:25Z)
- Stochastic Subnetwork Annealing: A Regularization Technique for Fine Tuning Pruned Subnetworks [4.8951183832371]
Large numbers of parameters can be removed from trained models with little discernible loss in accuracy after a small number of continued training epochs.
Iterative pruning approaches mitigate this accuracy loss by gradually removing a small number of parameters over multiple epochs.
We introduce a novel and effective approach to tuning subnetworks through a regularization technique we call Stochastic Subnetwork Annealing.
arXiv Detail & Related papers (2024-01-16T21:07:04Z)
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but struggles with small models.
We introduce a one-stage solution for obtaining pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
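To illustrate the weight-sharing idea, the sketch below shows a slimmable linear layer whose sub-networks at smaller width ratios reuse the leading slice of the same weight matrix; the layer sizes and width ratios are illustrative assumptions, not the paper's architecture.

```python
# Illustrative sketch of weight sharing in a slimmable layer: every width
# ratio reuses the leading slice of the same weight matrix, so one set of
# parameters yields several nested sub-networks. Layer sizes and the width
# ratios used below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlimmableLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, width_ratio=1.0):
        # The sub-network at `width_ratio` uses only the leading output channels.
        out_f = max(1, int(width_ratio * self.weight.shape[0]))
        in_f = x.shape[1]                  # input width set by the previous layer
        return F.linear(x, self.weight[:out_f, :in_f], self.bias[:out_f])


# One forward pass of the full network and of a half-width sub-network
# that shares (a slice of) the same parameters.
layer1, layer2 = SlimmableLinear(32, 64), SlimmableLinear(64, 10)
x = torch.randn(8, 32)
full = layer2(F.relu(layer1(x, 1.0)), 1.0)
half = layer2(F.relu(layer1(x, 0.5)), 1.0)   # hidden layer at half width
```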
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
- Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients [36.078414964088196]
Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network.
Current methods are insufficient to enable this optimization and lead to a large degradation in model performance.
We propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune.
Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
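The sketch below illustrates the general idea of meta-gradient pruning scores: apply a multiplicative mask to the initial weights, take a few differentiable SGD steps, and score each weight by the gradient of the resulting loss with respect to its mask entry. The tiny two-layer model, inner step count, and learning rate are assumptions for illustration, not ProsPr's actual implementation.

```python
# Illustrative sketch of meta-gradient pruning scores in the spirit of
# ProsPr: differentiate the loss obtained after a few inner SGD steps with
# respect to a multiplicative mask on the initial weights.
import torch
import torch.nn.functional as F


def meta_gradient_scores(w1, w2, batches, inner_steps=3, lr=0.1):
    """w1, w2: initial weights; batches: at least inner_steps + 1 (x, y) pairs."""
    masks = [torch.ones_like(w1, requires_grad=True),
             torch.ones_like(w2, requires_grad=True)]
    params = [w1.clone() * masks[0], w2.clone() * masks[1]]

    def forward(p, x):
        return F.relu(x @ p[0]) @ p[1]

    # A few differentiable inner SGD steps on the masked weights.
    for step in range(inner_steps):
        x, y = batches[step]
        loss = F.cross_entropy(forward(params, x), y)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - lr * g for p, g in zip(params, grads)]

    # Meta-gradient of the final loss w.r.t. the mask gives the scores.
    x, y = batches[inner_steps]
    final_loss = F.cross_entropy(forward(params, x), y)
    meta_grads = torch.autograd.grad(final_loss, masks)
    return [mg.abs() for mg in meta_grads]   # prune the lowest-scoring weights
```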
arXiv Detail & Related papers (2022-02-16T15:18:55Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant (GraNet-ST).
Perhaps most impressively, the latter is the first to boost sparse-to-sparse training performance above that of various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
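The sketch below illustrates one prune-and-regenerate update in this spirit: drop the smallest-magnitude live weights and revive the same number of dead connections with the largest gradient magnitudes, so regeneration itself is sparsity-neutral and the gradual sparsity schedule is applied separately. The regeneration rate and criteria shown are illustrative assumptions rather than GraNet's exact settings.

```python
# Illustrative sketch of one prune-and-regenerate update: remove the
# smallest-magnitude live weights, then revive the same number of dead
# connections with the largest gradient magnitude. The regeneration rate
# is an illustrative assumption.
import torch


def prune_and_regenerate(weight, grad, mask, regen_rate=0.05):
    live = mask.bool()
    n_regen = max(1, int(regen_rate * live.sum().item()))

    # Prune: remove the smallest-magnitude currently-live weights.
    scores = weight.abs().masked_fill(~live, float("inf")).flatten()
    drop_idx = torch.topk(scores, n_regen, largest=False).indices

    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0

    # Regenerate: revive dead connections with the largest gradients.
    dead = new_mask == 0
    regen_scores = grad.abs().flatten().masked_fill(~dead, float("-inf"))
    grow_idx = torch.topk(regen_scores, n_regen, largest=True).indices
    new_mask[grow_idx] = 1.0

    return new_mask.view_as(mask)
```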
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- Emerging Paradigms of Neural Network Pruning [82.9322109208353]
Pruning is adopted as a post-processing solution to this problem, aiming to remove unnecessary parameters from a neural network with little compromise in performance.
Recent works challenge this belief by discovering random sparse networks which can be trained to match the performance of their dense counterparts.
This survey seeks to bridge the gap by proposing a general pruning framework so that the emerging pruning paradigms can be accommodated well with the traditional one.
arXiv Detail & Related papers (2021-03-11T05:01:52Z)
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
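As a rough sketch of a rising penalty, the hypothetical helper below keeps a per-weight L2 coefficient that grows a little on every call for the weights slated for removal, so the added penalty gradually drives them towards zero before they are pruned. The selection rule (smallest magnitude) and growth step are assumptions, not the paper's schedule.

```python
# Illustrative sketch of an L2 penalty whose factor grows over time for
# weights slated for removal, pushing them towards zero before pruning.
# The selection rule and the growth increment are assumptions.
import torch


class GrowingL2:
    def __init__(self, weight, prune_ratio=0.9, delta=1e-4):
        n_prune = int(prune_ratio * weight.numel())
        idx = torch.topk(weight.abs().flatten(), n_prune, largest=False).indices
        self.penalized = torch.zeros(weight.numel(), dtype=torch.bool)
        self.penalized[idx] = True               # weights slated for removal
        self.lam = torch.zeros(weight.numel())   # per-weight penalty factor
        self.delta = delta

    def step(self):
        # Raise the penalty a little more on every call.
        self.lam[self.penalized] += self.delta

    def penalty(self, weight):
        # Added to the task loss; grows until the selected weights vanish.
        return (self.lam.view_as(weight) * weight.pow(2)).sum()
```

A training loop would add `reg.penalty(weight)` to the task loss and call `reg.step()` every few iterations.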
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
- FlipOut: Uncovering Redundant Weights via Sign Flipping [0.0]
We propose a novel pruning method which uses the oscillations around $0$ that a weight has undergone during training in order to determine its saliency.
Our method can perform pruning before the network has converged, requires little tuning effort, and can directly target the level of sparsity desired by the user.
Our experiments, performed on a variety of object classification architectures, show that it is competitive with existing methods and achieves state-of-the-art performance for levels of sparsity of $99.6\%$ and above.
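The sketch below illustrates a sign-flip-based saliency of this kind: count how often each weight crosses zero during training and treat frequently flipping, small-magnitude weights as the least salient. The exact way flips and magnitude are combined here is an assumption, not the paper's formula.

```python
# Illustrative sketch of a sign-flip-based saliency: count zero crossings
# per weight during training; weights that oscillate around zero a lot are
# treated as redundant. The |w| / (1 + flips) combination is an assumption.
import torch


class SignFlipTracker:
    def __init__(self, weight):
        self.prev_sign = torch.sign(weight.detach())
        self.flips = torch.zeros_like(weight)

    def update(self, weight):
        # Call once per optimizer step to accumulate sign changes.
        sign = torch.sign(weight.detach())
        self.flips += (sign != self.prev_sign).float()
        self.prev_sign = sign

    def saliency(self, weight):
        # Many oscillations around zero -> low saliency -> prune first.
        return weight.detach().abs() / (1.0 + self.flips)
```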
arXiv Detail & Related papers (2020-09-05T20:27:32Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
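As a purely hypothetical illustration of producing a pruning mask as a function of the target dataset, the sketch below pools features of target-dataset examples into an order-invariant set embedding and maps it to per-channel keep decisions. The mask-generator architecture, mean pooling, and 0.5 threshold are assumptions, not STAMP's actual design.

```python
# Hypothetical sketch of dataset-conditioned mask generation: pool target
# features into a set embedding, then map it to per-channel keep
# probabilities for a pretrained layer. Architecture and threshold are
# illustrative assumptions.
import torch
import torch.nn as nn


class ChannelMaskGenerator(nn.Module):
    def __init__(self, feature_dim, n_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(), nn.Linear(128, n_channels)
        )

    def forward(self, target_features):
        # Set embedding: order-invariant mean over target-dataset examples.
        set_embedding = target_features.mean(dim=0)
        keep_prob = torch.sigmoid(self.net(set_embedding))
        return (keep_prob > 0.5).float()        # per-channel pruning mask
```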
arXiv Detail & Related papers (2020-06-22T10:57:43Z)