Progressive Skeletonization: Trimming more fat from a network at
initialization
- URL: http://arxiv.org/abs/2006.09081v5
- Date: Fri, 19 Mar 2021 13:06:16 GMT
- Title: Progressive Skeletonization: Trimming more fat from a network at
initialization
- Authors: Pau de Jorge, Amartya Sanyal, Harkirat S. Behl, Philip H.S. Torr,
Gregory Rogez, Puneet K. Dokania
- Abstract summary: We propose an objective to find a skeletonized network with maximum connection sensitivity.
We then propose two approximate procedures to maximize our objective.
Our approach provides remarkably improved performance at higher pruning levels.
- Score: 76.11947969140608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that skeletonization (pruning parameters) of
networks \textit{at initialization} provides all the practical benefits of
sparsity both at inference and training time, while only marginally degrading
their performance. However, we observe that beyond a certain level of sparsity
(approx $95\%$), these approaches fail to preserve the network performance, and
to our surprise, in many cases perform even worse than trivial random pruning.
To this end, we propose an objective to find a skeletonized network with
maximum {\em foresight connection sensitivity} (FORCE) whereby the
trainability, in terms of connection sensitivity, of a pruned network is taken
into consideration. We then propose two approximate procedures to maximize our
objective: (1) Iterative SNIP, which allows parameters that were unimportant at
earlier stages of skeletonization to become important at later stages; and (2)
FORCE, an iterative process that enables exploration by letting already pruned
parameters resurrect at later stages of skeletonization. Empirical analyses on
a large suite of experiments show that our approach, while performing at least
as well as other recent approaches at moderate pruning levels, delivers
remarkably improved performance at higher pruning levels (it can remove up to
$99.5\%$ of the parameters while keeping the networks trainable). Code can be
found at https://github.com/naver/force.
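For intuition, the sketch below shows one way such a progressive schedule could look in PyTorch: each round re-scores every weight by the SNIP-style connection sensitivity $|\theta_i \cdot \nabla_i L|$ evaluated at the currently pruned parameters, keeps a shrinking global top-k, and (as in FORCE) lets previously pruned weights return. The single mini-batch, exponential sparsity schedule, and cross-entropy loss are assumptions for illustration; the authors' reference implementation is in the repository linked above.

```python
# Minimal sketch of progressive skeletonization at initialization.
# Assumes: a classification model, one representative batch (x, y),
# an exponential sparsity schedule, and cross-entropy loss. This is an
# illustration of the idea, not the reference implementation.
import torch
import torch.nn.functional as F


def saliency_at_pruned_point(model, init_weights, masks, x, y):
    """Score |theta_i * grad_i L(theta ⊙ c)| for every weight.

    Gradients are taken at the pruned parameters, but all original
    weights are scored, so already-pruned weights may "resurrect"
    (the FORCE variant); Iterative SNIP would instead restrict the
    ranking to currently kept weights.
    """
    weights = [p for p in model.parameters() if p.dim() > 1]
    with torch.no_grad():
        for p, w0, m in zip(weights, init_weights, masks):
            p.copy_(w0 * m)                       # theta_bar = theta ⊙ c
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, weights)
    with torch.no_grad():
        for p, w0 in zip(weights, init_weights):
            p.copy_(w0)                           # restore the initialization
    return [(w0 * g).abs() for w0, g in zip(init_weights, grads)]


def progressive_skeletonization(model, x, y, final_density=0.005, rounds=20):
    weights = [p for p in model.parameters() if p.dim() > 1]
    init_weights = [p.detach().clone() for p in weights]
    masks = [torch.ones_like(p) for p in weights]
    n_total = sum(p.numel() for p in weights)

    for t in range(1, rounds + 1):
        density_t = final_density ** (t / rounds)     # sparsity rises gradually
        k = max(1, int(density_t * n_total))
        scores = saliency_at_pruned_point(model, init_weights, masks, x, y)
        flat = torch.cat([s.flatten() for s in scores])
        threshold = torch.topk(flat, k).values.min()  # global keep-threshold
        masks = [(s >= threshold).float() for s in scores]

    with torch.no_grad():                             # bake the final mask in
        for p, w0, m in zip(weights, init_weights, masks):
            p.copy_(w0 * m)
    return masks
```

With `rounds=1` this reduces to one-shot SNIP; restricting the top-k ranking to currently unmasked weights would correspond to the Iterative SNIP variant.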
Related papers
- FGGP: Fixed-Rate Gradient-First Gradual Pruning [2.0940682212182975]
We introduce a gradient-first magnitude-next strategy for choosing the parameters to prune, and show that a fixed-rate subselection criterion between these steps works better.
Our proposed fixed-rate gradient-first gradual pruning (FGGP) approach outperforms its state-of-the-art alternatives in most of the above experimental settings.
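As a rough, hypothetical illustration of a gradient-first, magnitude-next selection with a fixed-rate subselection in between, the sketch below forms a candidate pool from a fixed fraction of weights with the smallest gradient magnitudes and then prunes the smallest-magnitude weights within that pool. The pool rate and the direction of both rankings are assumptions, not the paper's exact criterion.

```python
# Hypothetical sketch of a two-stage, gradient-first / magnitude-next
# pruning step. The fixed subselection rate and the "smallest gradient,
# then smallest magnitude" ordering are illustrative assumptions.
import torch


def gradient_first_magnitude_next(weight, grad, n_prune, pool_rate=0.4):
    """Return a 0/1 mask that prunes `n_prune` entries of `weight`."""
    flat_w, flat_g = weight.flatten().abs(), grad.flatten().abs()

    # Stage 1 (gradient-first): fixed-rate candidate pool with the
    # smallest gradient magnitudes.
    pool_size = max(n_prune, int(pool_rate * flat_w.numel()))
    pool_idx = torch.topk(flat_g, pool_size, largest=False).indices

    # Stage 2 (magnitude-next): within the pool, prune the smallest weights.
    prune_local = torch.topk(flat_w[pool_idx], n_prune, largest=False).indices
    prune_idx = pool_idx[prune_local]

    mask = torch.ones_like(flat_w)
    mask[prune_idx] = 0.0
    return mask.view_as(weight)
```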
arXiv Detail & Related papers (2024-11-08T12:02:25Z)
- Stochastic Subnetwork Annealing: A Regularization Technique for Fine Tuning Pruned Subnetworks [4.8951183832371]
Large numbers of parameters can be removed from trained models with little discernible loss in accuracy after a small number of continued training epochs.
Iterative pruning approaches mitigate this accuracy loss by gradually removing a small number of parameters over multiple epochs.
We introduce a novel and effective approach to tuning subnetworks through a regularization technique we call Stochastic Subnetwork Annealing.
arXiv Detail & Related papers (2024-01-16T21:07:04Z)
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but struggles with small models.
We introduce a one-stage solution for obtaining pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
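To illustrate the weight-sharing idea, the sketch below shows a slimmable linear layer whose sub-networks at smaller width ratios reuse the leading slice of the same weight matrix; the layer sizes and width ratios are illustrative assumptions, not the paper's architecture.

```python
# Illustrative sketch of weight sharing in a slimmable layer: every width
# ratio reuses the leading slice of the same weight matrix, so one set of
# parameters yields several nested sub-networks. Layer sizes and the width
# ratios used below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlimmableLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, width_ratio=1.0):
        # The sub-network at `width_ratio` uses only the leading output channels.
        out_f = max(1, int(width_ratio * self.weight.shape[0]))
        in_f = x.shape[1]                  # input width set by the previous layer
        return F.linear(x, self.weight[:out_f, :in_f], self.bias[:out_f])


# One forward pass of the full network and of a half-width sub-network
# that shares (a slice of) the same parameters.
layer1, layer2 = SlimmableLinear(32, 64), SlimmableLinear(64, 10)
x = torch.randn(8, 32)
full = layer2(F.relu(layer1(x, 1.0)), 1.0)
half = layer2(F.relu(layer1(x, 0.5)), 1.0)   # hidden layer at half width
```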
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
- Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients [36.078414964088196]
Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network.
Current methods are insufficient to enable this optimization and lead to a large degradation in model performance.
We propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune.
Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
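The sketch below illustrates the general idea of meta-gradient pruning scores: apply a multiplicative mask to the initial weights, take a few differentiable SGD steps, and score each weight by the gradient of the resulting loss with respect to its mask entry. The tiny two-layer model, inner step count, and learning rate are assumptions for illustration, not ProsPr's actual implementation.

```python
# Illustrative sketch of meta-gradient pruning scores in the spirit of
# ProsPr: differentiate the loss obtained after a few inner SGD steps with
# respect to a multiplicative mask on the initial weights.
import torch
import torch.nn.functional as F


def meta_gradient_scores(w1, w2, batches, inner_steps=3, lr=0.1):
    """w1, w2: initial weights; batches: at least inner_steps + 1 (x, y) pairs."""
    masks = [torch.ones_like(w1, requires_grad=True),
             torch.ones_like(w2, requires_grad=True)]
    params = [w1.clone() * masks[0], w2.clone() * masks[1]]

    def forward(p, x):
        return F.relu(x @ p[0]) @ p[1]

    # A few differentiable inner SGD steps on the masked weights.
    for step in range(inner_steps):
        x, y = batches[step]
        loss = F.cross_entropy(forward(params, x), y)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - lr * g for p, g in zip(params, grads)]

    # Meta-gradient of the final loss w.r.t. the mask gives the scores.
    x, y = batches[inner_steps]
    final_loss = F.cross_entropy(forward(params, x), y)
    meta_grads = torch.autograd.grad(final_loss, masks)
    return [mg.abs() for mg in meta_grads]   # prune the lowest-scoring weights
```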
arXiv Detail & Related papers (2022-02-16T15:18:55Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant (GraNet-ST).
Perhaps most impressively, the latter is the first to boost sparse-to-sparse training performance above that of various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
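The sketch below illustrates one prune-and-regenerate update in this spirit: drop the smallest-magnitude live weights and revive the same number of dead connections with the largest gradient magnitudes, so regeneration itself is sparsity-neutral and the gradual sparsity schedule is applied separately. The regeneration rate and criteria shown are illustrative assumptions rather than GraNet's exact settings.

```python
# Illustrative sketch of one prune-and-regenerate update: remove the
# smallest-magnitude live weights, then revive the same number of dead
# connections with the largest gradient magnitude. The regeneration rate
# is an illustrative assumption.
import torch


def prune_and_regenerate(weight, grad, mask, regen_rate=0.05):
    live = mask.bool()
    n_regen = max(1, int(regen_rate * live.sum().item()))

    # Prune: remove the smallest-magnitude currently-live weights.
    scores = weight.abs().masked_fill(~live, float("inf")).flatten()
    drop_idx = torch.topk(scores, n_regen, largest=False).indices

    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0

    # Regenerate: revive dead connections with the largest gradients.
    dead = new_mask == 0
    regen_scores = grad.abs().flatten().masked_fill(~dead, float("-inf"))
    grow_idx = torch.topk(regen_scores, n_regen, largest=True).indices
    new_mask[grow_idx] = 1.0

    return new_mask.view_as(mask)
```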
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- Emerging Paradigms of Neural Network Pruning [82.9322109208353]
Pruning is adopted as a post-processing solution to this problem, aiming to remove unnecessary parameters from a neural network with little compromise in performance.
Recent works challenge this belief by discovering random sparse networks which can be trained to match the performance of their dense counterparts.
This survey seeks to bridge the gap by proposing a general pruning framework so that the emerging pruning paradigms can be accommodated well with the traditional one.
arXiv Detail & Related papers (2021-03-11T05:01:52Z)
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
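As a rough sketch of a rising penalty, the hypothetical helper below keeps a per-weight L2 coefficient that grows a little on every call for the weights slated for removal, so the added penalty gradually drives them towards zero before they are pruned. The selection rule (smallest magnitude) and growth step are assumptions, not the paper's schedule.

```python
# Illustrative sketch of an L2 penalty whose factor grows over time for
# weights slated for removal, pushing them towards zero before pruning.
# The selection rule and the growth increment are assumptions.
import torch


class GrowingL2:
    def __init__(self, weight, prune_ratio=0.9, delta=1e-4):
        n_prune = int(prune_ratio * weight.numel())
        idx = torch.topk(weight.abs().flatten(), n_prune, largest=False).indices
        self.penalized = torch.zeros(weight.numel(), dtype=torch.bool)
        self.penalized[idx] = True               # weights slated for removal
        self.lam = torch.zeros(weight.numel())   # per-weight penalty factor
        self.delta = delta

    def step(self):
        # Raise the penalty a little more on every call.
        self.lam[self.penalized] += self.delta

    def penalty(self, weight):
        # Added to the task loss; grows until the selected weights vanish.
        return (self.lam.view_as(weight) * weight.pow(2)).sum()
```

A training loop would add `reg.penalty(weight)` to the task loss and call `reg.step()` every few iterations.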
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
- FlipOut: Uncovering Redundant Weights via Sign Flipping [0.0]
We propose a novel pruning method which uses the oscillations around $0$ that a weight has undergone during training in order to determine its saliency.
Our method can perform pruning before the network has converged, requires little tuning effort, and can directly target the level of sparsity desired by the user.
Our experiments, performed on a variety of object classification architectures, show that it is competitive with existing methods and achieves state-of-the-art performance for levels of sparsity of $99.6\%$ and above.
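The sketch below illustrates a sign-flip-based saliency of this kind: count how often each weight crosses zero during training and treat frequently flipping, small-magnitude weights as the least salient. The exact way flips and magnitude are combined here is an assumption, not the paper's formula.

```python
# Illustrative sketch of a sign-flip-based saliency: count zero crossings
# per weight during training; weights that oscillate around zero a lot are
# treated as redundant. The |w| / (1 + flips) combination is an assumption.
import torch


class SignFlipTracker:
    def __init__(self, weight):
        self.prev_sign = torch.sign(weight.detach())
        self.flips = torch.zeros_like(weight)

    def update(self, weight):
        # Call once per optimizer step to accumulate sign changes.
        sign = torch.sign(weight.detach())
        self.flips += (sign != self.prev_sign).float()
        self.prev_sign = sign

    def saliency(self, weight):
        # Many oscillations around zero -> low saliency -> prune first.
        return weight.detach().abs() / (1.0 + self.flips)
```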
arXiv Detail & Related papers (2020-09-05T20:27:32Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
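As a purely hypothetical illustration of producing a pruning mask as a function of the target dataset, the sketch below pools features of target-dataset examples into an order-invariant set embedding and maps it to per-channel keep decisions. The mask-generator architecture, mean pooling, and 0.5 threshold are assumptions, not STAMP's actual design.

```python
# Hypothetical sketch of dataset-conditioned mask generation: pool target
# features into a set embedding, then map it to per-channel keep
# probabilities for a pretrained layer. Architecture and threshold are
# illustrative assumptions.
import torch
import torch.nn as nn


class ChannelMaskGenerator(nn.Module):
    def __init__(self, feature_dim, n_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(), nn.Linear(128, n_channels)
        )

    def forward(self, target_features):
        # Set embedding: order-invariant mean over target-dataset examples.
        set_embedding = target_features.mean(dim=0)
        keep_prob = torch.sigmoid(self.net(set_embedding))
        return (keep_prob > 0.5).float()        # per-channel pruning mask
```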
arXiv Detail & Related papers (2020-06-22T10:57:43Z)