Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
- URL: http://arxiv.org/abs/2106.10404v1
- Date: Sat, 19 Jun 2021 02:09:25 GMT
- Title: Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
- Authors: Shiwei Liu, Tianlong Chen, Xiaohan Chen, Zahra Atashgahi, Lu Yin,
Huanyu Kou, Li Shen, Mykola Pechenizkiy, Zhangyang Wang, Decebal Constantin
Mocanu
- Abstract summary: We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant (GraNet-ST).
Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
- Score: 79.78184026678659
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Works on the lottery ticket hypothesis (LTH) and single-shot network
pruning (SNIP) have recently drawn a lot of attention to post-training pruning
(iterative magnitude pruning) and before-training pruning (pruning at
initialization). The former suffers from an extremely large computation cost,
while the latter category of methods usually struggles with insufficient
performance. In comparison, during-training pruning, a class of pruning methods
that simultaneously enjoys training/inference efficiency and comparable
performance, has so far been less explored. To better understand
during-training pruning, we quantitatively study the effect of pruning
throughout training from the perspective of pruning plasticity (the ability of
the pruned networks to recover the original performance). Pruning plasticity
can help explain several other empirical observations about neural network
pruning in literature. We further find that pruning plasticity can be
substantially improved by injecting a brain-inspired mechanism called
neuroregeneration, i.e., to regenerate the same number of connections as
pruned. Based on the insights from pruning plasticity, we design a novel
gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost
neuroregeneration (GraNet), and its dynamic sparse training (DST) variant
(GraNet-ST). Both of them advance the state of the art. Perhaps most impressively,
the latter for the first time boosts the sparse-to-sparse training performance
over various dense-to-sparse methods by a large margin with ResNet-50 on
ImageNet. We will release all code.
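The core mechanism described above, gradual magnitude pruning combined with zero-cost neuroregeneration (regrowing as many connections as are removed), can be illustrated with a short sketch. The cubic sparsity schedule and the gradient-magnitude regrowth criterion below are assumptions borrowed from common gradual-pruning and dynamic-sparse-training practice, not the authors' exact procedure; the released code is the authoritative reference.

```python
# Minimal sketch of one gradual-magnitude-pruning step with "neuroregeneration",
# in the spirit of GraNet as described in the abstract. The schedule and the
# regrowth criterion are assumptions, not the paper's exact implementation.
import torch


def target_sparsity(step, s_init, s_final, t_start, t_end):
    """Cubic ramp from s_init to s_final between steps t_start and t_end
    (the gradual pruning schedule commonly used with GMP)."""
    if step <= t_start:
        return s_init
    if step >= t_end:
        return s_final
    frac = (step - t_start) / (t_end - t_start)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3


@torch.no_grad()
def prune_and_regenerate(weight, grad, mask, sparsity, regen_frac=0.1):
    """One prune/regrow update for a single layer.

    1) Magnitude-prune down to `sparsity` plus a small extra fraction.
    2) Regenerate the same extra number of connections, here by largest
       gradient magnitude among currently-dead weights, so the final
       sparsity still matches the schedule (no extra parameter budget).
    """
    numel = weight.numel()
    n_keep = int(numel * (1.0 - sparsity))
    n_regen = int(n_keep * regen_frac)  # connections to drop and then regrow

    # Prune: keep the (n_keep - n_regen) largest-magnitude active weights.
    scores = (weight * mask).abs().flatten()
    keep_idx = torch.topk(scores, n_keep - n_regen).indices
    new_mask = torch.zeros(numel, device=weight.device, dtype=weight.dtype)
    new_mask[keep_idx] = 1.0

    # Regenerate: revive n_regen dead weights with the largest gradients.
    dead = (new_mask == 0).to(weight.dtype)
    regen_scores = grad.abs().flatten() * dead
    regen_idx = torch.topk(regen_scores, n_regen).indices
    new_mask[regen_idx] = 1.0
    new_mask = new_mask.view_as(weight)

    weight.mul_(new_mask)  # zero out pruned weights; regrown ones start at 0
    return new_mask
```

In a training loop, such a step would be applied every few hundred iterations per layer, with `sparsity` taken from `target_sparsity`, so the network is pruned gradually while regeneration keeps exploring new connectivity.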
Related papers
- Learning effective pruning at initialization from iterative pruning [15.842658282636876]
We present an end-to-end neural network-based PaI method to reduce training costs.
Our approach outperforms existing methods in high-sparsity settings.
As this is the first neural network-based PaI method, we conduct extensive experiments to validate the factors that influence it.
arXiv Detail & Related papers (2024-08-27T03:17:52Z) - DRIVE: Dual Gradient-Based Rapid Iterative Pruning [2.209921757303168]
Modern deep neural networks (DNNs) consist of millions of parameters, necessitating high-performance computing during training and inference.
Traditional pruning methods that are applied post-training focus on streamlining inference, but there are recent efforts to leverage sparsity early on by pruning before training.
We present Dual Gradient-Based Rapid Iterative Pruning (DRIVE), which leverages dense training for initial epochs to counteract the randomness inherent at the inception.
arXiv Detail & Related papers (2024-04-01T20:44:28Z) - Learning a Consensus Sub-Network with Polarization Regularization and
One Pass Training [3.2214522506924093]
Pruning schemes create extra overhead either by iterative training and fine-tuning for static pruning or repeated computation of a dynamic pruning graph.
We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks.
Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy.
arXiv Detail & Related papers (2023-02-17T09:37:17Z) - Theoretical Characterization of How Neural Network Pruning Affects its
Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - Trainability Preserving Neural Structured Pruning [64.65659982877891]
We present trainability preserving pruning (TPP), a regularization-based structured pruning method that can effectively maintain trainability during sparsification.
TPP can compete with the ground-truth dynamical isometry recovery method on linear networks.
It delivers encouraging performance in comparison to many top-performing filter pruning methods.
arXiv Detail & Related papers (2022-07-25T21:15:47Z) - Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep
Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have parameter counts that reach into the billions. Training, storing, and transferring such models is energy- and time-consuming, and thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey on methods which reduce the number of trained weights in deep learning models throughout the training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z) - When to Prune? A Policy towards Early Structural Pruning [27.91996628143805]
We propose a policy that prunes as early as possible during training without hurting performance.
Our method yields a 1.4% top-1 accuracy boost over state-of-the-art pruning counterparts and cuts GPU training cost by 2.4×.
arXiv Detail & Related papers (2021-10-22T18:39:22Z) - Progressive Skeletonization: Trimming more fat from a network at
initialization [76.11947969140608]
We propose an objective to find a skeletonized network with maximum connection sensitivity (a minimal connection-sensitivity sketch follows this list).
We then propose two approximate procedures to maximize this objective.
Our approach provides remarkably improved performance at higher pruning levels.
arXiv Detail & Related papers (2020-06-16T11:32:47Z) - Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks that make machine learning applications usable on devices with limited computational resources.
For Deep NNs, such procedures remain unsatisfactory as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned.
arXiv Detail & Related papers (2020-02-19T17:09:50Z)
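As referenced in the Progressive Skeletonization entry above, several pruning-at-initialization papers in this list score weights by connection sensitivity. The sketch below is a minimal, single-batch illustration of one common form of that score, |weight × gradient|, followed by one-shot top-k selection; the one-shot selection is an assumption for illustration, whereas Progressive Skeletonization (FORCE) maximizes its objective with iterative, progressive procedures.

```python
# Minimal sketch of connection-sensitivity scoring for pruning at
# initialization (SNIP-style saliency), assumed here for illustration only.
import torch
import torch.nn as nn


def connection_sensitivity_masks(model, loss_fn, inputs, targets, keep_ratio):
    """Score each weight by |weight * gradient| on one batch and keep the
    top `keep_ratio` fraction of connections across all weight tensors."""
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    scores, params = [], []
    for p in model.parameters():
        if p.dim() > 1 and p.grad is not None:  # weight matrices / conv kernels
            scores.append((p * p.grad).abs().flatten())
            params.append(p)

    all_scores = torch.cat(scores)
    n_keep = max(1, int(len(all_scores) * keep_ratio))
    threshold = torch.topk(all_scores, n_keep).values.min()

    # One binary mask per scored parameter tensor.
    return [((p * p.grad).abs() >= threshold).float() for p in params]
```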
This list is automatically generated from the titles and abstracts of the papers on this site.