One-Cycle Pruning: Pruning ConvNets Under a Tight Training Budget
- URL: http://arxiv.org/abs/2107.02086v1
- Date: Mon, 5 Jul 2021 15:27:07 GMT
- Title: One-Cycle Pruning: Pruning ConvNets Under a Tight Training Budget
- Authors: Nathan Hubens, Matei Mancas, Bernard Gosselin, Marius Preda and Titus Zaharia
- Abstract summary: Introducing sparsity in a neural network has been an efficient way to reduce its complexity while keeping its performance almost intact.
Most of the time, sparsity is introduced using a three-stage pipeline: 1) train the model to convergence, 2) prune the model according to some criterion, 3) fine-tune the pruned model to recover performance.
In our work, we propose to get rid of the first step of the pipeline and to combine the two other steps in a single pruning-training cycle.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Introducing sparsity in a neural network has been an efficient way to reduce
its complexity while keeping its performance almost intact. Most of the time,
sparsity is introduced using a three-stage pipeline: 1) train the model to
convergence, 2) prune the model according to some criterion, 3) fine-tune the
pruned model to recover performance. The last two steps are often performed
iteratively, leading to reasonable results but also to a time-consuming and
complex process. In our work, we propose to get rid of the first step of the
pipeline and to combine the two other steps in a single pruning-training cycle,
allowing the model to jointly learn the optimal weights while being pruned.
We do this by introducing a novel pruning schedule, named One-Cycle Pruning,
which starts pruning at the very beginning of training and continues until its
very end. Adopting such a schedule not only leads to better-performing pruned models
but also drastically reduces the training budget required to prune a model.
Experiments are conducted on a variety of architectures (VGG-16 and ResNet-18)
and datasets (CIFAR-10, CIFAR-100 and Caltech-101), and for relatively high
sparsity values (80%, 90%, 95% of weights removed). Our results show that
One-Cycle Pruning consistently outperforms commonly used pruning schedules such
as One-Shot Pruning, Iterative Pruning and Automated Gradual Pruning, on a
fixed training budget.
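To make the single pruning-training cycle concrete, the sketch below shows how a sparsity schedule can drive pruning at every optimization step, from the first to the last. It is a minimal illustration, not the authors' code: it assumes unstructured global magnitude pruning, uses a smooth cosine ramp as a stand-in for the exact One-Cycle schedule (which the abstract does not spell out), and includes the Automated Gradual Pruning cubic schedule of Zhu & Gupta only for comparison. All function names and the training-loop wiring are illustrative.

```python
import math
import torch
import torch.nn as nn


def one_cycle_sparsity(step, total_steps, final_sparsity):
    # Placeholder ramp: 0 at the first step, final_sparsity at the last step.
    # The exact shape of the One-Cycle schedule is defined in the paper; this
    # cosine ramp only illustrates "prune from the start of training to its end".
    progress = step / max(1, total_steps - 1)
    return final_sparsity * 0.5 * (1.0 - math.cos(math.pi * progress))


def agp_sparsity(step, total_steps, final_sparsity):
    # Automated Gradual Pruning (Zhu & Gupta, 2017) cubic schedule, shown for
    # comparison: s_t = s_f * (1 - (1 - t/T)^3), starting from zero sparsity.
    progress = step / max(1, total_steps - 1)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)


@torch.no_grad()
def prune_global_magnitude(model, sparsity):
    # Zero out the smallest-magnitude conv/linear weights so that `sparsity`
    # (a fraction in [0, 1]) of those weights is zero. Pruned weights are not
    # frozen here and may regrow between steps; whether to use hard masks is an
    # implementation choice the abstract does not specify.
    weights = [m.weight for m in model.modules()
               if isinstance(m, (nn.Conv2d, nn.Linear))]
    scores = torch.cat([w.abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    if k == 0:
        return
    threshold = scores.kthvalue(k).values
    for w in weights:
        w.mul_((w.abs() > threshold).to(w.dtype))


def one_cycle_prune_train(model, loader, optimizer, criterion,
                          epochs, final_sparsity=0.9):
    # Single pruning-training cycle: the model is pruned at every optimization
    # step, from the very first one to the very last, with the target sparsity
    # following the schedule above.
    total_steps = epochs * len(loader)
    step = 0
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            prune_global_magnitude(
                model, one_cycle_sparsity(step, total_steps, final_sparsity))
            step += 1
    return model
```

Re-thresholding globally at every step is the simplest way to follow the schedule; updating the mask less often, or keeping hard masks so pruned weights cannot regrow, are equally plausible variants that the abstract leaves open.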
Related papers
- Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery.
Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods.
Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z)
- DRIVE: Dual Gradient-Based Rapid Iterative Pruning [2.209921757303168]
Modern deep neural networks (DNNs) consist of millions of parameters, necessitating high-performance computing during training and inference.
Traditional pruning methods that are applied post-training focus on streamlining inference, but there are recent efforts to leverage sparsity early on by pruning before training.
We present Dual Gradient-Based Rapid Iterative Pruning (DRIVE), which leverages dense training for the initial epochs to counteract the randomness inherent at initialization.
arXiv Detail & Related papers (2024-04-01T20:44:28Z)
- Structured Pruning for Multi-Task Deep Neural Networks [25.916166808223743]
Multi-task deep neural network (DNN) models have computation and storage benefits over individual single-task models.
We investigate the effectiveness of structured pruning on multi-task models.
arXiv Detail & Related papers (2023-04-13T22:15:47Z)
- Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training [3.2214522506924093]
Pruning schemes create extra overhead either by iterative training and fine-tuning for static pruning or by repeated computation of a dynamic pruning graph.
We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks.
Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy.
arXiv Detail & Related papers (2023-02-17T09:37:17Z)
- Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks [55.431048995662714]
We create a small model for a new task from the pruned models of similar tasks.
We show that a few fine-tuning steps on this model suffice to produce a promising pruned model for the new task.
We develop a simple but effective "Meta-Vote Pruning (MVP)" method that significantly reduces the pruning iterations for a new task.
arXiv Detail & Related papers (2023-01-27T06:49:47Z)
- Gradient-based Intra-attention Pruning on Pre-trained Language Models [21.444503777215637]
We propose GRAIN (Gradient-based Intra-attention pruning), a structured pruning method.
GRAIN inspects and prunes intra-attention structures, which greatly expands the structure search space and enables more flexible models.
Experiments on GLUE, SQuAD, and CoNLL 2003 show that GRAIN notably outperforms other methods, especially in the high-sparsity regime.
arXiv Detail & Related papers (2022-12-15T06:52:31Z)
- Block Pruning For Faster Transformers [89.70392810063247]
We introduce a block pruning approach targeting both small and fast models.
We find that this approach learns to prune out full components of the underlying model, such as attention heads.
arXiv Detail & Related papers (2021-09-10T12:46:32Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant (GraNet-ST).
Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Movement Pruning: Adaptive Sparsity by Fine-Tuning [115.91907953454034]
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning.
We propose the use of movement pruning, a simple, deterministic first-order weight pruning method.
Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes.
arXiv Detail & Related papers (2020-05-15T17:54:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.