One-Cycle Pruning: Pruning ConvNets Under a Tight Training Budget
- URL: http://arxiv.org/abs/2107.02086v1
- Date: Mon, 5 Jul 2021 15:27:07 GMT
- Title: One-Cycle Pruning: Pruning ConvNets Under a Tight Training Budget
- Authors: Nathan Hubens, Matei Mancas, Bernard Gosselin, Marius Preda and Titus Zaharia
- Abstract summary: Introducing sparsity in a neural network has been an efficient way to reduce its complexity while keeping its performance almost intact.
Most of the time, sparsity is introduced using a three-stage pipeline: 1) train the model to convergence, 2) prune the model according to some criterion, 3) fine-tune the pruned model to recover performance.
In our work, we propose to get rid of the first step of the pipeline and to combine the two other steps in a single pruning-training cycle.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Introducing sparsity in a neural network has been an efficient way to reduce
its complexity while keeping its performance almost intact. Most of the time,
sparsity is introduced using a three-stage pipeline: 1) train the model to
convergence, 2) prune the model according to some criterion, 3) fine-tune the
pruned model to recover performance. The last two steps are often performed
iteratively, leading to reasonable results but also to a time-consuming and
complex process. In our work, we propose to get rid of the first step of the
pipeline and to combine the two other steps in a single pruning-training cycle,
allowing the model to jointly learn the optimal weights while being pruned.
We do this by introducing a novel pruning schedule, named One-Cycle Pruning,
which starts pruning at the very beginning of training and continues until its
very end. Adopting such a schedule not only leads to better-performing pruned models
but also drastically reduces the training budget required to prune a model.
Experiments are conducted on a variety of architectures (VGG-16 and ResNet-18)
and datasets (CIFAR-10, CIFAR-100 and Caltech-101), and for relatively high
sparsity values (80%, 90%, 95% of weights removed). Our results show that
One-Cycle Pruning consistently outperforms commonly used pruning schedules such
as One-Shot Pruning, Iterative Pruning and Automated Gradual Pruning, on a
fixed training budget.
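To make the single pruning-training cycle concrete, the sketch below shows how a sparsity schedule can drive pruning at every optimization step, from the first to the last. It is a minimal illustration, not the authors' code: it assumes unstructured global magnitude pruning, uses a smooth cosine ramp as a stand-in for the exact One-Cycle schedule (which the abstract does not spell out), and includes the Automated Gradual Pruning cubic schedule of Zhu & Gupta only for comparison. All function names and the training-loop wiring are illustrative.

```python
import math
import torch
import torch.nn as nn


def one_cycle_sparsity(step, total_steps, final_sparsity):
    # Placeholder ramp: 0 at the first step, final_sparsity at the last step.
    # The exact shape of the One-Cycle schedule is defined in the paper; this
    # cosine ramp only illustrates "prune from the start of training to its end".
    progress = step / max(1, total_steps - 1)
    return final_sparsity * 0.5 * (1.0 - math.cos(math.pi * progress))


def agp_sparsity(step, total_steps, final_sparsity):
    # Automated Gradual Pruning (Zhu & Gupta, 2017) cubic schedule, shown for
    # comparison: s_t = s_f * (1 - (1 - t/T)^3), starting from zero sparsity.
    progress = step / max(1, total_steps - 1)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)


@torch.no_grad()
def prune_global_magnitude(model, sparsity):
    # Zero out the smallest-magnitude conv/linear weights so that `sparsity`
    # (a fraction in [0, 1]) of those weights is zero. Pruned weights are not
    # frozen here and may regrow between steps; whether to use hard masks is an
    # implementation choice the abstract does not specify.
    weights = [m.weight for m in model.modules()
               if isinstance(m, (nn.Conv2d, nn.Linear))]
    scores = torch.cat([w.abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    if k == 0:
        return
    threshold = scores.kthvalue(k).values
    for w in weights:
        w.mul_((w.abs() > threshold).to(w.dtype))


def one_cycle_prune_train(model, loader, optimizer, criterion,
                          epochs, final_sparsity=0.9):
    # Single pruning-training cycle: the model is pruned at every optimization
    # step, from the very first one to the very last, with the target sparsity
    # following the schedule above.
    total_steps = epochs * len(loader)
    step = 0
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            prune_global_magnitude(
                model, one_cycle_sparsity(step, total_steps, final_sparsity))
            step += 1
    return model
```

Re-thresholding globally at every step is the simplest way to follow the schedule; updating the mask less often, or keeping hard masks so pruned weights cannot regrow, are equally plausible variants that the abstract leaves open.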
Related papers
- Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery.
Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods.
Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z)
- DRIVE: Dual Gradient-Based Rapid Iterative Pruning [2.209921757303168]
Modern deep neural networks (DNNs) consist of millions of parameters, necessitating high-performance computing during training and inference.
Traditional pruning methods that are applied post-training focus on streamlining inference, but there are recent efforts to leverage sparsity early on by pruning before training.
We present Dual Gradient-Based Rapid Iterative Pruning (DRIVE), which leverages dense training for the initial epochs to counteract the randomness inherent at initialization.
arXiv Detail & Related papers (2024-04-01T20:44:28Z)
- Structured Pruning for Multi-Task Deep Neural Networks [25.916166808223743]
Multi-task deep neural network (DNN) models have computation and storage benefits over individual single-task models.
We investigate the effectiveness of structured pruning on multi-task models.
arXiv Detail & Related papers (2023-04-13T22:15:47Z)
- Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training [3.2214522506924093]
Pruning schemes create extra overhead either by iterative training and fine-tuning for static pruning or by repeated computation of a dynamic pruning graph.
We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks.
Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy.
arXiv Detail & Related papers (2023-02-17T09:37:17Z)
- Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks [55.431048995662714]
We create a small model for a new task from the pruned models of similar tasks.
We show that a few fine-tuning steps on this model suffice to produce a promising pruned model for the new task.
We develop a simple but effective "Meta-Vote Pruning (MVP)" method that significantly reduces the pruning iterations for a new task.
arXiv Detail & Related papers (2023-01-27T06:49:47Z)
- Gradient-based Intra-attention Pruning on Pre-trained Language Models [21.444503777215637]
We propose GRAIN (Gradient-based Intra-attention pruning), a structured pruning method.
GRAIN inspects and prunes intra-attention structures, which greatly expands the structure search space and enables more flexible models.
Experiments on GLUE, SQuAD, and CoNLL 2003 show that GRAIN notably outperforms other methods, especially in the high-sparsity regime.
arXiv Detail & Related papers (2022-12-15T06:52:31Z)
- Block Pruning For Faster Transformers [89.70392810063247]
We introduce a block pruning approach targeting both small and fast models.
We find that this approach learns to prune out full components of the underlying model, such as attention heads.
arXiv Detail & Related papers (2021-09-10T12:46:32Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant (GraNet-ST).
Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Movement Pruning: Adaptive Sparsity by Fine-Tuning [115.91907953454034]
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning.
We propose the use of movement pruning, a simple, deterministic first-order weight pruning method.
Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes.
arXiv Detail & Related papers (2020-05-15T17:54:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.