Single Shot Structured Pruning Before Training
- URL: http://arxiv.org/abs/2007.00389v1
- Date: Wed, 1 Jul 2020 11:27:37 GMT
- Title: Single Shot Structured Pruning Before Training
- Authors: Joost van Amersfoort, Milad Alizadeh, Sebastian Farquhar, Nicholas
Lane, Yarin Gal
- Abstract summary: Our work develops a methodology to remove entire channels and hidden units with the explicit aim of speeding up training and inference.
We introduce a compute-aware scoring mechanism which enables pruning in units of sensitivity per FLOP removed, allowing even greater speed ups.
- Score: 34.34435316622998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a method to speed up training by 2x and inference by 3x in deep
neural networks using structured pruning applied before training. Unlike
previous works on pruning before training which prune individual weights, our
work develops a methodology to remove entire channels and hidden units with the
explicit aim of speeding up training and inference. We introduce a
compute-aware scoring mechanism which enables pruning in units of sensitivity
per FLOP removed, allowing even greater speed ups. Our method is fast, easy to
implement, and needs just one forward/backward pass on a single batch of data
to complete pruning before training begins.
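The compute-aware scoring is easy to picture in code. Below is a minimal sketch, not the authors' implementation: it assumes a SNIP-style saliency (absolute weight times gradient) summed over each output channel, a single forward/backward pass on one batch, and a rough per-channel FLOP model (kernel MACs times output resolution); the function names `channel_scores` and `prune_masks` and the global top-k selection are illustrative choices.
```python
# Minimal sketch (not the authors' code) of compute-aware structured pruning
# before training: channel sensitivity is estimated from a single
# forward/backward pass on one batch, then divided by an approximate
# per-channel FLOP cost, i.e. "sensitivity per FLOP removed".
import torch
import torch.nn as nn
import torch.nn.functional as F


def channel_scores(model, inputs, targets):
    """Score every Conv2d output channel by assumed saliency |w * grad| / FLOPs."""
    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]

    # record each conv layer's output resolution during the single forward pass
    out_hw = {}
    hooks = [m.register_forward_hook(
                 lambda mod, inp, out: out_hw.__setitem__(mod, out.shape[-2:]))
             for m in convs]
    model.zero_grad()
    F.cross_entropy(model(inputs), targets).backward()  # the only pass needed
    for hk in hooks:
        hk.remove()

    scores = {}
    for m in convs:
        # SNIP-style saliency, aggregated per output channel
        saliency = (m.weight.detach() * m.weight.grad).abs().sum(dim=(1, 2, 3))
        # rough FLOPs saved by dropping one output channel of this layer
        h, w = out_hw[m]
        flops = m.in_channels * m.kernel_size[0] * m.kernel_size[1] * h * w
        scores[m] = saliency / flops
    return scores


def prune_masks(scores, keep_ratio=0.5):
    """Keep the globally top-scoring channels; returns a boolean mask per layer."""
    flat = torch.cat(list(scores.values()))
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {m: s >= threshold for m, s in scores.items()}
```
Note that removing an output channel also shrinks the next layer's input; a fuller FLOP model would credit that downstream saving as well, which the sketch ignores for brevity.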
Related papers
- Joint or Disjoint: Mixing Training Regimes for Early-Exit Models [3.052154851421859]
Early exits significantly reduce the amount of computation required in deep neural networks.
Most early exit methods employ a training strategy that either simultaneously trains the backbone network and the exit heads or trains the exit heads separately.
We propose a training approach where the backbone is initially trained on its own, followed by a phase where both the backbone and the exit heads are trained together.
arXiv Detail & Related papers (2024-07-19T13:56:57Z)
- Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks [69.54774045493227]
A drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples.
We propose to exploit the interior building blocks of the model to improve efficiency.
Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness.
arXiv Detail & Related papers (2023-10-24T01:36:20Z)
- Efficient Adversarial Training with Robust Early-Bird Tickets [57.72115485770303]
We find that robust connectivity patterns emerge in the early training phase, far before parameters converge.
Inspired by this finding, we dig out robust early-bird tickets to develop an efficient adversarial training method.
Experiments show that the proposed efficient adversarial training method can achieve up to $7\times \sim 13\times$ training speedups.
arXiv Detail & Related papers (2022-11-14T10:44:25Z)
- Neural Network Panning: Screening the Optimal Sparse Network Before Training [15.349144733875368]
We argue that network pruning can be summarized as an expressive force transfer process of weights.
We propose a pruning scheme before training called Neural Network Panning which guides expressive force transfer through multi-index and multi-process steps.
arXiv Detail & Related papers (2022-09-27T13:31:43Z)
- Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have a parameter count that reaches into the billions. Training, storing, and transferring such models is energy- and time-consuming, and thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work surveys methods that reduce the number of trained weights in deep learning models throughout training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z)
- When to Prune? A Policy towards Early Structural Pruning [27.91996628143805]
We propose a policy that prunes as early as possible during training without hurting performance.
Our method yields a $1.4\%$ top-1 accuracy boost over state-of-the-art pruning counterparts and cuts GPU training cost by $2.4\times$.
arXiv Detail & Related papers (2021-10-22T18:39:22Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, gradual pruning with zero-cost neuroregeneration (GraNet), together with its dynamic sparse training (DST) variant (GraNet-ST).
Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Pruning via Iterative Ranking of Sensitivity Statistics [0.0]
We show that by applying the sensitivity criterion iteratively in smaller steps, still before training, we can improve its performance without complicating the implementation (see the sketch after this list).
We then demonstrate how it can be applied for both structured and unstructured pruning, before and/or during training, thereby achieving state-of-the-art sparsity-performance trade-offs.
arXiv Detail & Related papers (2020-06-01T12:48:53Z)
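As a companion to the last entry above, here is a minimal sketch of iterative sensitivity-based pruning before training. It is an illustration under stated assumptions rather than that paper's procedure: it uses an unstructured SNIP-style saliency (absolute weight times gradient), recomputes it each round on the same batch, and follows a simple geometric sparsity schedule; the name `iterative_prune` and the schedule are illustrative choices.
```python
# Minimal sketch of iterative sensitivity pruning before training
# (unstructured variant). The saliency, schedule, and names are
# assumptions, not the cited paper's exact procedure.
import torch
import torch.nn as nn
import torch.nn.functional as F


def iterative_prune(model, inputs, targets, final_keep=0.05, steps=5):
    """Prune in several rounds, recomputing weight sensitivity after each round."""
    layers = [m for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    masks = {m: torch.ones_like(m.weight) for m in layers}

    for step in range(1, steps + 1):
        keep = final_keep ** (step / steps)  # geometric schedule toward final_keep

        model.zero_grad()
        F.cross_entropy(model(inputs), targets).backward()

        # sensitivity of the weights that are still alive
        sal = {m: (m.weight.detach() * m.weight.grad).abs() * masks[m]
               for m in layers}
        flat = torch.cat([s.flatten() for s in sal.values()])
        k = max(1, int(keep * flat.numel()))
        threshold = torch.topk(flat, k).values.min()

        # shrink the masks and zero the pruned weights before the next round
        with torch.no_grad():
            for m in layers:
                masks[m] = masks[m] * (sal[m] >= threshold).float()
                m.weight.mul_(masks[m])
    return masks
```
In practice the returned masks would also be applied to the gradients during subsequent training so that pruned weights stay at zero.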
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.