Back to Basics: Efficient Network Compression via IMP
- URL: http://arxiv.org/abs/2111.00843v1
- Date: Mon, 1 Nov 2021 11:23:44 GMT
- Title: Back to Basics: Efficient Network Compression via IMP
- Authors: Max Zimmer, Christoph Spiegel, Sebastian Pokutta
- Abstract summary: Iterative Magnitude Pruning (IMP) is one of the most established approaches for network pruning.
It is often argued that IMP reaches suboptimal states because it does not incorporate sparsification into the training phase.
We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches.
- Score: 22.586474627159287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network pruning is a widely used technique for effectively compressing Deep
Neural Networks with little to no degradation in performance during inference.
Iterative Magnitude Pruning (IMP) is one of the most established approaches for
network pruning, consisting of several iterative training and pruning steps,
where a significant amount of the network's performance is lost after pruning
and then recovered in the subsequent retraining phase. While commonly used as a
benchmark reference, it is often argued that a) it reaches suboptimal states by
not incorporating sparsification into the training phase, b) its global
selection criterion fails to properly determine optimal layer-wise pruning
rates and c) its iterative nature makes it slow and non-competitive. In light
of recently proposed retraining techniques, we investigate these claims through
rigorous and consistent experiments where we compare IMP to
pruning-during-training algorithms, evaluate proposed modifications of its
selection criterion and study the number of iterations and total training time
actually required. We find that IMP with SLR for retraining can outperform
state-of-the-art pruning-during-training approaches with no or only little
computational overhead, that the global magnitude selection criterion is
largely competitive with more complex approaches, and that only a few retraining
epochs are needed in practice to achieve most of the sparsity-vs.-performance
tradeoff of IMP. Our goals are both to demonstrate that basic IMP can already
provide state-of-the-art pruning results on par with or even outperforming more
complex or heavily parameterized approaches and also to establish a more
realistic yet easily realisable baseline for future research.
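As a concrete illustration of the train-prune-retrain cycle with a global magnitude criterion described above, the following PyTorch-style sketch prunes the smallest-magnitude weights across all layers and then retrains. It is a minimal sketch, not the authors' implementation: the model, the train_one_epoch routine, the sparsity schedule and the retraining schedule (e.g. SLR) are assumed placeholders.

```python
import torch

def global_magnitude_prune(model, sparsity):
    """Zero out the smallest-magnitude weights across all layers (global criterion)."""
    weights = [p for p in model.parameters() if p.dim() > 1]  # skip biases/norm params
    scores = torch.cat([w.detach().abs().flatten() for w in weights])
    k = max(1, int(sparsity * scores.numel()))
    threshold = torch.kthvalue(scores, k).values              # k-th smallest magnitude
    masks = []
    with torch.no_grad():
        for w in weights:
            mask = (w.abs() > threshold).float()
            w.mul_(mask)                                       # prune in place
            masks.append(mask)
    return masks

def imp(model, train_one_epoch, target_sparsity=0.9, cycles=3, retrain_epochs=5):
    """Iterative Magnitude Pruning: alternate global pruning and retraining."""
    for c in range(1, cycles + 1):
        # Ramp the overall sparsity towards the target over the cycles.
        sparsity = 1.0 - (1.0 - target_sparsity) ** (c / cycles)
        masks = global_magnitude_prune(model, sparsity)
        weights = [p for p in model.parameters() if p.dim() > 1]
        for _ in range(retrain_epochs):                        # recover the lost performance
            train_one_epoch(model)
            with torch.no_grad():                              # keep pruned weights at zero
                for w, m in zip(weights, masks):
                    w.mul_(m)
    return model
```

Reapplying the masks after every retraining epoch keeps the pruned weights at zero; a real implementation would instead mask the gradients or use forward hooks.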
Related papers
- Efficient Training of Deep Neural Operator Networks via Randomized Sampling [0.0]
The deep operator network (DeepONet) has demonstrated success in the real-time prediction of complex dynamics across various scientific and engineering applications.
We introduce a random sampling technique to be adopted in the training of DeepONet, aimed at improving the generalization ability of the model while significantly reducing computational time.
Our results indicate that incorporating randomization in the trunk network inputs during training enhances the efficiency and robustness of DeepONet, offering a promising avenue for improving the framework's performance in modeling complex physical systems (a rough sketch of this trunk-sampling idea appears after this list).
arXiv Detail & Related papers (2024-09-20T07:18:31Z)
- SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem through the lens of sequential fine-tuning (Seq FT).
Considering that the overly fast representation learning and the biased classification layer constitute this particular problem, we introduce the advanced Slow Learner with Classifier Alignment (SLCA++) framework.
Our approach involves a Slow Learner to selectively reduce the learning rate of backbone parameters, and a Classifier Alignment to align the disjoint classification layers in a post-hoc fashion (a minimal sketch of the slow-learner idea appears after this list).
arXiv Detail & Related papers (2024-08-15T17:50:07Z)
- Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training [58.47622737624532]
We study the influence of pruning criteria on Dynamic Sparse Training (DST) performance.
We find that most of the studied methods yield similar results.
The best performance is predominantly given by the simplest technique: magnitude-based pruning.
arXiv Detail & Related papers (2023-06-21T12:43:55Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to the magnitude scale on-the-fly (see the soft-shrinkage sketch after this list).
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training [3.2214522506924093]
Pruning schemes create extra overhead either by iterative training and fine-tuning for static pruning or by repeated computation of a dynamic pruning graph.
We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks.
Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy.
arXiv Detail & Related papers (2023-02-17T09:37:17Z)
- Trainability Preserving Neural Structured Pruning [64.65659982877891]
We present trainability preserving pruning (TPP), a regularization-based structured pruning method that can effectively maintain trainability during sparsification.
TPP can compete with the ground-truth dynamical isometry recovery method on linear networks.
It delivers encouraging performance in comparison to many top-performing filter pruning methods.
arXiv Detail & Related papers (2022-07-25T21:15:47Z)
- Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients [36.078414964088196]
Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network.
Current methods are insufficient to enable this optimization and lead to a large degradation in model performance.
We propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune.
Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
arXiv Detail & Related papers (2022-02-16T15:18:55Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant, GraNet-ST.
Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks [6.269700080380206]
Pruning techniques have been proposed that remove less significant weights in deep networks.
We propose a dynamic pruning-while-training procedure, wherein we prune filters of a deep network during training itself.
Results indicate that pruning while training yields a compressed network with almost no accuracy loss after pruning 50% of the filters.
arXiv Detail & Related papers (2020-03-05T18:05:17Z)
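A few of the techniques summarized above can be made concrete with short sketches; all module names, tensor shapes and hyperparameters below are illustrative assumptions rather than the respective papers' code. The randomized-sampling idea for DeepONet training, for instance, amounts to subsampling the trunk inputs (evaluation coordinates) at every step:

```python
import torch

def deeponet_step_with_random_sampling(branch_net, trunk_net, u_batch, coords,
                                        targets, optimizer, n_sample=64):
    """One DeepONet-style training step that randomly subsamples the trunk inputs
    (evaluation coordinates) instead of using the full grid."""
    # coords: (n_points, d) full evaluation grid; targets: (batch, n_points)
    idx = torch.randperm(coords.shape[0])[:n_sample]   # fresh random subset each step
    b = branch_net(u_batch)                            # (batch, p) coefficients
    t = trunk_net(coords[idx])                         # (n_sample, p) basis functions
    pred = b @ t.T                                     # (batch, n_sample)
    loss = torch.nn.functional.mse_loss(pred, targets[:, idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```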
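The Slow Learner component of SLCA++ essentially assigns the pre-trained backbone a much smaller learning rate than the classification head; the post-hoc Classifier Alignment step is omitted here. A toy sketch (module sizes and learning rates are assumptions, not the paper's settings):

```python
import torch
import torch.nn as nn

# Toy stand-ins for a pre-trained backbone and a fresh classification head.
backbone = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 64))
classifier = nn.Linear(64, 10)

# Slow Learner: much smaller learning rate for the backbone than for the classifier.
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4},    # slow representation updates
        {"params": classifier.parameters(), "lr": 1e-2},  # faster classifier updates
    ],
    momentum=0.9,
)
```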
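The soft-shrinkage idea behind ISS-P replaces hard zeroing with a small multiplicative decay of the currently unimportant weights, so the sparse structure can keep changing during training. A rough per-iteration sketch (the keep ratio and shrink factor are illustrative):

```python
import torch

def soft_shrink_step(weight, keep_ratio=0.5, shrink=0.01):
    """Shrink the weights below the current magnitude threshold by a small,
    magnitude-proportional amount instead of pruning them outright."""
    k = max(1, int(keep_ratio * weight.numel()))
    # Threshold = k-th largest magnitude, i.e. (numel - k + 1)-th smallest.
    threshold = torch.kthvalue(weight.abs().flatten(), weight.numel() - k + 1).values
    unimportant = weight.abs() < threshold
    with torch.no_grad():
        weight[unimportant] *= (1.0 - shrink)   # gentle decay toward zero
    return unimportant
```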
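Finally, the filter-pruning-while-training idea from the entry directly above can be approximated by a routine that is called periodically during training and zeroes out the convolution filters with the smallest L1 norms (again a sketch, not the paper's exact procedure):

```python
import torch

def prune_filters(conv, prune_ratio=0.5):
    """Structured pruning sketch: zero out the output filters of a Conv2d layer
    with the smallest L1 norms."""
    with torch.no_grad():
        norms = conv.weight.abs().sum(dim=(1, 2, 3))   # L1 norm per output filter
        n_prune = int(prune_ratio * norms.numel())
        if n_prune == 0:
            return
        prune_idx = torch.topk(norms, n_prune, largest=False).indices
        conv.weight[prune_idx] = 0.0
        if conv.bias is not None:
            conv.bias[prune_idx] = 0.0
```

In an actual pruning-while-training loop, the corresponding gradients would also be masked so that the pruned filters stay inactive for the rest of training.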
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.