Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
- URL: http://arxiv.org/abs/2308.02060v2
- Date: Fri, 8 Sep 2023 14:45:48 GMT
- Title: Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
- Authors: Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar,
Alexandra Peste, Dan Alistarh
- Abstract summary: We examine the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
- Score: 87.90654868505518
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Obtaining versions of deep neural networks that are both highly-accurate and
highly-sparse is one of the main challenges in the area of model compression,
and several high-performance pruning techniques have been investigated by the
community. Yet, much less is known about the interaction between sparsity and
the standard stochastic optimization techniques used for training sparse
networks, and most existing work uses standard dense schedules and
hyperparameters for training sparse networks. In this work, we examine the
impact of high sparsity on model training using the standard computer vision
and natural language processing sparsity benchmarks. We begin by showing that
using standard dense training recipes for sparse training is suboptimal, and
results in under-training. We provide new approaches for mitigating this issue
for both sparse pre-training of vision models (e.g. ResNet50/ImageNet) and
sparse fine-tuning of language models (e.g. BERT/GLUE), achieving
state-of-the-art results in both settings in the high-sparsity regime, and
providing detailed analyses of the difficulty of sparse training in both
scenarios. Our work sets a new benchmark for the accuracies that can be
achieved under high sparsity, and should inspire further research into
improving sparse model training, both to reach higher accuracies under high
sparsity and to do so efficiently.
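The abstract does not spell out the training recipe, so the sketch below is only a generic illustration of the under-training point: a gradual-magnitude-pruning loop in PyTorch whose sparse phase gets a longer, separately tuned schedule instead of reusing the standard dense one. The cubic schedule, hyperparameters, and helper names are assumptions for illustration, not the paper's method.

```python
# Illustrative sketch only (not the paper's recipe): gradual magnitude pruning
# where the sparse phase gets a longer, retuned schedule rather than the
# standard dense one. Helper names and hyperparameters are assumptions.
import torch


def cubic_sparsity(step, ramp_steps, final_sparsity=0.95):
    """Gradual (cubic) sparsity schedule ramping from 0 to final_sparsity."""
    frac = min(step / ramp_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)


@torch.no_grad()
def global_magnitude_masks(model, sparsity):
    """Global magnitude pruning: keep the largest-|w| entries across all weight matrices."""
    weights = [p for p in model.parameters() if p.dim() > 1]
    scores = torch.cat([w.abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    threshold = scores.kthvalue(k).values if k > 0 else scores.new_tensor(-1.0)
    return [(w.abs() > threshold).float() for w in weights]


@torch.no_grad()
def enforce(model, masks):
    """Zero out pruned entries so the model stays sparse during training."""
    for w, m in zip([p for p in model.parameters() if p.dim() > 1], masks):
        w.mul_(m)


def train_sparse(model, loader, ramp_epochs=100, extra_epochs=100, lr=0.1):
    # Key point suggested by the abstract: sparse models are under-trained when
    # they only get the dense schedule, so the total budget here is deliberately longer.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-4)
    total = ramp_epochs + extra_epochs
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total)
    for epoch in range(total):
        masks = global_magnitude_masks(model, cubic_sparsity(epoch, ramp_epochs))
        for x, y in loader:
            enforce(model, masks)                      # keep pruned weights at zero
            loss = torch.nn.functional.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            enforce(model, masks)
        sched.step()
    return model
```

The only knob the abstract actually points at is the training budget and schedule given to the sparse phase; the rest of the loop is standard magnitude pruning included purely to make the example self-contained.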
Related papers
- Ex Uno Pluria: Insights on Ensembling in Low Precision Number Systems [16.89998201009075]
Ensembling deep neural networks has shown promise in improving generalization performance.
We propose low precision ensembling, where ensemble members are derived from a single model within low precision number systems.
Our empirical analysis demonstrates the effectiveness of our proposed low precision ensembling method compared to existing ensemble approaches.
arXiv Detail & Related papers (2024-11-22T11:18:20Z)
- Training Bayesian Neural Networks with Sparse Subspace Variational Inference [35.241207717307645]
Sparse Subspace Variational Inference (SSVI) is the first fully sparse BNN framework that maintains a consistently highly sparse model throughout the training and inference phases.
Our experiments show that SSVI sets new benchmarks in crafting sparse BNNs, achieving, for instance, a 10-20x compression in model size with under 3% performance drop, and up to 20x FLOPs reduction during training compared with dense VI training.
arXiv Detail & Related papers (2024-02-16T19:15:49Z)
- Always-Sparse Training by Growing Connections with Guided Stochastic Exploration [46.4179239171213]
We propose an efficient always-sparse training algorithm with excellent scaling to larger and sparser models.
We evaluate our method on CIFAR-10/100 and ImageNet using VGG and ViT models, and compare it against a range of sparsification methods.
arXiv Detail & Related papers (2024-01-12T21:32:04Z)
- Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: they substantially shrink the robust generalization gap and alleviate robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight reparameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner perform comparably to their standard counterparts, but their weight distribution has markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
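As a reading aid, here is a minimal PyTorch sketch of the reparameterisation idea as commonly formulated for Powerpropagation, namely an effective weight w = phi * |phi|**(alpha - 1) with alpha > 1; the layer name, choice of alpha, and initialisation are illustrative assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PowerpropLinear(nn.Module):
    """Linear layer with a Powerpropagation-style reparameterisation:
    effective weight w = phi * |phi|**(alpha - 1), with alpha > 1.
    Gradients w.r.t. phi scale with |phi|**(alpha - 1), so small parameters
    move less, mass piles up near zero, and magnitude pruning removes them safely."""

    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.alpha = alpha
        self.phi = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.phi, a=5 ** 0.5)

    def effective_weight(self):
        return self.phi * self.phi.abs().pow(self.alpha - 1)

    def forward(self, x):
        return F.linear(x, self.effective_weight(), self.bias)


# Pruning would then threshold layer.effective_weight().abs() and zero the
# corresponding entries of phi.
```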
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core yields a low-rank model that performs better than training the low-rank model alone.
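The summary only states that the full network and a predefined core are trained simultaneously; the toy sketch below shows one simple way to realise that, using a width-based core in a small MLP and summing the full and core losses on each batch. The architecture, core size, and equal loss weighting are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoreMLP(nn.Module):
    """Two-layer MLP whose first `core` hidden units form a predefined
    core subnetwork that can be split off after training."""

    def __init__(self, in_features=784, hidden=512, core=128, classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, classes)
        self.core = core

    def forward(self, x, core_only=False):
        h = F.relu(self.fc1(x))
        if core_only:
            # Zero the non-core hidden units so only the core subnetwork is used.
            h = torch.cat([h[:, : self.core], torch.zeros_like(h[:, self.core :])], dim=1)
        return self.fc2(h)


def joint_step(model, x, y, opt):
    # Train the full network and the core subnetwork on the same batch,
    # so the core stays accurate when split off and used on its own.
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x, core_only=True), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```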
arXiv Detail & Related papers (2021-06-16T15:57:51Z)
- LaplaceNet: A Hybrid Energy-Neural Model for Deep Semi-Supervised Classification [0.0]
Recent developments in deep semi-supervised classification have reached unprecedented performance.
We propose a new framework, LaplaceNet, for deep semi-supervised classification that has a greatly reduced model complexity.
Our model outperforms state-of-the-art methods for deep semi-supervised classification across several benchmark datasets.
arXiv Detail & Related papers (2021-06-08T17:09:28Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
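As a concrete reading of "learning lines of networks", here is a minimal sketch of the simplest case: a line between two learned weight endpoints, with a random interpolation point sampled per batch and a small cosine penalty keeping the endpoints apart. The layer names, penalty weight, and tiny MLP are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SubspaceLinear(nn.Module):
    """Linear layer whose weights live on a line between two learned endpoints."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.w0 = nn.Parameter(torch.empty(out_features, in_features))
        self.w1 = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.w0, a=5 ** 0.5)
        nn.init.kaiming_uniform_(self.w1, a=5 ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, t):
        w = (1 - t) * self.w0 + t * self.w1   # a point on the line of networks
        return F.linear(x, w, self.bias)


class SubspaceMLP(nn.Module):
    def __init__(self, in_features=784, hidden=256, classes=10):
        super().__init__()
        self.l1 = SubspaceLinear(in_features, hidden)
        self.l2 = SubspaceLinear(hidden, classes)

    def forward(self, x, t):
        return self.l2(F.relu(self.l1(x, t)), t)


def training_step(model, x, y, opt, beta=1.0):
    t = torch.rand(()).item()                 # sample one point on the line per batch
    loss = F.cross_entropy(model(x, t), y)
    # Penalize endpoint similarity so the line does not collapse to a single model.
    for layer in (model.l1, model.l2):
        cos = F.cosine_similarity(layer.w0.flatten(), layer.w1.flatten(), dim=0)
        loss = loss + beta * cos ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```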
arXiv Detail & Related papers (2021-02-20T23:26:58Z)