HyperSparse Neural Networks: Shifting Exploration to Exploitation
through Adaptive Regularization
- URL: http://arxiv.org/abs/2308.07163v2
- Date: Wed, 16 Aug 2023 06:52:21 GMT
- Title: HyperSparse Neural Networks: Shifting Exploration to Exploitation
through Adaptive Regularization
- Authors: Patrick Glandorf and Timo Kaiser and Bodo Rosenhahn
- Abstract summary: Sparse neural networks are a key factor in developing resource-efficient machine learning applications.
We propose Adaptive Regularized Training (ART), a novel and powerful sparse learning method that compresses dense networks into sparse ones.
Our method compresses the pre-trained model knowledge into the weights of highest magnitude.
- Score: 18.786142528591355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse neural networks are a key factor in developing resource-efficient
machine learning applications. We propose Adaptive Regularized Training
(ART), a novel and powerful sparse learning method that compresses dense
networks into sparse ones. Instead of using the common binary mask during
training to reduce the number of model weights, we iteratively shrink weights
toward zero under increasing weight regularization. Our method compresses the
pre-trained model's knowledge into the weights of highest magnitude. To this
end, we introduce a novel regularization loss named HyperSparse that exploits
the highest-magnitude weights while preserving the ability to explore
weights. Extensive experiments on CIFAR and TinyImageNet show that
our method leads to notable performance gains compared to other sparsification
methods, especially in extremely high sparsity regimes up to 99.8 percent model
sparsity. Additional investigations provide new insights into the patterns that
are encoded in weights with high magnitudes.
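Since the abstract only outlines the training procedure, the following is a minimal, hypothetical PyTorch sketch of the general idea: a regularization term that penalizes low-magnitude weights while largely sparing the current top-k weights, with the regularization strength growing every epoch so training shifts from exploration to exploitation. The function names, the soft gating, and the exponential schedule are assumptions for illustration; the exact HyperSparse loss and ART schedule are defined in the paper.

```python
import torch

def hypersparse_style_penalty(model, keep_ratio=0.002):
    """Hypothetical penalty in the spirit of HyperSparse: L1-regularize all
    weights, but largely spare the top keep_ratio fraction by magnitude so
    they can be exploited while the remaining weights stay free to shrink."""
    weights = torch.cat([p.reshape(-1) for p in model.parameters() if p.dim() > 1])
    k = max(1, int(keep_ratio * weights.numel()))
    threshold = weights.abs().topk(k).values.min()
    # Soft gate: close to 0 for weights above the magnitude threshold, close to 1 below it.
    gate = torch.sigmoid(10.0 * (threshold - weights.abs()) / (threshold + 1e-12))
    return (gate * weights.abs()).sum()

def train_art_style(model, loader, epochs=100, lr=0.1, lam0=1e-6, growth=1.2,
                    keep_ratio=0.002):
    """Hypothetical ART-style loop: the regularization weight grows each epoch."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    lam = lam0
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y) + lam * hypersparse_style_penalty(model, keep_ratio)
            loss.backward()
            opt.step()
        lam *= growth  # iteratively increasing weight regularization
    return model
```

After such a regularized phase, the remaining step would typically be to prune to the target sparsity by weight magnitude and briefly fine-tune the surviving weights.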
Related papers
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Weight Compander: A Simple Weight Reparameterization for Regularization [5.744133015573047]
We introduce weight compander, a novel and effective method to improve the generalization of deep neural networks.
We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
arXiv Detail & Related papers (2023-06-29T14:52:04Z)
- Calibrating the Rigged Lottery: Making All Tickets Reliable [14.353428281239665]
We propose a new sparse training method to produce sparse models with improved confidence calibration.
Our method simultaneously maintains or even improves accuracy with only a slight increase in computation and storage burden.
arXiv Detail & Related papers (2023-02-18T15:53:55Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit performance similar to conventionally trained models, but their weight distribution has markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
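As a rough illustration of such a sparsity-inducing reparameterisation, the sketch below applies a sign-preserving power transform to an underlying parameter tensor, which biases the effective weights toward zero; the layer name is hypothetical and the full training recipe is described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerpropLinear(nn.Module):
    """Hypothetical linear layer with a sign-preserving power reparameterisation,
    w = theta * |theta|**(alpha - 1), which concentrates effective weights near zero."""
    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.alpha = alpha
        self.theta = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.theta, a=5 ** 0.5)

    def forward(self, x):
        # Effective weight: sign of theta is preserved, magnitude becomes |theta|**alpha.
        weight = self.theta * self.theta.abs().pow(self.alpha - 1.0)
        return F.linear(x, weight, self.bias)
```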
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
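For readers unfamiliar with the grow-and-prune pattern, the sketch below shows one generic step on a single weight tensor: prune to the target sparsity by magnitude, then regrow a fraction of the pruned positions so they can be trained again. The regrowth criterion, partitioning, and schedule here are placeholders, not the paper's method.

```python
import torch

def grow_and_prune_step(weight, mask, sparsity=0.8, grow_fraction=0.1):
    """One generic grow-and-prune update for a weight tensor (illustrative only)."""
    flat = (weight * mask).abs().reshape(-1)
    n_keep = int(round(flat.numel() * (1.0 - sparsity)))
    # 1) Prune: keep only the largest-magnitude weights.
    new_mask = torch.zeros_like(flat)
    new_mask[flat.topk(n_keep).indices] = 1.0
    # 2) Grow: randomly re-activate a fraction of the pruned positions.
    pruned_idx = (new_mask == 0).nonzero(as_tuple=True)[0]
    n_grow = int(round(grow_fraction * pruned_idx.numel()))
    if n_grow > 0:
        grow_idx = pruned_idx[torch.randperm(pruned_idx.numel())[:n_grow]]
        new_mask[grow_idx] = 1.0
    return new_mask.view_as(weight)
```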
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Training Sparse Neural Networks using Compressed Sensing [13.84396596420605]
We develop and test a novel method based on compressed sensing that combines pruning and training into a single step.
Specifically, we utilize an adaptively weighted $\ell_1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks.
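An adaptively weighted $\ell_1$ penalty can be pictured as a reweighted lasso in which small weights are penalized more strongly than large ones. The sketch below uses one common reweighting choice ($1/(|w| + \epsilon)$) purely for illustration and omits the RDA-style optimizer the paper builds on.

```python
import torch

def adaptive_l1_penalty(model, eps=1e-3):
    """Hypothetical reweighted-L1 term: each weight contributes
    |w| / (|w|.detach() + eps), so small weights feel a stronger pull toward
    zero than large ones."""
    penalty = 0.0
    for p in model.parameters():
        if p.dim() > 1:  # regularize weight matrices / convolution kernels only
            scale = 1.0 / (p.detach().abs() + eps)
            penalty = penalty + (scale * p.abs()).sum()
    return penalty

# Usage during training (lam controls the overall sparsity pressure):
#   loss = task_loss + lam * adaptive_l1_penalty(model)
```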
arXiv Detail & Related papers (2020-08-21T19:35:54Z)
- Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we recover a single model by taking a spatial average in weight space.
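The final step can be pictured as a plain mean over several late-phase weight copies; the sketch below averages K state dicts into a single set of parameters and illustrates only the spatial average in weight space, not the full late-phase training scheme.

```python
import copy
import torch

def average_weight_copies(state_dicts):
    """Average K weight copies (e.g., late-phase ensemble members) in weight
    space to obtain the parameters of a single model. Illustrative only."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        if torch.is_floating_point(avg[key]):
            avg[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    return avg
```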
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
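One simple way to realize such a constraint is to project the weights back onto an $\ell_2$ ball of radius $r$ centred on the pre-trained weights after each update. The sketch below shows this projection; the radius, the joint projection over all parameters, and the per-step schedule are assumptions rather than the paper's exact algorithm.

```python
import torch

@torch.no_grad()
def project_to_ball(model, pretrained_params, radius):
    """Project the current parameters onto an L2 ball of the given radius
    centred on the pre-trained parameters (sphere constraint, illustrative)."""
    diffs = [p - p0 for p, p0 in zip(model.parameters(), pretrained_params)]
    norm = torch.sqrt(sum((d * d).sum() for d in diffs))
    if norm > radius:
        scale = radius / norm
        for p, p0 in zip(model.parameters(), pretrained_params):
            p.copy_(p0 + (p - p0) * scale)

# Usage: keep detached copies of the pre-trained weights, then call
# project_to_ball(model, pretrained_params, radius) after each optimizer step.
```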
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.