Gradient-based Weight Density Balancing for Robust Dynamic Sparse
Training
- URL: http://arxiv.org/abs/2210.14012v1
- Date: Tue, 25 Oct 2022 13:32:09 GMT
- Title: Gradient-based Weight Density Balancing for Robust Dynamic Sparse
Training
- Authors: Mathias Parger, Alexander Ertl, Paul Eibensteiner, Joerg H. Mueller,
Martin Winter, Markus Steinberger
- Abstract summary: Training a sparse neural network from scratch requires optimizing the connections at the same time as the weights themselves.
While the connections per layer are optimized multiple times during training, the density of each layer typically remains constant.
We propose Global Gradient-based Redistribution, a technique which distributes weights across all layers - adding more weights to the layers that need them most.
- Score: 59.48691524227352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training a sparse neural network from scratch requires optimizing connections
at the same time as the weights themselves. Typically, the weights are
redistributed after a predefined number of weight updates, removing a fraction
of the parameters of each layer and inserting them at different locations in
the same layers. The density of each layer is determined using heuristics,
often purely based on the size of the parameter tensor. While the connections
per layer are optimized multiple times during training, the density of each
layer typically remains constant. This leaves great unrealized potential,
especially in scenarios with a high sparsity of 90% and more. We propose Global
Gradient-based Redistribution, a technique which distributes weights across all
layers - adding more weights to the layers that need them most. Our evaluation
shows that our approach is less prone to unbalanced weight distribution at
initialization than previous work and that it is able to find better performing
sparse subnetworks at very high sparsity levels.
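The abstract does not spell out the exact redistribution rule, so the following is only a rough sketch of one plausible reading of gradient-based density balancing: give each layer a share of a global weight budget in proportion to its mean gradient magnitude. The function name and the proportional-allocation rule are illustrative assumptions, not the authors' implementation.
```python
# Hedged sketch: allocate a global nonzero-weight budget across layers in
# proportion to each layer's mean gradient magnitude. Illustrative only; the
# paper's actual redistribution rule may differ.
import numpy as np

def redistribute_density(grad_norms, layer_sizes, total_budget, min_weights=1):
    """grad_norms: mean |gradient| per layer; layer_sizes: dense weight count
    per layer; total_budget: total number of nonzero weights to keep."""
    grad_norms = np.asarray(grad_norms, dtype=np.float64)
    layer_sizes = np.asarray(layer_sizes, dtype=np.int64)
    shares = grad_norms / grad_norms.sum()                  # gradient mass share
    budget = np.maximum(min_weights,
                        np.round(shares * total_budget)).astype(np.int64)
    budget = np.minimum(budget, layer_sizes)                # cap at dense size
    # Rounding/capping can leave the total slightly off; a real implementation
    # would rebalance the remainder across layers.
    return budget / layer_sizes                             # per-layer densities

# Example: three layers at 90% global sparsity (10% of all weights kept).
sizes = [25088, 262144, 5120]
grads = [0.8, 0.2, 1.5]
print(redistribute_density(grads, sizes, total_budget=int(0.1 * sum(sizes))))
```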
Related papers
- MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters [19.358670728803336]
Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes.
To avoid this, one can grow from a small network by adding random weights over time to gradually achieve the target network size.
This naive approach falls short in practice as it brings too much noise to the growing process.
arXiv Detail & Related papers (2023-11-07T11:37:08Z)
- Weight Compander: A Simple Weight Reparameterization for Regularization [5.744133015573047]
We introduce weight compander, a novel effective method to improve generalization of deep neural networks.
We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
arXiv Detail & Related papers (2023-06-29T14:52:04Z)
- InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
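The summary above only states that cumulative weight updates are expressed as low-rank matrices; a minimal sketch of that parameterization (frozen initial weights plus a trainable rank-r update) is given below. The class and argument names are illustrative, not the InRank API.
```python
# Hedged sketch: effective weight = frozen initial weight + low-rank update
# U @ V, with only U and V trained. Not the InRank algorithm itself.
import torch
import torch.nn as nn

class LowRankUpdateLinear(nn.Module):
    def __init__(self, in_features, out_features, rank=4):
        super().__init__()
        w0 = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w0)
        self.register_buffer("w0", w0)            # frozen initial weights
        self.u = nn.Parameter(torch.zeros(out_features, rank))
        self.v = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Effective weight = initial weight + low-rank cumulative update.
        return x @ (self.w0 + self.u @ self.v).t() + self.bias

layer = LowRankUpdateLinear(128, 64, rank=8)
print(layer(torch.randn(2, 128)).shape)           # torch.Size([2, 64])
```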
arXiv Detail & Related papers (2023-06-20T03:03:04Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
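BiTAT's task-dependent aggregated transformation is not described in the summary; the sketch below only illustrates the baseline setting it refers to: 1-bit weight quantization trained with a straight-through estimator (STE).
```python
# Generic 1-bit quantization-aware training sketch with a straight-through
# estimator. Illustrates the baseline, not BiTAT's aggregated transformation.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)                      # 1-bit weights in {-1, +1}

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # STE: pass gradients through, clipped to the linear region |w| <= 1.
        return grad_out * (w.abs() <= 1).float()

w = torch.randn(4, 4, requires_grad=True)
x = torch.randn(8, 4)
loss = (x @ BinarizeSTE.apply(w).t()).pow(2).mean()
loss.backward()
print(w.grad.shape)                               # gradients reach the weights
```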
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization [0.0]
In low-latency or mobile applications, lower computation complexity, lower memory footprint and better energy efficiency are desired.
Recent work in weight binarization replaces weight-input matrix multiplication with additions.
We show empirically that, starting from partial binary weights instead of from fully binary ones, training reaches fully binary weight networks with better accuracies.
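As a rough illustration of the partial-to-fully-binary idea, the sketch below binarizes layers one at a time as training progresses; the schedule, layer ordering, and the absence of a gradient estimator are simplifying assumptions rather than the paper's recipe.
```python
# Hedged sketch: only the first `num_binary_layers` layers use 1-bit weights;
# training would gradually raise that count until the network is fully binary.
import torch
import torch.nn as nn

def binarized_forward(linear, x, binarize):
    w = torch.sign(linear.weight) if binarize else linear.weight
    return nn.functional.linear(x, w, linear.bias)

layers = nn.ModuleList([nn.Linear(32, 32) for _ in range(4)])

def forward(x, num_binary_layers):
    for i, layer in enumerate(layers):
        x = torch.relu(binarized_forward(layer, x, binarize=(i < num_binary_layers)))
    return x

x = torch.randn(2, 32)
for stage in range(5):                 # stage 0: no binary layers ... 4: all binary
    print(stage, forward(x, stage).shape)
```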
arXiv Detail & Related papers (2021-11-13T05:36:51Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
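The summary does not describe the schedule itself, so the following is only a generic grow-and-prune cycle on a single weight tensor (grow connections, train, prune back to the target sparsity by magnitude), not the paper's partitioned GaP procedure.
```python
# Generic grow-and-prune cycle sketch; the dense training phase is simulated
# by a random update. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
sparsity = 0.8

def prune_by_magnitude(w, sparsity):
    k = int(w.size * sparsity)
    threshold = np.sort(np.abs(w).ravel())[k]
    return (np.abs(w) >= threshold).astype(w.dtype)

mask = prune_by_magnitude(w, sparsity)
for step in range(3):
    # Grow: all weights participate again (stand-in for a dense training phase).
    w = w + 0.01 * rng.normal(size=w.shape)
    # Prune: return to the target sparsity by magnitude.
    mask = prune_by_magnitude(w, sparsity)
    w = w * mask
    print(f"cycle {step}: density = {mask.mean():.2f}")
```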
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Layer-adaptive sparsity for the Magnitude-based Pruning [88.37510230946478]
We propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score.
LAMP consistently outperforms popular existing schemes for layerwise sparsity selection.
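Assuming the standard definition of the LAMP score (a weight's squared magnitude normalized by the squared mass of all weights in the same layer that are at least as large), a small sketch of layerwise scores followed by global thresholding might look like this.
```python
# Sketch of LAMP-style scoring and global pruning; per-layer densities fall
# out of the global threshold rather than being set by hand.
import numpy as np

def lamp_scores(w):
    """Return LAMP scores with the same shape as the layer's weight tensor."""
    flat = w.ravel() ** 2
    order = np.argsort(flat)                       # ascending by magnitude
    sorted_sq = flat[order]
    suffix = np.cumsum(sorted_sq[::-1])[::-1]      # mass of weights >= each one
    scores = np.empty_like(flat)
    scores[order] = sorted_sq / suffix
    return scores.reshape(w.shape)

rng = np.random.default_rng(0)
layers = [rng.normal(size=(16, 8)), rng.normal(size=(8, 4))]
all_scores = np.concatenate([lamp_scores(w).ravel() for w in layers])
threshold = np.quantile(all_scores, 0.8)           # keep top 20% globally
masks = [lamp_scores(w) >= threshold for w in layers]
print([m.mean() for m in masks])                   # resulting per-layer densities
```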
arXiv Detail & Related papers (2020-10-15T09:14:02Z)
- Training highly effective connectivities within neural networks with randomly initialized, fixed weights [4.56877715768796]
We introduce a novel way of training a network by flipping the signs of the weights.
We obtain good results even when the weights have constant magnitude or are drawn from highly asymmetric distributions.
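One possible realization of training by sign flipping, with fixed random magnitudes and a latent parameter whose sign is taken with a straight-through estimator, is sketched below; this is an assumption about the mechanism, not necessarily the authors' update rule.
```python
# Hedged sketch: magnitudes are frozen buffers; only the latent sign carrier
# receives gradients, so training can only flip signs.
import torch
import torch.nn as nn

class SignFlipLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        magnitude = torch.rand(out_features, in_features) + 0.1
        self.register_buffer("magnitude", magnitude)       # fixed, never trained
        self.latent = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x):
        # Straight-through sign: forward uses sign(latent); backward treats it
        # as identity so gradients can still flip signs.
        sign = self.latent + (torch.sign(self.latent) - self.latent).detach()
        return nn.functional.linear(x, self.magnitude * sign)

layer = SignFlipLinear(16, 4)
layer(torch.randn(2, 16)).sum().backward()
print(layer.latent.grad is not None, layer.magnitude.requires_grad)  # True False
```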
arXiv Detail & Related papers (2020-06-30T09:41:18Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
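The exact curvature estimator is not given in the summary; the sketch below approximates the Hessian spectral norm with power iteration on Hessian-vector products, which is one common way to track such a norm.
```python
# Hedged sketch: power iteration on Hessian-vector products to estimate the
# largest Hessian eigenvalue of the loss w.r.t. the weights.
import torch

model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, create_graph=True)

v = [torch.randn_like(p) for p in params]
for _ in range(10):                                # power iteration steps
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]                     # renormalize the iterate

print("estimated Hessian spectral norm:", norm.item())
```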
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Train-by-Reconnect: Decoupling Locations of Weights from their Values [6.09170287691728]
We show that untrained deep neural networks (DNNs) are different from trained ones.
We propose a novel method named Lookahead Permutation (LaPerm) to train DNNs by reconnecting the weights.
When the initial weights share a single value, our method finds weight-agnostic neural networks with far better-than-chance accuracy.
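A hedged sketch of the reconnection step, reading "reconnecting the weights" as rearranging a fixed pool of weight values so that their rank order matches the freely trained weights; the training loop around it is omitted, and the exact LaPerm synchronization rule may differ.
```python
# Hedged sketch: keep the original pool of weight values, but move them to the
# positions suggested by the rank order of the freely trained weights.
import numpy as np

def reconnect(initial, trained):
    """Return a permutation of `initial` whose rank order follows `trained`."""
    flat_init = np.sort(initial.ravel())             # fixed pool of values
    ranks = np.argsort(np.argsort(trained.ravel()))  # rank of each position
    return flat_init[ranks].reshape(initial.shape)

rng = np.random.default_rng(0)
w0 = rng.normal(size=(4, 4))                            # values never change
w_trained = w0 + rng.normal(scale=0.5, size=(4, 4))     # after some SGD steps
w_new = reconnect(w0, w_trained)

print(np.allclose(np.sort(w_new.ravel()), np.sort(w0.ravel())))        # same values
print(np.array_equal(np.argsort(w_new.ravel()), np.argsort(w_trained.ravel())))
```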
arXiv Detail & Related papers (2020-03-05T12:40:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.