Comparing Rewinding and Fine-tuning in Neural Network Pruning
- URL: http://arxiv.org/abs/2003.02389v1
- Date: Thu, 5 Mar 2020 00:53:18 GMT
- Title: Comparing Rewinding and Fine-tuning in Neural Network Pruning
- Authors: Alex Renda, Jonathan Frankle, Michael Carbin
- Abstract summary: We compare fine-tuning with two alternative retraining techniques, weight rewinding and learning rate rewinding, for neural network pruning.
Both rewinding techniques outperform fine-tuning and form the basis of a network-agnostic pruning algorithm that matches the accuracy and compression ratios of several more network-specific state-of-the-art techniques.
- Score: 28.663299059376897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many neural network pruning algorithms proceed in three steps: train the
network to completion, remove unwanted structure to compress the network, and
retrain the remaining structure to recover lost accuracy. The standard
retraining technique, fine-tuning, trains the unpruned weights from their final
trained values using a small fixed learning rate. In this paper, we compare
fine-tuning to alternative retraining techniques. Weight rewinding (as proposed
by Frankle et al. (2019)) rewinds unpruned weights to their values from
earlier in training and retrains them from there using the original training
schedule. Learning rate rewinding (which we propose) trains the unpruned
weights from their final values using the same learning rate schedule as weight
rewinding. Both rewinding techniques outperform fine-tuning, forming the basis
of a network-agnostic pruning algorithm that matches the accuracy and
compression ratios of several more network-specific state-of-the-art
techniques.
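The abstract contrasts three retraining strategies that differ only in which weights retraining starts from and which learning rate schedule it uses. Below is a minimal, self-contained sketch of that difference; the toy model, random data, step schedule, and magnitude-pruning rule are assumptions for illustration, not the authors' code.

```python
import copy
import torch
import torch.nn as nn

# Toy setup (assumptions for illustration only).
torch.manual_seed(0)
data = [(torch.randn(32, 20), torch.randint(0, 2, (32,))) for _ in range(10)]
loss_fn = nn.CrossEntropyLoss()
TOTAL_EPOCHS, REWIND_EPOCH = 30, 5

def original_lr(epoch):                      # the original training schedule
    return 0.1 * (0.1 ** (epoch // 10))

def train(model, lr_of_epoch, epochs, mask=None, start_epoch=0):
    if mask is not None:                     # zero out the pruned weights
        with torch.no_grad():
            model[0].weight *= mask
    for e in range(start_epoch, start_epoch + epochs):
        opt = torch.optim.SGD(model.parameters(), lr=lr_of_epoch(e))
        for x, y in data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            if mask is not None:             # keep pruned weights at zero
                with torch.no_grad():
                    model[0].weight *= mask
    return model

# 1) Train to completion, keeping a rewind checkpoint from early in training.
model = nn.Sequential(nn.Linear(20, 2))
checkpoint = copy.deepcopy(train(model, original_lr, REWIND_EPOCH))
trained = train(model, original_lr, TOTAL_EPOCHS - REWIND_EPOCH,
                start_epoch=REWIND_EPOCH)

# 2) Prune: drop roughly the 80% smallest-magnitude weights.
w = trained[0].weight.detach().abs()
mask = (w >= w.flatten().kthvalue(int(0.8 * w.numel())).values).float()

# 3a) Fine-tuning: final weights, small constant learning rate.
fine_tuned = train(copy.deepcopy(trained), lambda e: 0.001,
                   TOTAL_EPOCHS - REWIND_EPOCH, mask)

# 3b) Weight rewinding (Frankle et al., 2019): rewind the weights to the
#     checkpoint and rerun the original schedule from that point.
weight_rewound = train(copy.deepcopy(checkpoint), original_lr,
                       TOTAL_EPOCHS - REWIND_EPOCH, mask,
                       start_epoch=REWIND_EPOCH)

# 3c) Learning rate rewinding (this paper): keep the final weights,
#     but reuse the original schedule from the rewind point.
lr_rewound = train(copy.deepcopy(trained), original_lr,
                   TOTAL_EPOCHS - REWIND_EPOCH, mask,
                   start_epoch=REWIND_EPOCH)
```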
Related papers
- Efficient Training with Denoised Neural Weights [65.14892033932895]
This work takes a novel step towards building a weight generator to synthesize the neural weights for initialization.
We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights.
By initializing the image translation model with the denoised weights predicted by our diffusion model, the training requires only 43.3 seconds.
arXiv Detail & Related papers (2024-07-16T17:59:42Z)
- Pretraining with Random Noise for Fast and Robust Learning without Weight Transport [6.916179672407521]
We show that pretraining neural networks with random noise increases the learning efficiency as well as generalization abilities without weight transport.
Sequential training with both random noise and data brings weights closer to synaptic feedback than training solely with data.
This pre-regularization allows the network to learn simple solutions of a low rank, reducing the generalization loss during subsequent training.
arXiv Detail & Related papers (2024-05-27T00:12:51Z)
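A loose sketch of the noise-pretraining idea summarized above, under the assumption that "pretraining with random noise" means running ordinary optimization steps on Gaussian noise inputs with random labels before any real data; the paper's weight-transport-free learning rule is not reproduced here.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Phase 1 (assumed reading of the abstract): pretrain on pure Gaussian noise
# inputs with random labels, before any real data is seen.
for _ in range(1000):
    x = torch.randn(64, 784)             # random noise inputs
    y = torch.randint(0, 10, (64,))      # random targets
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# Phase 2: continue training on the real dataset (loader assumed to exist).
# for x, y in real_data_loader:
#     opt.zero_grad(); loss_fn(model(x), y).backward(); opt.step()
```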
- InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
arXiv Detail & Related papers (2023-06-20T03:03:04Z)
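A minimal sketch of expressing the cumulative weight update of a layer as an explicit low-rank factor pair; the frozen base weights, rank, and initialization below are assumptions rather than the InRank implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankUpdateLinear(nn.Module):
    """y = x (W0 + U V)^T + b, with W0 frozen and only the low-rank factors
    U, V (plus the bias) trained, so the cumulative weight update is low rank."""
    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        w0 = torch.randn(out_features, in_features) / in_features ** 0.5
        self.register_buffer("w0", w0)                   # frozen base weights
        self.U = nn.Parameter(torch.zeros(out_features, rank))
        self.V = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        weight = self.w0 + self.U @ self.V               # explicit low-rank update
        return F.linear(x, weight, self.bias)

layer = LowRankUpdateLinear(128, 64)
out = layer(torch.randn(32, 128))                        # shape (32, 64)
```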
- Random Weights Networks Work as Loss Prior Constraint for Image Restoration [50.80507007507757]
We present our belief that "Random Weights Networks can act as a Loss Prior Constraint for Image Restoration".
This loss prior can be inserted directly into existing networks without any additional training or testing cost.
Our main aim is to spark renewed interest in the design of loss functions, which we argue is currently neglected.
arXiv Detail & Related papers (2023-03-29T03:43:51Z)
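One hedged reading of using a random-weights network as a loss prior: a frozen, randomly initialized network supplies an extra feature-space term in the restoration loss. The architecture, loss weighting, and L1 distance below are assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A frozen, randomly initialized network acting as a loss prior: it is never
# trained, so it adds no training cost of its own.
random_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                           nn.Conv2d(16, 16, 3, padding=1))
for p in random_net.parameters():
    p.requires_grad_(False)

def restoration_loss(restored, clean, weight=0.1):
    pixel = F.l1_loss(restored, clean)                           # usual pixel loss
    prior = F.l1_loss(random_net(restored), random_net(clean))   # random-feature constraint
    return pixel + weight * prior

loss = restoration_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```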
- Bit-wise Training of Neural Network Weights [4.56877715768796]
We introduce an algorithm where the individual bits representing the weights of a neural network are learned.
This method allows training weights with integer values on arbitrary bit-depths and naturally uncovers sparse networks.
We show better results than standard training for fully connected networks and comparable performance for convolutional and residual networks.
arXiv Detail & Related papers (2022-02-19T10:46:54Z)
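A hedged sketch of learning the individual bits of a weight tensor with a straight-through estimator; the bit depth, offset, and estimator are assumptions and not necessarily the paper's algorithm.

```python
import torch
import torch.nn as nn

class BitwiseWeight(nn.Module):
    """Represent an integer-valued weight tensor through learnable bits:
    w = sum_k 2^k * b_k - offset, with b_k in {0, 1} trained via a
    straight-through estimator on per-bit logits."""
    def __init__(self, shape, bits=4):
        super().__init__()
        self.logits = nn.Parameter(torch.randn(*shape, bits))      # one logit per bit
        self.register_buffer("place", 2.0 ** torch.arange(bits))   # 1, 2, 4, 8
        self.offset = 2 ** (bits - 1)                              # centre around zero

    def forward(self):
        prob = torch.sigmoid(self.logits)
        hard = (prob > 0.5).float()
        bits = hard + prob - prob.detach()                         # straight-through estimator
        return (bits * self.place).sum(-1) - self.offset           # integer-valued weights

w = BitwiseWeight((64, 32))
print(w().unique())    # a handful of integer values in [-8, 7]
```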
- FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z)
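A rough sketch of harvesting an ensemble "for free" from one training run by snapshotting the model after successive sparse-training phases; dynamic sparse training itself is left as an assumed callback (train_one_phase), so this is a simplification of FreeTickets rather than its implementation.

```python
import copy
import torch

def collect_free_tickets(model, train_one_phase, num_tickets=3):
    """Run several phases of (dynamic sparse) training and snapshot the model
    after each phase; each snapshot is one 'free ticket'."""
    tickets = []
    for _ in range(num_tickets):
        train_one_phase(model)            # assumed: trains + reshuffles sparsity in place
        tickets.append(copy.deepcopy(model).eval())
    return tickets

def ensemble_predict(tickets, x):
    # Average the softmax outputs of all tickets collected in one training run.
    with torch.no_grad():
        probs = [torch.softmax(t(x), dim=-1) for t in tickets]
    return torch.stack(probs).mean(0)
```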
- Training Sparse Neural Networks using Compressed Sensing [13.84396596420605]
We develop and test a novel method based on compressed sensing which combines the pruning and training into a single step.
Specifically, we utilize an adaptively weighted $\ell_1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks.
arXiv Detail & Related papers (2020-08-21T19:35:54Z)
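A minimal sketch of an adaptively weighted $\ell_1$ penalty added to the training loss; the reweighting rule and the use of a plain SGD-style step instead of the paper's RDA generalization are assumptions.

```python
import torch

def weighted_l1_penalty(model, eps=1e-3):
    """Adaptive l1: weights that are already small get a larger penalty
    coefficient, pushing them toward exact zero (reweighted-l1 heuristic)."""
    penalty = 0.0
    for p in model.parameters():
        coeff = 1.0 / (p.detach().abs() + eps)   # adaptive, treated as constant
        penalty = penalty + (coeff * p.abs()).sum()
    return penalty

# Usage inside a standard training step (model, opt, loss_fn, x, y assumed):
# loss = loss_fn(model(x), y) + 1e-5 * weighted_l1_penalty(model)
# opt.zero_grad(); loss.backward(); opt.step()
```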
- Training highly effective connectivities within neural networks with randomly initialized, fixed weights [4.56877715768796]
We introduce a novel way of training a network by flipping the signs of the weights.
We obtain good results even when the weights have constant magnitude or are drawn from highly asymmetric distributions.
arXiv Detail & Related papers (2020-06-30T09:41:18Z)
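A hedged sketch of training only the signs of weights whose magnitudes stay fixed at their random initialization, using a straight-through estimator; the estimator and layer type are assumptions, not necessarily the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SignOnlyLinear(nn.Module):
    """Linear layer whose weight magnitudes are fixed at random initialization;
    training only flips the signs (through learnable sign logits)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        magnitude = torch.randn(out_features, in_features).abs()
        self.register_buffer("magnitude", magnitude)                 # never trained
        self.sign_logits = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x):
        soft = torch.tanh(self.sign_logits)
        sign = torch.sign(self.sign_logits) + soft - soft.detach()   # straight-through
        return F.linear(x, self.magnitude * sign)

layer = SignOnlyLinear(128, 10)
out = layer(torch.randn(4, 128))
```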
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
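A minimal sketch of keeping fine-tuned weights inside a small sphere around the pre-trained ones by projecting back after each optimizer step; the per-tensor L2 ball and the radius are assumptions, not the paper's exact constraint.

```python
import torch

def make_projector(model, radius=1.0):
    """Return a function that projects each parameter back into an L2 ball of
    the given radius around its pre-trained value."""
    anchor = {name: p.detach().clone() for name, p in model.named_parameters()}
    def project():
        with torch.no_grad():
            for name, p in model.named_parameters():
                diff = p - anchor[name]
                norm = diff.norm()
                if norm > radius:
                    p.copy_(anchor[name] + diff * (radius / norm))
    return project

# Usage after loading pre-trained weights (model, opt, loss assumed to exist):
# project = make_projector(model, radius=5.0)
# loss.backward(); opt.step(); project()   # stay close to the pre-trained point
```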
- RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks [23.45606380793965]
We present Random Partition Relaxation (RPR), a method for strong quantization of neural network weights to binary (+1/-1) and ternary (+1/0/-1) values.
We demonstrate binary and ternary-weight networks with accuracies beyond the state-of-the-art for GoogLeNet and competitive performance for ResNet-18 and ResNet-50.
arXiv Detail & Related papers (2020-01-04T15:56:10Z)
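A hedged sketch of the random-partition idea: at each step a randomly chosen fraction of weights is relaxed to continuous values while the rest are quantized to ternary levels; the fraction, threshold, and the omitted retraining loop are assumptions.

```python
import torch

def ternarize(w, threshold=0.05):
    # Quantize to {-1, 0, +1}.
    return torch.sign(w) * (w.abs() > threshold).float()

def rpr_weights(w_continuous, relax_fraction=0.1):
    """Randomly relax a fraction of the weights (keep them continuous);
    quantize the remaining ones to ternary values."""
    relax = torch.rand_like(w_continuous) < relax_fraction
    quantized = ternarize(w_continuous)
    return torch.where(relax, w_continuous, quantized)

w = torch.randn(64, 64) * 0.1
print(rpr_weights(w).unique().numel())   # mostly {-1, 0, +1} plus a few relaxed values
```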
- Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks [95.51368472949308]
Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network.
In this paper, we propose a straightforward alternative: side-tuning.
arXiv Detail & Related papers (2019-12-31T18:52:32Z)
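A minimal sketch of side-tuning as an additive combination of a frozen base network and a small trainable side network; the blending parameter and network sizes are assumptions based on the summary above.

```python
import torch
import torch.nn as nn

class SideTuned(nn.Module):
    """Adapt a frozen base network by adding the output of a small,
    trainable side network (alpha blends the two)."""
    def __init__(self, base, side, alpha=0.5):
        super().__init__()
        self.base, self.side = base, side
        for p in self.base.parameters():
            p.requires_grad_(False)          # base stays frozen
        self.alpha = nn.Parameter(torch.tensor(alpha))

    def forward(self, x):
        a = torch.sigmoid(self.alpha)        # keep the blend weight in (0, 1)
        return a * self.base(x) + (1 - a) * self.side(x)

base = nn.Linear(32, 10)
side = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))
model = SideTuned(base, side)
out = model(torch.randn(8, 32))
```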
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.