Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural
Networks by Pruning A Randomly Weighted Network
- URL: http://arxiv.org/abs/2103.09377v1
- Date: Wed, 17 Mar 2021 00:31:24 GMT
- Title: Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural
Networks by Pruning A Randomly Weighted Network
- Authors: James Diffenderfer, Bhavya Kailkhura
- Abstract summary: We propose an algorithm for finding multi-prize tickets (MPTs) and test it by performing a series of experiments on CIFAR-10 and ImageNet datasets.
Our MPTs-1/32 not only set new binary weight network state-of-the-art (SOTA) Top-1 accuracy -- 94.8% on CIFAR-10 and 74.03% on ImageNet -- but also outperform their full-precision counterparts by 1.78% and 0.76%, respectively.
- Score: 13.193734014710582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Frankle & Carbin (2019) demonstrated that randomly-initialized
dense networks contain subnetworks that, once found, can be trained to reach test
accuracy comparable to that of the trained dense network. However, finding these
high-performing trainable subnetworks is expensive, requiring an iterative process
of training and pruning weights. In this paper, we propose (and prove) a stronger
Multi-Prize Lottery Ticket Hypothesis:
A sufficiently over-parameterized neural network with random weights contains
several subnetworks (winning tickets) that (a) have comparable accuracy to a
dense target network with learned weights (prize 1), (b) do not require any
further training to achieve prize 1 (prize 2), and (c) are robust to extreme
forms of quantization (i.e., binary weights and/or activations) (prize 3).
This provides a new paradigm for learning compact yet highly accurate binary
neural networks simply by pruning and quantizing randomly weighted full-precision
neural networks. We also propose an algorithm for finding multi-prize
tickets (MPTs) and test it by performing a series of experiments on CIFAR-10
and ImageNet datasets. Empirical results indicate that as models grow deeper
and wider, multi-prize tickets start to reach similar (and sometimes even
higher) test accuracy compared to their significantly larger and full-precision
counterparts that have been weight-trained. Without ever updating the weight
values, our MPTs-1/32 not only set new binary weight network state-of-the-art
(SOTA) Top-1 accuracy -- 94.8% on CIFAR-10 and 74.03% on ImageNet -- but also
outperform their full-precision counterparts by 1.78% and 0.76%, respectively.
Further, our MPT-1/1 achieves SOTA Top-1 accuracy (91.9%) for binary neural
networks on CIFAR-10. Code and pre-trained models are available at:
https://github.com/chrundle/biprop.
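The abstract above describes finding accurate binary subnetworks purely by pruning and quantizing a randomly weighted network, with the weight values never updated. Below is a minimal PyTorch sketch of that idea for a single linear layer, assuming an edge-popup-style approach: a trainable score per weight selects the kept fraction of connections, and the surviving weights are replaced by a scaled sign. The layer name, the choice of gain, and the score initialization are illustrative assumptions; the actual biprop algorithm may differ in these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GetSubnetBinary(torch.autograd.Function):
    """Top-k mask from scores; straight-through gradient to the scores."""

    @staticmethod
    def forward(ctx, scores, keep_frac):
        # Keep the fraction of weights with the largest scores.
        flat = scores.flatten()
        n_keep = max(1, int(keep_frac * flat.numel()))
        threshold = torch.topk(flat, n_keep).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: pass the gradient to the scores unchanged.
        return grad_output, None


class MPTLinear(nn.Module):
    """Linear layer with frozen random weights; only pruning scores are learned.
    Kept weights are binarized to +/- alpha in the forward pass."""

    def __init__(self, in_features, out_features, keep_frac=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_normal_(self.weight)
        self.weight.requires_grad = False           # weights are never trained
        self.scores = nn.Parameter(0.01 * torch.randn_like(self.weight))
        self.keep_frac = keep_frac

    def forward(self, x):
        mask = GetSubnetBinary.apply(self.scores.abs(), self.keep_frac)
        # One plausible gain: mean magnitude of the kept random weights
        # (the paper's exact rescaling may differ).
        alpha = (self.weight * mask).abs().sum() / mask.sum()
        w_bin = alpha * torch.sign(self.weight) * mask
        return F.linear(x, w_bin)


# Usage: only `scores` receives gradients; the random weights stay fixed.
layer = MPTLinear(784, 10, keep_frac=0.5)
opt = torch.optim.SGD([layer.scores], lr=0.1)
logits = layer(torch.randn(32, 784))
loss = F.cross_entropy(logits, torch.randint(0, 10, (32,)))
loss.backward()
opt.step()
```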
Related papers
- WeightMom: Learning Sparse Networks using Iterative Momentum-based
pruning [0.0]
We propose a weight-based pruning approach in which weights are pruned gradually based on their momentum from previous iterations.
We evaluate our approach on networks such as AlexNet, VGG16 and ResNet50 with image classification datasets such as CIFAR-10 and CIFAR-100.
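The summary above gives only the high-level criterion, so the sketch below is one plausible reading of it: a running momentum of each weight's gradients is accumulated and, every few iterations, the weights with the smallest momentum magnitude are zeroed. The function name, schedule, and pruning fraction are hypothetical.

```python
import torch

@torch.no_grad()
def momentum_prune_step(params, momenta, beta=0.9, prune_frac=0.05):
    """Accumulate a per-weight momentum of past gradients and zero out the
    weights whose accumulated momentum has the smallest magnitude. This is
    one plausible reading of the criterion; the paper's exact rule and
    schedule may differ. In practice a persistent mask would also be kept."""
    for p, m in zip(params, momenta):
        if p.grad is None:
            continue
        m.mul_(beta).add_(p.grad)                   # momentum of past gradients
        k = max(1, int(prune_frac * p.numel()))
        threshold = m.abs().flatten().kthvalue(k).values
        p.mul_((m.abs() > threshold).float())       # prune low-momentum weights

# Usage: momenta = [torch.zeros_like(p) for p in model.parameters()];
# call momentum_prune_step(...) every few iterations during training.
```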
arXiv Detail & Related papers (2022-08-11T07:13:59Z)
- Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining its capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- Bit-wise Training of Neural Network Weights [4.56877715768796]
We introduce an algorithm where the individual bits representing the weights of a neural network are learned.
This method allows training weights with integer values on arbitrary bit-depths and naturally uncovers sparse networks.
We show better results than standard training for fully connected networks, and performance comparable to standard training for convolutional and residual networks.
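As a toy illustration of bit-wise weight training (not the paper's exact parameterization), each weight below is assembled from a few learned bits: every bit has a real-valued latent that is hard-thresholded in the forward pass, gradients reach the latents through a straight-through estimator, and the resulting weights take signed integer values, including exact zeros.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Module):
    """Linear layer whose integer-valued weights are assembled from learned bits."""

    def __init__(self, in_features, out_features, n_bits=2):
        super().__init__()
        # One real-valued latent per bit per weight.
        self.latents = nn.Parameter(0.1 * torch.randn(n_bits, out_features, in_features))
        # Bit i contributes 2**i; subtracting the offset yields signed integers
        # (including exact zeros, which is where sparsity can appear).
        place = 2.0 ** torch.arange(n_bits, dtype=torch.float32)
        self.register_buffer("place_values", place.view(-1, 1, 1))
        self.offset = 2.0 ** (n_bits - 1)
        self.scale = 1.0 / (self.offset * in_features ** 0.5)

    def forward(self, x):
        soft = torch.sigmoid(self.latents)
        hard = (soft > 0.5).float()
        # Straight-through estimator: hard bits forward, sigmoid surrogate backward.
        bits = hard.detach() + soft - soft.detach()
        w_int = (bits * self.place_values).sum(dim=0) - self.offset
        return F.linear(x, self.scale * w_int)


layer = BitLinear(784, 10, n_bits=2)
layer(torch.randn(32, 784)).sum().backward()   # gradients reach the bit latents
```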
arXiv Detail & Related papers (2022-02-19T10:46:54Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by studying the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
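The line case of a learned subspace can be sketched as follows, assuming two endpoint weight sets trained jointly by sampling a random interpolation point at each step; regularization terms that push the endpoints apart are omitted, and `line_subspace_step` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F

def line_subspace_step(model_a, model_b, x, y, optimizer):
    """One training step for a line of networks: sample an interpolation point
    alpha, evaluate the blended weights, and backpropagate to both endpoints."""
    alpha = torch.rand(()).item()
    blended = {
        name: (1 - alpha) * pa + alpha * pb
        for (name, pa), (_, pb) in zip(model_a.named_parameters(),
                                       model_b.named_parameters())
    }
    # functional_call evaluates model_a's architecture with the blended weights
    # while keeping gradient flow to both endpoint parameter sets (PyTorch >= 2.0).
    logits = torch.func.functional_call(model_a, blended, (x,))
    loss = F.cross_entropy(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage: optimizer = torch.optim.SGD(list(model_a.parameters()) +
#                                    list(model_b.parameters()), lr=0.1)
```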
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Good Students Play Big Lottery Better [84.6111281091602]
The lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler, and yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket).
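The blurb does not spell out the recipe, but re-training a pruned sub-network with knowledge distillation would typically use a loss of the following form; the temperature and weighting below are illustrative defaults, not values from the paper.

```python
import torch.nn.functional as F

def kd_ticket_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    """Standard knowledge-distillation objective for re-training a pruned
    sub-network: hard-label cross-entropy plus temperature-scaled KL divergence
    to the dense teacher."""
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1.0 - alpha) * ce + alpha * kl
```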
arXiv Detail & Related papers (2021-01-08T23:33:53Z)
- Training Sparse Neural Networks using Compressed Sensing [13.84396596420605]
We develop and test a novel method based on compressed sensing which combines the pruning and training into a single step.
Specifically, we utilize an adaptively weighted $\ell_1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks.
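A minimal sketch of the weighted penalty, assuming per-tensor coefficients supplied by the caller; the paper adapts the weights during training and uses an RDA-style update, which is not reproduced here.

```python
def weighted_l1_penalty(params, lambdas):
    """Adaptively weighted L1 penalty: each parameter tensor gets its own
    coefficient. Here the coefficients are fixed inputs; the paper adapts
    them and pairs the penalty with a regularized dual averaging (RDA)
    update rather than plain SGD."""
    return sum(lam * p.abs().sum() for p, lam in zip(params, lambdas))

# Usage inside a training loop (plain SGD as a stand-in for RDA):
#   loss = F.cross_entropy(model(x), y) + weighted_l1_penalty(
#       model.parameters(), lambdas)
#   loss.backward(); optimizer.step()
```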
arXiv Detail & Related papers (2020-08-21T19:35:54Z)
- Training highly effective connectivities within neural networks with randomly initialized, fixed weights [4.56877715768796]
We introduce a novel way of training a network by flipping the signs of the weights.
We obtain good results even when the weights have constant magnitude or are drawn from highly asymmetric distributions.
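One way to read the sign-flipping scheme is as learning only the signs of fixed-magnitude random weights through a straight-through estimator, as in the sketch below; the exact update rule in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SignFlipLinear(nn.Module):
    """Linear layer whose weight magnitudes are fixed at initialization;
    only the signs are learned, via a straight-through estimator."""

    def __init__(self, in_features, out_features):
        super().__init__()
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_normal_(w)
        self.register_buffer("magnitude", w.abs())     # never updated
        self.sign_latent = nn.Parameter(w.clone())     # its sign sets each weight's sign

    def forward(self, x):
        hard = torch.sign(self.sign_latent)
        # Forward uses the hard signs; backward passes gradients straight
        # through to the latent.
        signs = hard.detach() + self.sign_latent - self.sign_latent.detach()
        return F.linear(x, self.magnitude * signs)


layer = SignFlipLinear(784, 10)
optimizer = torch.optim.SGD([layer.sign_latent], lr=0.1)
```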
arXiv Detail & Related papers (2020-06-30T09:41:18Z)
- Proving the Lottery Ticket Hypothesis: Pruning is All You Need [56.25432563818297]
The lottery ticket hypothesis states that a randomly-initialized network contains a small subnetwork that, when trained in isolation, can compete with the performance of the original network.
We prove an even stronger hypothesis, showing that for every bounded distribution and every target network with bounded weights, a sufficiently over-parameterized neural network with random weights contains a subnetwork with roughly the same accuracy as the target network, without any further training.
arXiv Detail & Related papers (2020-02-03T07:23:11Z)