Lottery Ticket Implies Accuracy Degradation, Is It a Desirable
Phenomenon?
- URL: http://arxiv.org/abs/2102.11068v1
- Date: Fri, 19 Feb 2021 14:49:46 GMT
- Title: Lottery Ticket Implies Accuracy Degradation, Is It a Desirable
Phenomenon?
- Authors: Ning Liu, Geng Yuan, Zhengping Che, Xuan Shen, Xiaolong Ma, Qing Jin,
Jian Ren, Jian Tang, Sijia Liu, Yanzhi Wang
- Abstract summary: In deep model compression, the recent finding "Lottery Ticket Hypothesis" (LTH) (Frankle & Carbin) pointed out that there could exist a winning ticket.
We investigate the underlying condition and rationale behind the winning property, and find that the underlying reason is largely attributed to the correlation between initialized weights and final-trained weights.
We propose the "pruning & fine-tuning" method that consistently outperforms lottery ticket sparse training.
- Score: 43.47794674403988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep model compression, the recent finding "Lottery Ticket Hypothesis"
(LTH) (Frankle & Carbin, 2018) pointed out that there could exist a winning
ticket (i.e., a properly pruned sub-network together with original weight
initialization) that can achieve performance competitive with the original
dense network. However, such a winning property is not easy to observe in many
scenarios where, for example, a relatively large learning rate is used even
though it benefits training the original dense model. In this work, we investigate the
underlying condition and rationale behind the winning property, and find that
the underlying reason is largely attributed to the correlation between
initialized weights and final-trained weights when the learning rate is not
sufficiently large. Thus, the winning property is tied to insufficient DNN
pretraining and is unlikely to occur for a well-trained DNN. To overcome this
limitation, we propose the "pruning & fine-tuning" method
that consistently outperforms lottery ticket sparse training under the same
pruning algorithm and the same total training epochs. Extensive experiments
over multiple deep models (VGG, ResNet, MobileNet-v2) on different datasets
have been conducted to justify our proposals.
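A minimal sketch may help make the contrast in the abstract concrete: lottery ticket sparse training prunes the trained dense model, rewinds the surviving weights to their original initialization, and retrains, while pruning & fine-tuning keeps the trained values of the surviving weights and simply fine-tunes them. The helper names, the generic train(model, epochs) routine, and the epoch budgets below are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch only; assumes a user-supplied train(model, epochs) loop.
import torch
import torch.nn as nn

def magnitude_mask(model: nn.Module, sparsity: float) -> dict:
    """Per-layer magnitude pruning: keep the largest |w|, zero out the rest."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:                                 # skip biases / norm parameters
            continue
        k = max(int(p.numel() * sparsity), 1)           # number of weights to prune
        threshold = p.detach().abs().flatten().kthvalue(k).values
        masks[name] = (p.detach().abs() > threshold).float()
    return masks

def apply_mask(model: nn.Module, masks: dict) -> None:
    """Zero out pruned weights in place."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

def lottery_ticket_training(model, init_state, train, epochs, sparsity):
    """LTH-style schedule: prune, rewind to the original init, retrain sparsely."""
    train(model, epochs)                      # pretrain the dense model
    masks = magnitude_mask(model, sparsity)   # identify the "winning ticket"
    model.load_state_dict(init_state)         # rewind to initialization
    apply_mask(model, masks)
    train(model, epochs)                      # sparse training from the init
    return model

def prune_and_finetune(model, train, epochs, sparsity):
    """The schedule the paper favors: prune the trained model, then fine-tune it."""
    train(model, epochs)                      # pretrain the dense model
    masks = magnitude_mask(model, sparsity)
    apply_mask(model, masks)                  # keep the trained surviving weights
    train(model, epochs)                      # fine-tune under the same epoch budget
    return model
```

For a fair comparison, init_state should be a deep copy of the untrained weights (e.g. copy.deepcopy of the state dict taken before any training), both schedules should receive the same total epoch budget, and a complete implementation would re-apply the mask after every optimizer step so pruned weights stay at zero; those details are omitted here for brevity.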
Related papers
- LOFT: Finding Lottery Tickets through Filter-wise Training [15.06694204377327]
We show how one can efficiently identify the emergence of such winning tickets, and use this observation to design efficient pretraining algorithms.
We present the LOttery ticket through Filter-wise Training algorithm, dubbed LoFT.
Experiments show that LoFT (i) preserves and finds good lottery tickets, while (ii) achieving non-trivial computation and communication savings.
arXiv Detail & Related papers (2022-10-28T14:43:42Z) - Data-Efficient Double-Win Lottery Tickets from Robust Pre-training [129.85939347733387]
We introduce Double-Win Lottery Tickets, in which a subnetwork from a pre-trained model can be independently transferred to diverse downstream tasks.
We find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts.
arXiv Detail & Related papers (2022-06-09T20:52:50Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining network capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets
Win [20.97456178983006]
The lottery ticket hypothesis states that sparse subnetworks exist in randomly initialized dense networks that can be trained to the same accuracy as the dense network they reside in.
We show that by using a training method that is stable with respect to linear mode connectivity, large networks can also be entirely rewound to initialization.
arXiv Detail & Related papers (2021-06-13T10:06:06Z) - The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning (IMP); a minimal sketch of IMP follows this list.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z) - Good Students Play Big Lottery Better [84.6111281091602]
Lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler and yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket).
arXiv Detail & Related papers (2021-01-08T23:33:53Z)
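Since Iterative Magnitude-based Pruning (IMP) recurs throughout the entries above as the standard way to find winning tickets, the sketch below shows one common form of it, reusing the magnitude_mask / apply_mask helpers and the generic train(model, epochs) routine assumed earlier. The 20% per-round pruning rate and rewinding to the initial weights are conventional choices, not values taken from these papers.

```python
# Illustrative IMP sketch; reuses magnitude_mask / apply_mask and train() from above.
import copy
import torch.nn as nn

def iterative_magnitude_pruning(model: nn.Module, train, epochs_per_round: int,
                                rounds: int, prune_per_round: float = 0.2):
    """Train, prune a fraction of the remaining weights, rewind, and repeat."""
    init_state = copy.deepcopy(model.state_dict())    # snapshot for rewinding
    masks, remaining = None, 1.0
    for _ in range(rounds):
        train(model, epochs_per_round)                 # train the current subnetwork
        remaining *= 1.0 - prune_per_round             # e.g. 80% of weights left after round 1
        new_masks = magnitude_mask(model, 1.0 - remaining)
        if masks is not None:                          # keep masks nested across rounds
            new_masks = {k: new_masks[k] * masks[k] for k in new_masks}
        masks = new_masks
        model.load_state_dict(init_state)              # rewind surviving weights to init
        apply_mask(model, masks)
    return model, masks
```

In practice the rewind point is often an early-training checkpoint rather than the exact initialization, and the mask would be re-applied after every optimizer step so pruned weights stay at zero; both details are omitted here for brevity.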