Good Students Play Big Lottery Better
- URL: http://arxiv.org/abs/2101.03255v2
- Date: Mon, 18 Jan 2021 07:25:16 GMT
- Title: Good Students Play Big Lottery Better
- Authors: Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie,
Zhangyang Wang
- Abstract summary: The lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler and yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket).
- Score: 84.6111281091602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lottery ticket hypothesis suggests that a dense neural network contains a
sparse sub-network that can match the test accuracy of the original dense net
when trained in isolation from (the same) random initialization. However, the
hypothesis failed to generalize to larger dense networks such as ResNet-50. As
a remedy, recent studies demonstrate that a sparse sub-network can still be
obtained by using a rewinding technique, which is to re-train it from
early-phase training weights or learning rates of the dense model, rather than
from random initialization.
Is rewinding the only or the best way to scale up lottery tickets? This paper
proposes a new, simpler and yet powerful technique for re-training the
sub-network, called "Knowledge Distillation ticket" (KD ticket). Rewinding
exploits the value of inheriting knowledge from the early training phase to
improve lottery tickets in large networks. In comparison, KD ticket addresses a
complementary possibility - inheriting useful knowledge from the late training
phase of the dense model. It is achieved by leveraging the soft labels
generated by the trained dense model to re-train the sub-network, instead of
the hard labels. Extensive experiments are conducted using several large deep
networks (e.g., ResNet-50 and ResNet-110) on the CIFAR-10 and ImageNet datasets.
Without bells and whistles, when applied by itself, KD ticket performs on par
with or better than rewinding, while being nearly free of hyperparameters or ad-hoc
selection. KD ticket can be further applied together with rewinding, yielding
state-of-the-art results for large-scale lottery tickets.
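To make the contrast between hard-label re-training and the KD ticket concrete, here is a minimal PyTorch sketch of one re-training step for a pruned sub-network using soft labels from the trained dense model. The temperature T, the loss weight alpha, and the mask-reapplication detail are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def kd_retrain_step(sparse_net, dense_net, mask, x, y, optimizer, T=4.0, alpha=0.9):
    """One re-training step of the sparse sub-network: distill from the
    trained dense model's soft labels, optionally mixed with hard labels.
    (Sketch only; hyperparameters are assumptions, not the paper's.)"""
    dense_net.eval()
    with torch.no_grad():
        teacher_logits = dense_net(x)  # soft labels from the trained dense model

    student_logits = sparse_net(x)

    # Distillation term: KL divergence between temperature-softened outputs.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    # Standard hard-label cross-entropy on the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, y)

    loss = alpha * kd_loss + (1.0 - alpha) * ce_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Re-apply the pruning mask so weights removed from the ticket stay zero.
    with torch.no_grad():
        for name, p in sparse_net.named_parameters():
            if name in mask:
                p.mul_(mask[name])
    return loss.item()
```

Under this sketch, setting alpha = 0 recovers ordinary hard-label re-training, so the same loop can be used to compare the two regimes.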
Related papers
- When Layers Play the Lottery, all Tickets Win at Initialization [0.0]
Pruning is a technique for reducing the computational cost of deep networks.
In this work, we propose to discover winning tickets when the pruning process removes layers.
Our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%.
arXiv Detail & Related papers (2023-01-25T21:21:15Z)
- Data-Efficient Double-Win Lottery Tickets from Robust Pre-training [129.85939347733387]
We introduce Double-Win Lottery Tickets, in which a subnetwork from a pre-trained model can be independently transferred to diverse downstream tasks.
We find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts.
arXiv Detail & Related papers (2022-06-09T20:52:50Z)
- Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining model capacity.
In this work, we regard the winning ticket from LTH as a subnetwork that is in a trainable condition, and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z)
- Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets Win [20.97456178983006]
The lottery ticket hypothesis states that sparse subnetworks exist in randomly initialized dense networks that can be trained to the same accuracy as the dense network they reside in.
We show that by using a training method that is stable with respect to linear mode connectivity, large networks can also be entirely rewound to initialization.
arXiv Detail & Related papers (2021-06-13T10:06:06Z)
- The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z)
- Lottery Ticket Implies Accuracy Degradation, Is It a Desirable Phenomenon? [43.47794674403988]
In deep model compression, the recent finding "Lottery Ticket Hypothesis" (LTH) (Frankle & Carbin) pointed out that there could exist a winning ticket.
We investigate the underlying condition and rationale behind the winning property, and find that the underlying reason is largely attributed to the correlation between the initialized weights and the final-trained weights.
We propose the "pruning & fine-tuning" method that consistently outperforms lottery ticket sparse training.
arXiv Detail & Related papers (2021-02-19T14:49:46Z)
- Winning Lottery Tickets in Deep Generative Models [64.79920299421255]
We show the existence of winning tickets in deep generative models such as GANs and VAEs.
We also demonstrate the transferability of winning tickets across different generative models.
arXiv Detail & Related papers (2020-10-05T21:45:39Z)
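The rewinding baseline discussed in the abstract, and the iterative magnitude pruning (IMP) procedure that several of the related papers above build on, can be summarized in a short sketch. The train() helper, the rewind epoch, and the per-round pruning rate below are assumptions for illustration only, not details taken from any of the papers.

```python
import copy
import torch

def imp_with_rewinding(net, train, rewind_epoch=5, total_epochs=90,
                       rounds=5, prune_rate=0.2):
    """Iteratively prune the smallest-magnitude weights and rewind the
    survivors to their early-phase values before re-training.
    (Sketch only; train() is an assumed helper that trains `net` under `mask`.)"""
    mask = {n: torch.ones_like(p) for n, p in net.named_parameters()}

    # Train briefly and snapshot the early-phase ("rewind") weights.
    train(net, epochs=rewind_epoch)
    rewind_state = copy.deepcopy(net.state_dict())

    for _ in range(rounds):
        # Finish training the masked network.
        train(net, epochs=total_epochs - rewind_epoch, mask=mask)

        # Prune the lowest-magnitude surviving weights in each tensor.
        for name, p in net.named_parameters():
            surviving = p.detach().abs()[mask[name].bool()]
            k = int(prune_rate * surviving.numel())
            if k == 0:
                continue
            threshold = surviving.sort().values[k - 1]
            mask[name] = mask[name] * (p.detach().abs() > threshold).float()

        # Rewind surviving weights to their early-phase values.
        net.load_state_dict(rewind_state)
        with torch.no_grad():
            for name, p in net.named_parameters():
                p.mul_(mask[name])
    return net, mask
```

Rewinding to epoch 0 (the original random initialization) recovers the original lottery ticket procedure, while the KD ticket of this paper would replace the hard-label loss inside train() with the distillation loss sketched earlier.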
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.