Efficient Lottery Ticket Finding: Less Data is More
- URL: http://arxiv.org/abs/2106.03225v1
- Date: Sun, 6 Jun 2021 19:58:17 GMT
- Title: Efficient Lottery Ticket Finding: Less Data is More
- Authors: Zhenyu Zhang, Xuxi Chen, Tianlong Chen, Zhangyang Wang
- Abstract summary: The lottery ticket hypothesis (LTH) reveals the existence of winning tickets (sparse but critical subnetworks) for dense networks.
Finding winning tickets requires burdensome computations in the train-prune-retrain process.
This paper explores a new perspective on finding lottery tickets more efficiently, by doing so only with a specially selected subset of data.
- Score: 87.13642800792077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lottery ticket hypothesis (LTH) reveals the existence of winning tickets
(sparse but critical subnetworks) for dense networks that can be trained in
isolation from random initialization to match the latter's accuracies. However,
finding winning tickets requires burdensome computations in the
train-prune-retrain process, especially on large-scale datasets (e.g.,
ImageNet), restricting their practical benefits. This paper explores a new
perspective on finding lottery tickets more efficiently, by doing so only with
a specially selected subset of data, called Pruning-Aware Critical set (PrAC
set), rather than using the full training set. The concept of PrAC set was
inspired by the recent observation that deep networks have samples that are
either hard to memorize during training or easy to forget during pruning. A
PrAC set is thus hypothesized to capture those most challenging and informative
examples for the dense model. We observe that a high-quality winning ticket can
be found by training and pruning the dense network on the very compact PrAC
set, which substantially saves training iterations for the ticket-finding
process. Extensive experiments validate our proposal across diverse datasets
and network architectures. Specifically, on CIFAR-10, CIFAR-100, and Tiny
ImageNet, we locate effective PrAC sets at 35.32%~78.19% of their training set
sizes. On top of them, we can obtain the same competitive winning tickets for
the corresponding dense networks, yet saving up to 82.85%~92.77%,
63.54%~74.92%, and 76.14%~86.56% training iterations, respectively. Crucially,
we show that a PrAC set, once found, is reusable across different network
architectures, which can amortize the extra cost of finding PrAC sets, yielding
a practical regime for efficient lottery ticket finding.
Related papers
- Data-Efficient Double-Win Lottery Tickets from Robust Pre-training [129.85939347733387]
We introduce Double-Win Lottery Tickets, in which a subnetwork from a pre-trained model can be independently transferred to diverse downstream tasks.
We find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts.
arXiv Detail & Related papers (2022-06-09T20:52:50Z)
- Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining its capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition, and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate the Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis raises keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z)
- Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network [13.193734014710582]
We propose an algorithm for finding multi-prize tickets (MPTs) and test it by performing a series of experiments on CIFAR-10 and ImageNet datasets.
Our MPTs-1/32 not only set new binary weight network state-of-the-art (SOTA) Top-1 accuracy -- 94.8% on CIFAR-10 and 74.03% on ImageNet -- but also outperform their full-precision counterparts by 1.78% and 0.76%, respectively.
arXiv Detail & Related papers (2021-03-17T00:31:24Z)
- Good Students Play Big Lottery Better [84.6111281091602]
Lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler, yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket).
arXiv Detail & Related papers (2021-01-08T23:33:53Z)
- Towards Practical Lottery Ticket Hypothesis for Adversarial Training [78.30684998080346]
We show that there exists a subset of these sub-networks that converges significantly faster during the training process.
As a practical application of our findings, we demonstrate that such sub-networks can help in cutting down the total time of adversarial training.
arXiv Detail & Related papers (2020-03-06T03:11:52Z)