The Elastic Lottery Ticket Hypothesis
- URL: http://arxiv.org/abs/2103.16547v1
- Date: Tue, 30 Mar 2021 17:53:45 GMT
- Title: The Elastic Lottery Ticket Hypothesis
- Authors: Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu,
Zhangyang Wang
- Abstract summary: The Lottery Ticket Hypothesis has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
- Score: 106.79387235014379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Lottery Ticket Hypothesis has drawn keen attention to identifying
sparse trainable subnetworks, or winning tickets, at the initialization (or early
stage) of training, which can be trained in isolation to achieve similar or
even better performance compared to the full models. Despite many efforts, the
most effective method to identify such winning tickets is still Iterative
Magnitude-based Pruning (IMP), which is computationally expensive and must be
re-run in full for every different network. A natural question that arises is:
can we "transform" the winning ticket found in one network to
another with a different architecture, yielding a winning ticket for the latter
at the beginning, without re-doing the expensive IMP? Answering this question
is not only practically relevant for efficient "once-for-all" winning ticket
finding, but also theoretically appealing for uncovering inherently scalable
sparse patterns in networks. We conduct extensive experiments on CIFAR-10 and
ImageNet, and propose a variety of strategies to tweak the winning tickets
found from different networks of the same model family (e.g., ResNets). Based
on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH):
by mindfully replicating (or dropping) and re-ordering layers for one network,
its corresponding winning ticket could be stretched (or squeezed) into a
subnetwork for another deeper (or shallower) network from the same family,
whose performance is nearly as competitive as the latter's winning ticket
directly found by IMP. We also thoroughly compare E-LTH with
pruning-at-initialization and dynamic sparse training methods, and discuss the
generalizability of E-LTH to different model families, layer types, and even
across datasets. Our code is publicly available at
https://github.com/VITA-Group/ElasticLTH.
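The core "stretch" operation can be sketched concretely. The snippet below is a minimal illustration under assumed names (stretch_ticket and the proportional index mapping are inventions for this example), not the implementation in the repository linked above: it replicates or drops per-layer masks in order, whereas the paper's strategies operate on whole residual blocks within each stage and assume matching layer shapes between the source and target networks.

```python
# Minimal sketch of E-LTH "stretching": per-layer masks of a winning ticket
# found in a shallower network are replicated, in order, to cover the extra
# layers of a deeper sibling; mapping to a smaller depth ("squeezing") drops
# layers instead. Names and the mapping policy are illustrative assumptions.
from typing import List
import torch


def stretch_ticket(masks: List[torch.Tensor], target_depth: int) -> List[torch.Tensor]:
    """Stretch (or squeeze) an ordered list of per-layer binary masks to
    `target_depth` by proportionally replicating (or dropping) layer masks."""
    source_depth = len(masks)
    # Map each target layer index back to a source layer index, preserving order.
    idx = [round(i * (source_depth - 1) / max(target_depth - 1, 1))
           for i in range(target_depth)]
    return [masks[i].clone() for i in idx]


if __name__ == "__main__":
    # Toy example: a ticket with 4 conv-layer masks stretched to a 6-layer sibling.
    torch.manual_seed(0)
    ticket = [(torch.rand(16, 16, 3, 3) > 0.8).float() for _ in range(4)]
    stretched = stretch_ticket(ticket, target_depth=6)
    print(len(stretched), [int(m.sum().item()) for m in stretched])
```

The resulting masks would then be applied to the deeper network at its initialization and trained as usual; whether this matches the performance of a ticket found directly by IMP on the deeper network is what the paper evaluates.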
Related papers
- Successfully Applying Lottery Ticket Hypothesis to Diffusion Model [15.910383121581065]
The Lottery Ticket Hypothesis claims that there exist winning tickets that, when trained in isolation, achieve performance competitive with the original dense neural network.
We empirically find subnetworks at 90%-99% sparsity without compromising performance for denoising diffusion probabilistic models on benchmarks.
Our method can find sparser sub-models that require less memory for storage and reduce the necessary number of FLOPs.
arXiv Detail & Related papers (2023-10-28T21:09:50Z) - Iterative Magnitude Pruning as a Renormalisation Group: A Study in The Context of The Lottery Ticket Hypothesis [0.0]
This thesis focuses on the Lottery Ticket Hypothesis (LTH).
The LTH posits that within extensive Deep Neural Networks (DNNs), smaller, trainable "winning tickets" can achieve performance comparable to the full model.
A key process in LTH, Iterative Magnitude Pruning (IMP), incrementally eliminates the smallest-magnitude weights, emulating stepwise learning in DNNs; a schematic sketch of this loop is given after this list.
In other words, we check if a winning ticket that works well for one specific problem could also work well for other, similar problems.
arXiv Detail & Related papers (2023-08-06T14:36:57Z) - When Layers Play the Lottery, all Tickets Win at Initialization [0.0]
Pruning is a technique for reducing the computational cost of deep networks.
In this work, we propose to discover winning tickets when the pruning process removes layers.
Our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%.
arXiv Detail & Related papers (2023-01-25T21:21:15Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining the network's capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in trainable condition, and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets [127.56361320894861]
The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy.
In this paper, we demonstrate the first positive result that a structurally sparse winning ticket can be effectively found in general.
Specifically, we first "re-fill" pruned elements back in some channels deemed to be important, and then "re-group" non-zero elements to create flexible group-wise structural patterns.
arXiv Detail & Related papers (2022-02-09T21:33:51Z) - Efficient Lottery Ticket Finding: Less Data is More [87.13642800792077]
The lottery ticket hypothesis (LTH) reveals the existence of winning tickets (sparse but critical subnetworks) for dense networks.
Finding winning tickets requires burdensome computations in the train-prune-retrain process.
This paper explores a new perspective on finding lottery tickets more efficiently, by doing so only with a specially selected subset of data.
arXiv Detail & Related papers (2021-06-06T19:58:17Z) - Good Students Play Big Lottery Better [84.6111281091602]
Lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler, and yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket).
arXiv Detail & Related papers (2021-01-08T23:33:53Z) - Winning Lottery Tickets in Deep Generative Models [64.79920299421255]
We show the existence of winning tickets in deep generative models such as GANs and VAEs.
We also demonstrate the transferability of winning tickets across different generative models.
arXiv Detail & Related papers (2020-10-05T21:45:39Z)
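For reference, the train-prune-rewind loop (IMP) that recurs throughout the abstracts above, and whose cost motivates E-LTH, can be outlined as follows. This is a schematic sketch, not code from any of the listed papers: the `train` callback, the per-round pruning rate, and all names are placeholders.

```python
# Schematic Iterative Magnitude Pruning (IMP): train, prune the smallest
# surviving weights, rewind the rest to their initial values, and repeat.
# `train` is a user-supplied training loop; all names here are illustrative.
import copy
import torch
import torch.nn as nn


def imp_masks(model: nn.Module, train, rounds: int = 5,
              prune_per_round: float = 0.2) -> dict:
    """Return a binary mask per weight tensor after `rounds` of IMP."""
    init_state = copy.deepcopy(model.state_dict())             # theta_0 to rewind to
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
             if p.dim() > 1}                                    # prune weight tensors only

    for _ in range(rounds):
        train(model, masks)                                     # train with masks applied
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            alive = p.detach().abs()[masks[name].bool()]        # surviving weights
            if alive.numel() == 0:
                continue
            k = max(int(prune_per_round * alive.numel()), 1)    # lowest-magnitude fraction
            threshold = torch.kthvalue(alive, k).values
            masks[name] = masks[name] * (p.detach().abs() > threshold).float()
        model.load_state_dict(init_state)                       # rewind surviving weights

    return masks
```

A winning ticket is then the pair of initial weights and final mask; E-LTH asks whether that mask can be reused across network depths instead of repeating this loop for every network.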