Dual Lottery Ticket Hypothesis
- URL: http://arxiv.org/abs/2203.04248v1
- Date: Tue, 8 Mar 2022 18:06:26 GMT
- Title: Dual Lottery Ticket Hypothesis
- Authors: Yue Bai, Huan Wang, Zhiqiang Tao, Kunpeng Li, Yun Fu
- Abstract summary: The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining model capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
- Score: 71.95937879869334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully exploiting the learning capacity of neural networks requires
overparameterized dense networks. On the other hand, directly training sparse
neural networks typically results in unsatisfactory performance. Lottery Ticket
Hypothesis (LTH) provides a novel view for investigating sparse network training
while maintaining model capacity. Concretely, it claims that winning tickets exist
within a randomly initialized network, can be found by iterative magnitude pruning,
and preserve promising trainability (i.e., they are in a trainable condition). In
this work, we regard the winning ticket from LTH as a subnetwork in a trainable
condition, take its performance as our benchmark, and then approach the problem
from a complementary direction to articulate the Dual Lottery Ticket Hypothesis
(DLTH): Randomly selected subnetworks from a randomly initialized dense network
can be transformed into a trainable condition and achieve admirable performance
compared with LTH -- random tickets in a given lottery pool can be transformed
into winning tickets. Specifically, by using uniform-randomly selected
subnetworks to represent the general cases, we propose a simple sparse network
training strategy, Random Sparse Network Transformation (RST), to substantiate
our DLTH. Concretely, we introduce a regularization term to borrow learning
capacity and realize information extrusion from the weights that will be masked.
After finishing the transformation for the randomly selected subnetworks, we
conduct regular finetuning and evaluate the model in fair comparisons with LTH
and other strong baselines. Extensive experiments on
several public datasets and comparisons with competitive approaches validate
our DLTH as well as the effectiveness of the proposed method RST. Our work is
expected to pave the way for new research directions in sparse network training.
Our code is available at
https://github.com/yueb17/DLTH.
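As a rough illustration of the RST procedure described in the abstract, the sketch below uniform-randomly selects a subnetwork and then trains with a gradually increasing L2 penalty on the weights that will be masked, so that their learning capacity is "extruded" into the kept weights before hard pruning and finetuning. This is a minimal PyTorch-style sketch under assumed details: the function names (random_masks, rst_transform), the linear penalty schedule, and all hyperparameters are illustrative assumptions, not the authors' implementation (see the repository above for the actual code).

```python
import torch
import torch.nn as nn


def random_masks(model, sparsity=0.9):
    """Uniform-randomly select a subnetwork: mask entries of 0 mark weights to be pruned."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # restrict pruning to weight matrices / conv kernels
            masks[name] = (torch.rand_like(p) > sparsity).float()
    return masks


def rst_transform(model, loader, masks, epochs=10, lam_max=1e-3, lr=0.01):
    """Transform a random ticket: a growing L2 penalty on the to-be-masked weights
    gradually pushes ("extrudes") their information into the kept weights."""
    params = dict(model.named_parameters())
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        lam = lam_max * (epoch + 1) / epochs  # assumed linearly increasing penalty strength
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            reg = sum(((1.0 - m) * params[n]).pow(2).sum() for n, m in masks.items())
            (loss + lam * reg).backward()
            opt.step()
    # Hard-prune the masked weights; regular finetuning of the kept weights follows.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
    return model
```

Under this reading, the key design choice is that the penalty is applied only to the weights destined to be removed, so the retained subnetwork itself is never constrained; the schedule, optimizer, and sparsity level above are placeholders.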
Related papers
- Data-Efficient Double-Win Lottery Tickets from Robust Pre-training [129.85939347733387]
We introduce Double-Win Lottery Tickets, in which a subnetwork from a pre-trained model can be independently transferred to diverse downstream tasks.
We find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts.
arXiv Detail & Related papers (2022-06-09T20:52:50Z) - FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z) - The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis has raised keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z) - Lottery Ticket Implies Accuracy Degradation, Is It a Desirable Phenomenon? [43.47794674403988]
In deep model compression, the recent finding "Lottery Ticket Hypothesis" (LTH) (Frankle & Carbin) pointed out that there could exist a winning ticket.
We investigate the underlying condition and rationale behind the winning property, and find that it is largely attributable to the correlation between initialized weights and final-trained weights.
We propose the "pruning & fine-tuning" method that consistently outperforms lottery ticket sparse training.
arXiv Detail & Related papers (2021-02-19T14:49:46Z) - Good Students Play Big Lottery Better [84.6111281091602]
Lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler, and yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket).
arXiv Detail & Related papers (2021-01-08T23:33:53Z) - The Lottery Ticket Hypothesis for Pre-trained BERT Networks [137.99328302234338]
In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training.
In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy.
We combine these observations to assess whether such trainable, transferable subnetworks exist in pre-trained BERT models.
arXiv Detail & Related papers (2020-07-23T19:35:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.