Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets Win
- URL: http://arxiv.org/abs/2106.06955v1
- Date: Sun, 13 Jun 2021 10:06:06 GMT
- Title: Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets Win
- Authors: Jaron Maene, Mingxiao Li, Marie-Francine Moens
- Abstract summary: The lottery ticket hypothesis states that sparse subnetworks exist in randomly initialized dense networks that can be trained to the same accuracy as the dense network they reside in.
We show that by using a training method that is stable with respect to linear mode connectivity, large networks can also be entirely rewound to initialization.
- Score: 20.97456178983006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The lottery ticket hypothesis states that sparse subnetworks exist in
randomly initialized dense networks that can be trained to the same accuracy as
the dense network they reside in. However, subsequent work has failed to
replicate this on large-scale models and required rewinding to an early stable
state instead of initialization. We show that by using a training method that
is stable with respect to linear mode connectivity, large networks can also be
entirely rewound to initialization. Our subsequent experiments on common vision
tasks give strong credence to the hypothesis in Evci et al. (2020b) that
lottery tickets simply retrain to the same regions (although not necessarily to
the same basin). These results imply that existing lottery tickets could not
have been found without the preceding dense training by iterative magnitude
pruning, raising doubts about the use of the lottery ticket hypothesis.
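The abstract leans on three ingredients: iterative magnitude pruning (IMP), rewinding the surviving weights back to their initial values, and stability with respect to linear mode connectivity. The sketch below shows how these pieces typically fit together; the toy MLP, synthetic data, sparsity schedule, and hyperparameters are illustrative assumptions, not the paper's setup. Each round trains the masked network, globally prunes the smallest-magnitude surviving weights, and rewinds the survivors to initialization; `linear_path_loss` probes the loss along the straight line between two solutions, which is the usual way linear mode connectivity is measured.

```python
# Minimal IMP-with-rewinding sketch; model, data, and hyperparameters are
# placeholders, not the paper's experimental setup.
import copy
import torch
import torch.nn as nn

def make_model():
    # Toy two-layer MLP standing in for the vision models used in the paper.
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

def apply_mask(model, mask):
    # Zero out pruned weights in place.
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in mask:
                param.mul_(mask[name])

def train(model, mask, X, y, steps=200, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
        apply_mask(model, mask)            # keep pruned weights at zero
    return model

def magnitude_mask(model, mask, prune_frac=0.2):
    # Globally remove the smallest-magnitude fraction of the surviving weights.
    scores = torch.cat([(p.abs() * mask[n]).flatten()
                        for n, p in model.named_parameters() if n in mask])
    threshold = torch.quantile(scores[scores > 0], prune_frac)
    return {n: ((p.abs() > threshold) & (mask[n] > 0)).float()
            for n, p in model.named_parameters() if n in mask}

def linear_path_loss(model_a, model_b, X, y, points=5):
    # Linear mode connectivity probe: loss along the straight line between
    # two solutions; a large bump indicates instability.
    loss_fn, probe, losses = nn.CrossEntropyLoss(), make_model(), []
    sa, sb = model_a.state_dict(), model_b.state_dict()
    for t in torch.linspace(0.0, 1.0, points):
        probe.load_state_dict({k: (1 - t) * sa[k] + t * sb[k] for k in sa})
        with torch.no_grad():
            losses.append(loss_fn(probe(X), y).item())
    return losses

# Synthetic stand-in data; a real experiment would use a vision dataset.
torch.manual_seed(0)
X, y = torch.randn(512, 20), torch.randint(0, 2, (512,))

model = make_model()
init_state = copy.deepcopy(model.state_dict())          # weights to rewind to
mask = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

for imp_round in range(5):
    train(model, mask, X, y)
    mask = magnitude_mask(model, mask)                   # prune lowest magnitudes
    model.load_state_dict(init_state)                    # rewind survivors to init
    apply_mask(model, mask)
```

In practice, stability would be checked by training two copies from the same rewind point under different SGD noise (data order, augmentation) and verifying that `linear_path_loss` shows no significant barrier between them; the abstract's claim is that with a sufficiently stable training method this holds even when the rewind point is the original initialization.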
Related papers
- When Layers Play the Lottery, all Tickets Win at Initialization [0.0]
Pruning is a technique for reducing the computational cost of deep networks.
In this work, we propose to discover winning tickets when the pruning process removes layers.
Our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%.
arXiv Detail & Related papers (2023-01-25T21:21:15Z)
- Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- Rare Gems: Finding Lottery Tickets at Initialization [21.130411799740532]
Large neural networks can be pruned to a small fraction of their original size.
Current algorithms for finding trainable networks fail simple baseline comparisons.
Finding lottery tickets that train to better accuracy compared to simple baselines remains an open problem.
arXiv Detail & Related papers (2022-02-24T10:28:56Z)
- Universality of Deep Neural Network Lottery Tickets: A Renormalization Group Perspective [89.19516919095904]
Winning tickets found in the context of one task can be transferred to similar tasks, possibly even across different architectures.
We make use of renormalization group theory, one of the most successful tools in theoretical physics.
We leverage it here to examine winning ticket universality in large-scale lottery ticket experiments, which also sheds new light on the success iterative magnitude pruning has found in the field of sparse machine learning.
arXiv Detail & Related papers (2021-10-07T06:50:16Z)
- The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z)
- Lottery Ticket Implies Accuracy Degradation, Is It a Desirable Phenomenon? [43.47794674403988]
In deep model compression, the recent "Lottery Ticket Hypothesis" (LTH) (Frankle & Carbin) pointed out that a winning ticket could exist.
We investigate the underlying condition and rationale behind the winning property, and find that it is largely attributable to the correlation between the initialized weights and the final-trained weights.
We propose a "pruning & fine-tuning" method that consistently outperforms lottery ticket sparse training.
arXiv Detail & Related papers (2021-02-19T14:49:46Z)
- Good Students Play Big Lottery Better [84.6111281091602]
The lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that such a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler and yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket); see the sketch after this list.
arXiv Detail & Related papers (2021-01-08T23:33:53Z)
- Winning Lottery Tickets in Deep Generative Models [64.79920299421255]
We show the existence of winning tickets in deep generative models such as GANs and VAEs.
We also demonstrate the transferability of winning tickets across different generative models.
arXiv Detail & Related papers (2020-10-05T21:45:39Z)
- The Lottery Ticket Hypothesis for Pre-trained BERT Networks [137.99328302234338]
In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training.
In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy.
We combine these observations to assess whether such trainable, transferable subnetworks exist in pre-trained BERT models.
arXiv Detail & Related papers (2020-07-23T19:35:39Z)
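The "Good Students Play Big Lottery Better" entry above retrains the pruned sub-network with a distillation signal (the "KD ticket"). The sketch below is one plausible reading of that idea rather than the paper's exact recipe: the trained dense network is assumed to act as the teacher, and the temperature `T`, weighting `alpha`, and optimizer settings are made-up placeholders.

```python
# Hedged sketch of distillation-style retraining of a pruned sub-network.
# Assumption: the trained dense model serves as the teacher; T, alpha, lr,
# and the SGD optimizer are illustrative placeholders, not the paper's values.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_retrain(student, teacher, mask, X, y,
                    steps=200, lr=0.1, T=4.0, alpha=0.5):
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(steps):
        opt.zero_grad()
        s_logits = student(X)
        with torch.no_grad():
            t_logits = teacher(X)
        # Soft-label distillation term (Hinton-style) plus ordinary cross-entropy.
        kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                      F.softmax(t_logits / T, dim=1),
                      reduction="batchmean") * (T * T)
        ce = F.cross_entropy(s_logits, y)
        (alpha * kd + (1 - alpha) * ce).backward()
        opt.step()
        with torch.no_grad():                     # keep pruned weights at zero
            for name, param in student.named_parameters():
                if name in mask:
                    param.mul_(mask[name])
    return student
```

The `mask` argument is the same 0/1 dictionary produced by magnitude pruning in the earlier sketch, so the pruned weights stay at zero throughout retraining.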