Iterative Magnitude Pruning as a Renormalisation Group: A Study in The
Context of The Lottery Ticket Hypothesis
- URL: http://arxiv.org/abs/2308.03128v1
- Date: Sun, 6 Aug 2023 14:36:57 GMT
- Title: Iterative Magnitude Pruning as a Renormalisation Group: A Study in The Context of The Lottery Ticket Hypothesis
- Authors: Abu-Al Hassan
- Abstract summary: This thesis focuses on the Lottery Ticket Hypothesis (LTH).
The LTH posits that within extensive Deep Neural Networks (DNNs), smaller, trainable "winning tickets" can achieve performance comparable to the full model.
A key process in LTH, Iterative Magnitude Pruning (IMP), incrementally eliminates minimal weights, emulating stepwise learning in DNNs.
Once these winning tickets are identified, their "universality" is investigated: in other words, whether a winning ticket that works well for one specific problem could also work well for other, similar problems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This thesis delves into the intricate world of Deep Neural Networks (DNNs),
focusing on the exciting concept of the Lottery Ticket Hypothesis (LTH). The
LTH posits that within extensive DNNs, smaller, trainable subnetworks termed
"winning tickets", can achieve performance comparable to the full model. A key
process in LTH, Iterative Magnitude Pruning (IMP), incrementally eliminates
minimal weights, emulating stepwise learning in DNNs. Once we identify these
winning tickets, we further investigate their "universality". In other words,
we check if a winning ticket that works well for one specific problem could
also work well for other, similar problems. We also bridge the divide between
the IMP and the Renormalisation Group (RG) theory in physics, promoting a more
rigorous understanding of IMP.
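The abstract describes IMP as incrementally eliminating the smallest weights; in the LTH literature this is the prune-rewind-retrain loop. Below is a minimal sketch of that loop in PyTorch. The `train(model, steps)` routine, the global-magnitude criterion, and the per-round pruning fraction are illustrative assumptions, not details taken from the thesis.

```python
# Minimal sketch of Iterative Magnitude Pruning (IMP) with weight rewinding.
# Assumptions (not from the thesis): a user-supplied `train(model, steps)` routine,
# global magnitude scoring, and a fixed per-round pruning fraction.
import copy
import torch

def global_magnitude_masks(model, masks, prune_fraction):
    """Zero out the smallest-magnitude weights that are still unpruned."""
    surviving = torch.cat([
        (p.detach().abs() * masks[name]).flatten()
        for name, p in model.named_parameters() if name in masks
    ])
    surviving = surviving[surviving > 0]
    k = int(prune_fraction * surviving.numel())
    if k == 0:
        return masks
    threshold = torch.kthvalue(surviving, k).values
    return {
        name: (p.detach().abs() > threshold).float() * masks[name]
        for name, p in model.named_parameters() if name in masks
    }

def iterative_magnitude_pruning(model, train, rounds=5, prune_fraction=0.2, steps=1000):
    # Keep the original initialisation theta_0 so each round can rewind to it.
    init_state = copy.deepcopy(model.state_dict())
    masks = {name: torch.ones_like(p)
             for name, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train(model, steps)                          # train the (masked) network
        masks = global_magnitude_masks(model, masks, prune_fraction)
        model.load_state_dict(init_state)            # rewind to the initialisation
        with torch.no_grad():                        # apply the mask: the candidate ticket
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])
    return masks                                     # sparsity pattern of the winning ticket
```

Each round discards the smallest surviving couplings and rewinds the rest to their initial values; this repeated coarse-graining of the weight set is the kind of step the thesis relates to a renormalisation-group transformation. Keeping pruned weights at zero during training is left to the user-supplied `train` routine.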
Related papers
- When Layers Play the Lottery, all Tickets Win at Initialization [0.0]
Pruning is a technique for reducing the computational cost of deep networks.
In this work, we propose to discover winning tickets when the pruning process removes layers.
Our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%.
arXiv Detail & Related papers (2023-01-25T21:21:15Z) - The Lottery Ticket Hypothesis for Self-attention in Convolutional Neural
Network [69.54809052377189]
Recently, many plug-and-play self-attention modules (SAMs) have been proposed to enhance model generalization by exploiting the internal information of deep convolutional neural networks (CNNs).
We empirically find and verify some counterintuitive phenomena that: (a) Connecting the SAMs to all the blocks may not always bring the largest performance boost, and connecting to partial blocks would be even better; (b) Adding the SAMs to a CNN may not always bring a performance boost, and instead it may even harm the performance of the original CNN backbone.
arXiv Detail & Related papers (2022-07-16T07:08:59Z) - Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective [25.157282476221482]
We show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior.
We offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets.
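For reference, the generic spike-and-slab form typically used in such analyses is sketched below; the exact parameterisation and the way it enters the PAC-Bayes bound in the paper may differ.

```latex
% Spike-and-slab distribution over a single weight w (illustrative form):
% with probability \lambda the weight comes from a Gaussian "slab",
% otherwise it is pinned to zero by a point-mass "spike".
p(w) = \lambda \,\mathcal{N}(w \mid \mu, \sigma^{2}) + (1 - \lambda)\,\delta_{0}(w)
```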
arXiv Detail & Related papers (2022-05-15T15:58:27Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets [127.56361320894861]
The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy.
In this paper, we demonstrate the first positive result that a structurally sparse winning ticket can be effectively found in general.
Specifically, we first "re-fill" pruned elements back in some channels deemed to be important, and then "re-group" non-zero elements to create flexible group-wise structural patterns.
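As a rough illustration of the "re-fill" idea only: the sketch below restores every weight in output channels scored as important and zeroes the rest, turning an unstructured mask into a channel-structured one. The importance score (L1 norm of surviving weights per channel) and the kept fraction are assumptions, not the paper's criterion.

```python
# Illustrative "re-fill" of an unstructured pruning mask into a channel-wise
# structured one. Assumptions (not from the paper): importance = L1 norm of the
# surviving weights per output channel; a fixed fraction of channels is kept.
import torch

def refill_channelwise(weight, mask, keep_fraction=0.5):
    """weight, mask: tensors of shape (out_channels, in_channels, kh, kw)."""
    out_channels = weight.shape[0]
    importance = (weight.abs() * mask).flatten(1).sum(dim=1)  # score per output channel
    n_keep = max(1, int(keep_fraction * out_channels))
    kept = importance.topk(n_keep).indices
    structured_mask = torch.zeros_like(mask)
    structured_mask[kept] = 1.0          # re-fill every weight in the kept channels
    return structured_mask

# Example: a conv layer's weight with a random unstructured mask.
w = torch.randn(8, 4, 3, 3)
m = (torch.rand_like(w) > 0.8).float()
print(refill_channelwise(w, m).mean())  # fraction of weights now active
```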
arXiv Detail & Related papers (2022-02-09T21:33:51Z) - Universality of Deep Neural Network Lottery Tickets: A Renormalization
Group Perspective [89.19516919095904]
Winning tickets found in the context of one task can be transferred to similar tasks, possibly even across different architectures.
We make use of renormalization group theory, one of the most successful tools in theoretical physics.
We leverage it here to examine winning-ticket universality in large-scale lottery ticket experiments, and to shed new light on the success iterative magnitude pruning has found in the field of sparse machine learning.
arXiv Detail & Related papers (2021-10-07T06:50:16Z) - The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis has raised keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z) - Lottery Ticket Implies Accuracy Degradation, Is It a Desirable
Phenomenon? [43.47794674403988]
In deep model compression, the recent finding "Lottery Ticket Hypothesis" (LTH) (Frankle & Carbin) pointed out that there could exist a winning ticket.
We investigate the underlying condition and rationale behind the winning property, and find that it is largely attributable to the correlation between the initialized weights and the final trained weights.
We propose the "pruning & fine-tuning" method that consistently outperforms lottery ticket sparse training.
arXiv Detail & Related papers (2021-02-19T14:49:46Z) - Good Students Play Big Lottery Better [84.6111281091602]
Lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler and yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket); a generic distillation-loss sketch is given after this entry.
arXiv Detail & Related papers (2021-01-08T23:33:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.