Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning
Ticket's Mask?
- URL: http://arxiv.org/abs/2210.03044v1
- Date: Thu, 6 Oct 2022 16:50:20 GMT
- Title: Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning
Ticket's Mask?
- Authors: Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya
Ganguli, Gintare Karolina Dziugaite
- Abstract summary: We show that an IMP mask found at the end of training conveys the identity of a desired subspace.
We also show that SGD can exploit this information due to a strong form of robustness.
Overall, our results make progress toward demystifying the existence of winning tickets.
- Score: 40.52143582292875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep learning involves training costly, highly overparameterized
networks, thus motivating the search for sparser networks that can still be
trained to the same accuracy as the full network (i.e. matching). Iterative
magnitude pruning (IMP) is a state of the art algorithm that can find such
highly sparse matching subnetworks, known as winning tickets. IMP operates by
iterative cycles of training, masking smallest magnitude weights, rewinding
back to an early training point, and repeating. Despite its simplicity, the
underlying principles for when and how IMP finds winning tickets remain
elusive. In particular, what useful information does an IMP mask found at the
end of training convey to a rewound network near the beginning of training? How
does SGD allow the network to extract this information? And why is iterative
pruning needed? We develop answers in terms of the geometry of the error
landscape. First, we find that, at higher sparsities, pairs of pruned
networks at successive pruning
iterations are connected by a linear path with zero error barrier if and only
if they are matching. This indicates that masks found at the end of training
convey the identity of an axial subspace that intersects a desired linearly
connected mode of a matching sublevel set. Second, we show SGD can exploit this
information due to a strong form of robustness: it can return to this mode
despite strong perturbations early in training. Third, we show how the flatness
of the error landscape at the end of training determines a limit on the
fraction of weights that can be pruned at each iteration of IMP. Finally, we
show that the role of retraining in IMP is to find a network with new small
weights to prune. Overall, these results make progress toward demystifying the
existence of winning tickets by revealing the fundamental role of error
landscape geometry.
Related papers
- No Free Prune: Information-Theoretic Barriers to Pruning at Initialization [8.125999058340998]
We show the Law of Robustness of arXiv:2105.12806 extends to sparse networks with the usual parameter count replaced by $p_\text{eff}$.
Experiments on neural networks confirm that information gained during training may indeed affect model capacity.
arXiv Detail & Related papers (2024-02-02T01:13:16Z) - When Layers Play the Lottery, all Tickets Win at Initialization [0.0]
Pruning is a technique for reducing the computational cost of deep networks.
In this work, we propose to discover winning tickets when the pruning process removes layers.
Our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%.
arXiv Detail & Related papers (2023-01-25T21:21:15Z) - Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Lottery Tickets on a Data Diet: Finding Initializations with Sparse
Trainable Networks [40.55816472416984]
A striking observation about iterative magnitude pruning (IMP; Frankle et al.) is that, after just a few hundred steps of dense training, the method can find a sparse subnetwork that trains to the same accuracy as the dense network.
In this work, we seek to understand how this early phase of pre-training leads to good IMP performance, through the lens of both the data and the network.
We identify novel properties of the loss landscape of dense networks that are predictive of performance.
arXiv Detail & Related papers (2022-06-02T20:04:06Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining model capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets [127.56361320894861]
The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy.
In this paper, we demonstrate the first positive result that a structurally sparse winning ticket can be effectively found in general.
Specifically, we first "re-fill" pruned elements back in some channels deemed to be important, and then "re-group" non-zero elements to create flexible group-wise structural patterns.
arXiv Detail & Related papers (2022-02-09T21:33:51Z) - The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z) - Good Students Play Big Lottery Better [84.6111281091602]
Lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler, and yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket).
arXiv Detail & Related papers (2021-01-08T23:33:53Z)