Find A Winning Sign: Sign Is All We Need to Win the Lottery
- URL: http://arxiv.org/abs/2504.05357v1
- Date: Mon, 07 Apr 2025 09:30:38 GMT
- Title: Find A Winning Sign: Sign Is All We Need to Win the Lottery
- Authors: Junghun Oh, Sungyong Baik, Kyoung Mu Lee
- Abstract summary: We show that a sparse network trained by an existing IP method can retain its basin of attraction if its parameter signs and normalization layer parameters are preserved. To take a step closer to finding a winning ticket, we alleviate the reliance on normalization layer parameters by preventing high error barriers along the linear path between the sparse network trained by our method and its counterpart with initialized normalization layer parameters.
- Score: 52.63674911541416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Lottery Ticket Hypothesis (LTH) posits the existence of a sparse subnetwork (a.k.a. winning ticket) that can generalize comparably to its over-parameterized counterpart when trained from scratch. The common approach to finding a winning ticket is to preserve the original strong generalization through Iterative Pruning (IP) and transfer information useful for achieving the learned generalization by applying the resulting sparse mask to an untrained network. However, existing IP methods still struggle to generalize their observations beyond ad-hoc initialization and small-scale architectures or datasets, or they bypass these challenges by applying their mask to trained weights instead of initialized ones. In this paper, we demonstrate that the parameter sign configuration plays a crucial role in conveying useful information for generalization to any randomly initialized network. Through linear mode connectivity analysis, we observe that a sparse network trained by an existing IP method can retain its basin of attraction if its parameter signs and normalization layer parameters are preserved. To take a step closer to finding a winning ticket, we alleviate the reliance on normalization layer parameters by preventing high error barriers along the linear path between the sparse network trained by our method and its counterpart with initialized normalization layer parameters. Interestingly, across various architectures and datasets, we observe that any randomly initialized network can be optimized to exhibit low error barriers along the linear path to the sparse network trained by our method by inheriting its sparsity and parameter sign information, potentially achieving performance comparable to the original. The code is available at https://github.com/JungHunOh/AWS_ICLR2025.git
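The abstract describes two operations that a short sketch can make concrete: (i) giving a randomly initialized network the sparsity pattern and parameter signs of a sparse network trained by an IP method, and (ii) measuring the error barrier along the linear path between two networks, i.e. the linear mode connectivity analysis mentioned above. The following is a minimal PyTorch sketch under those assumptions; the function names `inherit_sign_and_sparsity` and `error_barrier` are illustrative and do not correspond to the released AWS_ICLR2025 code.

```python
import copy
import torch


@torch.no_grad()
def inherit_sign_and_sparsity(init_model, trained_sparse_model):
    """Copy of `init_model` whose weights keep their random magnitudes but
    inherit the sparsity mask and parameter signs of `trained_sparse_model`."""
    new_model = copy.deepcopy(init_model)
    for p_new, p_trained in zip(new_model.parameters(),
                                trained_sparse_model.parameters()):
        mask = (p_trained != 0).to(p_new.dtype)   # inherited sparsity pattern
        sign = torch.sign(p_trained)              # inherited parameter signs
        p_new.copy_(p_new.abs() * sign * mask)    # random magnitude, transferred sign
    return new_model


@torch.no_grad()
def error_barrier(model_a, model_b, loader, num_points=11):
    """Largest rise in classification error along the linear path
    theta(t) = (1 - t) * theta_a + t * theta_b, measured relative to the
    linear interpolation of the endpoint errors (standard LMC-style barrier)."""
    def classification_error(model):
        model.eval()
        wrong, total = 0, 0
        for x, y in loader:
            wrong += (model(x).argmax(dim=1) != y).sum().item()
            total += y.numel()
        return wrong / total

    # Note: only parameters are interpolated here; normalization buffers
    # (e.g. BatchNorm running statistics) are left as in model_a.
    interp = copy.deepcopy(model_a)
    params_a = [p.detach().clone() for p in model_a.parameters()]
    params_b = [p.detach().clone() for p in model_b.parameters()]
    ts = torch.linspace(0.0, 1.0, num_points).tolist()
    errors = []
    for t in ts:
        for p, pa, pb in zip(interp.parameters(), params_a, params_b):
            p.copy_((1.0 - t) * pa + t * pb)
        errors.append(classification_error(interp))
    e0, e1 = errors[0], errors[-1]
    return max(e - ((1.0 - t) * e0 + t * e1) for e, t in zip(errors, ts))
```

This sketch interpolates weights only; as the abstract notes, normalization layer parameters need separate treatment, and reducing the reliance on them is precisely what the paper's method targets.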
Related papers
- Playing the Lottery With Concave Regularizers for Sparse Trainable Neural Networks [10.48836159692231]
We propose a novel class of methods to play the lottery. The key point is the use of concave regularization to promote the sparsity of a relaxed binary mask. We show that the proposed method can improve the performance of state-of-the-art algorithms.
arXiv Detail & Related papers (2025-01-19T18:05:13Z) - Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization [49.06421851486415]
Static sparse training aims to train sparse models from scratch, achieving remarkable results in recent years.
We propose Exact Orthogonal Initialization (EOI), a novel sparse Orthogonal Initialization scheme based on random Givens rotations.
Our method enables training highly sparse 1000-layer MLP and CNN networks without residual connections or normalization techniques.
arXiv Detail & Related papers (2024-06-03T19:44:47Z) - Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask? [40.52143582292875]
We show that an IMP mask found at the end of training conveys the identity of a desired subspace.
We also show that SGD can exploit this information due to a strong form of robustness.
Overall, our results make progress toward demystifying the existence of winning tickets.
arXiv Detail & Related papers (2022-10-06T16:50:20Z) - Sparse tree-based initialization for neural networks [0.0]
Dedicated neural network (NN) architectures are designed to handle specific data types, such as CNNs for images or RNNs for text.
In this work, we propose a new technique for (potentially deep) multilayer perceptrons (MLP)
We show that our new initializer operates an implicit regularization during the NN training, and emphasizes that the first layers act as a sparse feature extractor.
arXiv Detail & Related papers (2022-09-30T07:44:03Z) - Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks [40.55816472416984]
A striking observation about iterative magnitude pruning (IMP; Frankle et al.) is that, after just a few hundred steps of dense pre-training, it can find initializations containing sparse trainable subnetworks.
In this work, we seek to understand how this early phase of pre-training leads to a good initialization for IMP, from the perspective of both the data and the network.
We identify novel properties of the loss landscape of dense networks that are predictive of performance.
arXiv Detail & Related papers (2022-06-02T20:04:06Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Lojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.