Strong Lottery Ticket Hypothesis with $\varepsilon$--perturbation
- URL: http://arxiv.org/abs/2210.16589v1
- Date: Sat, 29 Oct 2022 12:22:17 GMT
- Title: Strong Lottery Ticket Hypothesis with $\varepsilon$--perturbation
- Authors: Zheyang Xiong, Fangshuo Liao, Anastasios Kyrillidis
- Abstract summary: We extend the theoretical guarantee of the strong Lottery Ticket Hypothesis to a scenario more similar to the original LTH.
By allowing an $\varepsilon$-scale perturbation on the random initial weights, can we reduce the over-parameterization requirement for the candidate network in the strong LTH?
We show via experiments that the perturbed weights obtained by projected SGD perform better under strong LTH pruning.
- Score: 11.38723572165938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The strong Lottery Ticket Hypothesis (LTH) claims the existence of a
subnetwork in a sufficiently large, randomly initialized neural network that
approximates some target neural network without the need of training. We extend
the theoretical guarantee of the strong LTH literature to a scenario more
similar to the original LTH, by generalizing the weight change in the
pre-training step to some perturbation around initialization. In particular, we
focus on the following open questions: By allowing an $\varepsilon$-scale
perturbation on the random initial weights, can we reduce the
over-parameterization requirement for the candidate network in the strong LTH?
Furthermore, does the weight change induced by SGD coincide with a good set of such
perturbations?
We answer the first question by first extending the theoretical result on
subset sum to allow perturbation on the candidates. Applying this result to the
neural network setting, we show that such $\varepsilon$-perturbation reduces
the over-parameterization requirement of the strong LTH. To answer the second
question, we show via experiments that the perturbed weights obtained by
projected SGD perform better under strong LTH pruning.
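To make the second question concrete, below is a minimal sketch (not the authors' code) of projected SGD that keeps every weight inside an element-wise $\varepsilon$-ball around its random initialization before strong-LTH pruning is applied; the toy model, data, and the choice of an $\ell_\infty$-ball projection are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): SGD with a projection that keeps each
# weight inside an element-wise epsilon-ball around its random initialization.
# The toy model, data, and the l_infty-ball projection are illustrative assumptions.
import torch
import torch.nn as nn

def projected_sgd_step(model, init_params, loss, lr=0.1, eps=0.05):
    """One plain SGD step, then project each weight back into [w0 - eps, w0 + eps]."""
    loss.backward()
    with torch.no_grad():
        for p, p0 in zip(model.parameters(), init_params):
            p -= lr * p.grad                                                  # SGD update
            p.copy_(torch.minimum(torch.maximum(p, p0 - eps), p0 + eps))      # projection
            p.grad.zero_()

# Toy usage: the trained weights end up within eps of their initialization,
# which is the perturbed-initialization setting studied above.
torch.manual_seed(0)
model = nn.Linear(10, 1)
init_params = [p.detach().clone() for p in model.parameters()]
x, y = torch.randn(64, 10), torch.randn(64, 1)
for _ in range(200):
    loss = nn.functional.mse_loss(model(x), y)
    projected_sgd_step(model, init_params, loss)
```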
Related papers
- Convergence Rate Analysis of LION [54.28350823319057]
LION converges in $K$ iterations at a rate of $\mathcal{O}(\sqrt{d}\,K^{-1/4})$, measured by the gradient norm of the Karush-Kuhn-Tucker (KKT) stationarity condition.
We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z) - On the Sparsity of the Strong Lottery Ticket Hypothesis [8.47014750905382]
Research efforts have recently been made to show that a random neural network $N$ contains subnetworks capable of accurately approximating any given neural network.
We provide the first proof of the Strong Lottery Ticket Hypothesis in classical settings, with guarantees on the sparsity of the subnetworks.
arXiv Detail & Related papers (2024-10-18T06:57:37Z) - Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent [83.85536329832722]
We show that stochastic gradient descent (SGD) can efficiently solve the $k$-sparse parity problem on a $d$-dimensional hypercube.
We then demonstrate how a neural network trained with SGD can solve the $k$-sparse parity problem with small statistical error.
arXiv Detail & Related papers (2024-04-18T17:57:53Z) - Probabilistic Modeling: Proving the Lottery Ticket Hypothesis in Spiking
Neural Network [30.924449325020767]
The Lottery Ticket Hypothesis (LTH) states that a randomly-initialized large neural network contains a small sub-network that can achieve performance comparable to the full network.
LTH opens up a new path for network pruning.
arXiv Detail & Related papers (2023-05-20T09:27:34Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - A General Framework For Proving The Equivariant Strong Lottery Ticket
Hypothesis [15.376680573592997]
Modern neural networks are capable of incorporating more than just translation symmetry.
We generalize the Strong Lottery Ticket Hypothesis (SLTH) to functions that preserve the action of the group $G$.
We prove our theory by pruning overparametrized $E(2)$-steerable CNNs and message passing GNNs.
arXiv Detail & Related papers (2022-06-09T04:40:18Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork that is in a trainable condition, and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization
is Sufficient [9.309655246559094]
We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(\log(dl))$ wider and twice as deep.
Our analysis relies on connecting the pruning of random ReLU networks to random instances of the \textsc{SubsetSum} problem (a numerical sketch of this mechanism appears after this list).
arXiv Detail & Related papers (2020-06-14T19:32:10Z) - Fractional moment-preserving initialization schemes for training deep
neural networks [1.14219428942199]
A traditional approach to initializing deep neural networks (DNNs) is to sample the network weights randomly so as to preserve the variance of the pre-activations.
In this paper, we show that weights and therefore pre-activations can be modeled with a heavy-tailed distribution.
We show through numerical experiments that our schemes can improve the training and test performance.
arXiv Detail & Related papers (2020-05-25T01:10:01Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
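As referenced in the SubsetSum entry above, here is a minimal numerical sketch (not the paper's construction) of the subset-sum mechanism behind strong-LTH results, together with a crude model of an $\varepsilon$-perturbation budget: letting each chosen candidate move by at most $\varepsilon$ shrinks the best achievable approximation error, which is the intuition for the reduced over-parameterization requirement. The candidate distribution, sizes, and the greedy way the budget is spent are illustrative assumptions.

```python
# Minimal numerical sketch: approximate a target weight by a subset sum of random
# candidates (the mechanism behind strong-LTH constructions). The "perturbed" variant
# lets each chosen candidate shift by up to eps toward the target, a crude stand-in
# for the epsilon-perturbation studied above; distributions and sizes are assumptions.
import numpy as np

def best_subset_error(target, candidates, eps=0.0):
    """Smallest |target - sum(S)| over all subsets S, where every chosen candidate
    may additionally be shifted by at most eps."""
    best = abs(target)  # the empty subset
    n = len(candidates)
    for mask in range(1, 2 ** n):
        chosen = candidates[[i for i in range(n) if (mask >> i) & 1]]
        err = abs(target - chosen.sum())
        err = max(0.0, err - eps * len(chosen))  # spend the per-candidate budget
        best = min(best, err)
    return best

rng = np.random.default_rng(0)
target = 0.37
candidates = rng.uniform(-1.0, 1.0, size=10)
print("unperturbed candidates:  ", best_subset_error(target, candidates))
print("eps = 0.01 per candidate:", best_subset_error(target, candidates, eps=0.01))
```

With more random candidates, the unperturbed subset-sum error already shrinks rapidly (the Lueker-style result used in the SubsetSum paper above); the sketch only illustrates how a small per-candidate perturbation budget can trade away part of that candidate count.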