On the Sparsity of the Strong Lottery Ticket Hypothesis
- URL: http://arxiv.org/abs/2410.14754v2
- Date: Thu, 31 Oct 2024 07:50:22 GMT
- Title: On the Sparsity of the Strong Lottery Ticket Hypothesis
- Authors: Emanuele Natale, Davide Ferre', Giordano Giambartolomei, Frédéric Giroire, Frederik Mallmann-Trenn
- Abstract summary: Research efforts have recently been made to show that a random neural network $N$ contains subnetworks capable of accurately approximating any given neural network.
We provide the first proof of the Strong Lottery Ticket Hypothesis in classical settings, with guarantees on the sparsity of the subnetworks.
- Score: 8.47014750905382
- License:
- Abstract: Considerable research efforts have recently been made to show that a random neural network $N$ contains subnetworks capable of accurately approximating any given neural network that is sufficiently smaller than $N$, without any training. This line of research, known as the Strong Lottery Ticket Hypothesis (SLTH), was originally motivated by the weaker Lottery Ticket Hypothesis, which states that a sufficiently large random neural network $N$ contains \emph{sparse} subnetworks that can be trained efficiently to achieve performance comparable to that of training the entire network $N$. Despite its original motivation, results on the SLTH have so far not provided any guarantee on the size of subnetworks. Such limitation is due to the nature of the main technical tool leveraged by these results, the Random Subset Sum (RSS) Problem. Informally, the RSS Problem asks how large a random i.i.d. sample $\Omega$ should be so that we are able to approximate any number in $[-1,1]$, up to an error of $ \epsilon$, as the sum of a suitable subset of $\Omega$. We provide the first proof of the SLTH in classical settings, such as dense and equivariant networks, with guarantees on the sparsity of the subnetworks. Central to our results, is the proof of an essentially tight bound on the Random Fixed-Size Subset Sum Problem (RFSS), a variant of the RSS Problem in which we only ask for subsets of a given size, which is of independent interest.
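The RSS Problem stated in the abstract can be made concrete with a small brute-force sketch (this is illustrative only, not code from the paper; the function name and parameters are hypothetical): draw $n$ i.i.d. Uniform$[-1,1]$ samples, enumerate all $2^n$ subset sums, and check whether every target on a grid in $[-1,1]$ lies within $\epsilon$ of some subset sum.

```python
import itertools
import random

def rss_coverage(n=14, eps=0.05, targets=None, seed=0):
    """Check whether n i.i.d. Uniform[-1,1] samples Omega can approximate
    every target in a grid over [-1,1] to within eps by some subset sum."""
    rng = random.Random(seed)
    omega = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Enumerate all 2^n subset sums (exponential; fine only for small n).
    sums = [sum(c) for r in range(n + 1)
            for c in itertools.combinations(omega, r)]
    if targets is None:
        targets = [i / 10 for i in range(-10, 11)]  # grid over [-1, 1]
    # Coverage holds if every target is eps-close to some subset sum.
    return all(min(abs(s - t) for s in sums) <= eps for t in targets)
```

Per the RSS results cited in the abstract, $n = \Theta(\log(1/\epsilon))$ samples suffice for coverage with high probability; the exhaustive enumeration above is exponential in $n$ and is only meant to make the problem statement tangible at small scale.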
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent [83.85536329832722]
We show that stochastic gradient descent (SGD) can efficiently solve the $k$-parity problem on a $d$-dimensional hypercube.
We then demonstrate how a neural network trained with SGD solves the $k$-parity problem with small statistical error.
arXiv Detail & Related papers (2024-04-18T17:57:53Z) - Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs [42.551773746803946]
Vision tasks are characterized by the properties of locality and translation invariance.
The superior performance of convolutional neural networks (CNNs) on these tasks is widely attributed to the inductive bias of locality and weight sharing baked into their architecture.
Existing attempts to quantify the statistical benefits of these biases in CNNs over locally connected neural networks (LCNs) and fully connected neural networks (FCNs) fall into one of the following categories.
arXiv Detail & Related papers (2024-03-23T03:57:28Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Strong Lottery Ticket Hypothesis with $\varepsilon$--perturbation [11.38723572165938]
We extend the theoretical guarantee of the strong Lottery Ticket Hypothesis to a scenario more similar to the original LTH.
By allowing an $\varepsilon$-scale perturbation on the random initial weights, can we reduce the over-parameterization requirement for the candidate network in the strong LTH?
We show via experiments that the perturbed weight achieved by the projected SGD shows better performance under the strong LTH pruning.
arXiv Detail & Related papers (2022-10-29T12:22:17Z) - Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Distributed Sparse Feature Selection in Communication-Restricted Networks [6.9257380648471765]
We propose and theoretically analyze a new distributed scheme for sparse linear regression and feature selection.
In order to infer the causal dimensions from the whole dataset, we propose a simple, yet effective method for information sharing in the network.
arXiv Detail & Related papers (2021-11-02T05:02:24Z) - Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient [9.309655246559094]
We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(log(dl))$ wider and twice as deep.
Our analysis relies on connecting the pruning of random ReLU networks to random instances of the $\textsc{SubsetSum}$ problem.
arXiv Detail & Related papers (2020-06-14T19:32:10Z) - Proving the Lottery Ticket Hypothesis: Pruning is All You Need [56.25432563818297]
The lottery ticket hypothesis states that a randomly-initialized network contains a small subnetwork that, when trained in isolation, can compete with the performance of the original network.
We prove an even stronger hypothesis, showing that for every bounded distribution and every target network with bounded weights, a sufficiently over-parameterized neural network with random weights contains a subnetwork with roughly the same accuracy as the target network, without any further training.
arXiv Detail & Related papers (2020-02-03T07:23:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.