Why Random Pruning Is All We Need to Start Sparse
- URL: http://arxiv.org/abs/2210.02412v2
- Date: Wed, 31 May 2023 15:51:16 GMT
- Title: Why Random Pruning Is All We Need to Start Sparse
- Authors: Advait Gadhikar, Sohom Mukherjee and Rebekka Burkholz
- Abstract summary: Random masks define surprisingly effective sparse neural network models.
We show that sparser networks can compete with dense architectures and state-of-the-art lottery ticket pruning algorithms.
- Score: 7.648170881733381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random masks define surprisingly effective sparse neural network models, as
has been shown empirically. The resulting sparse networks can often compete
with dense architectures and state-of-the-art lottery ticket pruning
algorithms, even though they do not rely on computationally expensive
prune-train iterations and can be drawn initially without significant
computational overhead. We offer a theoretical explanation of how random masks
can approximate arbitrary target networks if they are wider by a logarithmic
factor in the inverse sparsity $1 / \log(1/\text{sparsity})$. This
overparameterization factor is necessary at least for 3-layer random networks,
which elucidates the observed degrading performance of random networks at
higher sparsity. At moderate to high sparsity levels, however, our results
imply that sparser networks are contained within random source networks so that
any dense-to-sparse training scheme can be turned into a computationally more
efficient sparse-to-sparse one by constraining the search to a fixed random
mask. We demonstrate the feasibility of this approach in experiments for
different pruning methods and propose particularly effective choices of initial
layer-wise sparsity ratios of the random source network. As a special case, we
show theoretically and experimentally that random source networks also contain
strong lottery tickets.
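The sparse-to-sparse idea from the abstract can be made concrete in a few lines. The following is a minimal sketch (not the authors' code), assuming PyTorch: a random mask is drawn once at initialization and training is then constrained to that fixed mask. The layer widths, the per-layer density of 0.2, and the uniform layer-wise ratio are illustrative placeholders; the paper proposes particular layer-wise sparsity ratios that are not reproduced here.
```python
import torch
import torch.nn as nn

def random_mask(weight: torch.Tensor, density: float) -> torch.Tensor:
    # Keep each weight independently with probability `density` (Erdos-Renyi-style mask).
    return (torch.rand_like(weight) < density).float()

class MaskedLinear(nn.Linear):
    """Linear layer whose weights outside a fixed random mask stay exactly zero."""

    def __init__(self, in_features: int, out_features: int, density: float):
        super().__init__(in_features, out_features)
        self.register_buffer("mask", random_mask(self.weight, density))
        with torch.no_grad():
            self.weight.mul_(self.mask)                            # start sparse
        self.weight.register_hook(lambda grad: grad * self.mask)   # block off-mask gradients

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

# Illustrative 3-layer MLP that keeps 20% of the weights in every layer.
model = nn.Sequential(
    MaskedLinear(784, 512, density=0.2), nn.ReLU(),
    MaskedLinear(512, 512, density=0.2), nn.ReLU(),
    MaskedLinear(512, 10, density=0.2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# A standard training loop on top of `model` now searches only inside the fixed random mask.
```
Masking both the forward pass and the gradients keeps off-mask weights exactly zero even under momentum or weight decay, so any dense-to-sparse scheme run on top of such a model is effectively restricted to the fixed random source mask.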
Related papers
- Random Search as a Baseline for Sparse Neural Network Architecture Search [0.0]
Sparse neural networks have shown performance similar to or better than their dense counterparts while having higher parameter efficiency.
This has motivated a number of works to learn or search for high performing sparse networks.
We propose Random Search as a baseline algorithm for finding good sparse configurations and study its performance.
We observe that for this sparse architecture search task, sparse networks found by Random Search neither perform better nor converge more efficiently than their random counterparts.
arXiv Detail & Related papers (2024-03-13T05:32:13Z)
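To make the "random search for sparse configurations" in the entry above concrete, here is a generic sketch of a random-search baseline over layer-wise sparsity ratios, assuming PyTorch; it is not necessarily that paper's protocol. The layer shapes, sparsity range, trial count, and the `evaluate` placeholder are illustrative assumptions.
```python
import random
import torch.nn as nn
import torch.nn.utils.prune as prune

LAYER_SHAPES = [(784, 512), (512, 512), (512, 10)]

def build_randomly_pruned_mlp(sparsities):
    # One nn.Linear per shape, each randomly pruned to its sampled layer-wise sparsity.
    layers = []
    for (fan_in, fan_out), s in zip(LAYER_SHAPES, sparsities):
        linear = nn.Linear(fan_in, fan_out)
        prune.random_unstructured(linear, name="weight", amount=s)
        layers += [linear, nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # no activation after the output layer

def evaluate(model) -> float:
    # Placeholder objective; a real run would train briefly and return validation accuracy.
    return random.random()

def random_search(num_trials: int = 20):
    best_score, best_config = float("-inf"), None
    for _ in range(num_trials):
        sparsities = [random.uniform(0.5, 0.95) for _ in LAYER_SHAPES]  # illustrative range
        score = evaluate(build_randomly_pruned_mlp(sparsities))
        if score > best_score:
            best_score, best_config = score, sparsities
    return best_config

print(random_search())
```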
- Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy [52.40331776572531]
We show that learning depth-$3$ ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework.
Our results are under a well-studied assumption on the existence of local pseudorandom generators.
arXiv Detail & Related papers (2023-02-15T02:00:26Z)
- Likelihood-Free Inference with Generative Neural Networks via Scoring Rule Minimization [0.0]
Inference methods yield posterior approximations for simulator models with intractable likelihood.
Many works trained neural networks to approximate either the intractable likelihood or the posterior directly.
Here, we propose to approximate the posterior with generative networks trained by Scoring Rule minimization.
arXiv Detail & Related papers (2022-05-31T13:32:55Z)
- Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork that is in a trainable condition, and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but it has been deemed uncompetitive compared with either post-training pruning or sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning.
arXiv Detail & Related papers (2022-02-05T21:19:41Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network through the geometric structure of its objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Artificial Neural Networks generated by Low Discrepancy Sequences [59.51653996175648]
We generate artificial neural networks as random walks on a dense network graph.
Such networks can be trained sparse from scratch, avoiding the expensive procedure of training a dense network and compressing it afterwards.
We demonstrate that the artificial neural networks generated by low discrepancy sequences can achieve an accuracy within reach of their dense counterparts at a much lower computational complexity.
arXiv Detail & Related papers (2021-03-05T08:45:43Z)
- How Powerful are Shallow Neural Networks with Bandlimited Random Weights? [25.102870584507244]
We investigate the expressive power of depth-2 bandlimited random neural networks.
A random net is a neural network where the hidden layer parameters are frozen with random bandwidth.
arXiv Detail & Related papers (2020-08-19T13:26:12Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
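For contrast with the fixed-random-mask sketch after the abstract above, the following is a generic sketch of dense-to-sparse mask discovery via iterative magnitude pruning, the baseline family that several of the entries above (ESPN, lottery-ticket pruning) build on. It is not the ESPN algorithm itself, and it assumes PyTorch; the commented-out `train_one_epoch`, the number of rounds, and the per-round pruning fraction are illustrative placeholders.
```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def iterative_magnitude_pruning(model: nn.Module, rounds: int = 5,
                                frac_per_round: float = 0.3) -> nn.Module:
    # Alternate (placeholder) training with pruning the smallest surviving weights.
    linear_layers = [m for m in model.modules() if isinstance(m, nn.Linear)]
    for _ in range(rounds):
        # train_one_epoch(model)  # placeholder: supply your own training step
        for layer in linear_layers:
            # Prune `frac_per_round` of the currently unpruned weights by magnitude.
            prune.l1_unstructured(layer, name="weight", amount=frac_per_round)
    return model

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
iterative_magnitude_pruning(model)
# Each pruned layer now carries a binary buffer `weight_mask` describing the discovered subnetwork.
```
Following the main paper above, this kind of dense-to-sparse search can be turned into a sparse-to-sparse one by restricting the candidate weights to a fixed random source mask from the start.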