Why Random Pruning Is All We Need to Start Sparse
- URL: http://arxiv.org/abs/2210.02412v2
- Date: Wed, 31 May 2023 15:51:16 GMT
- Title: Why Random Pruning Is All We Need to Start Sparse
- Authors: Advait Gadhikar, Sohom Mukherjee and Rebekka Burkholz
- Abstract summary: Random masks define surprisingly effective sparse neural network models.
We show that sparser networks can compete with dense architectures and state-of-the-art lottery ticket pruning algorithms.
- Score: 7.648170881733381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random masks define surprisingly effective sparse neural network models, as
has been shown empirically. The resulting sparse networks can often compete
with dense architectures and state-of-the-art lottery ticket pruning
algorithms, even though they do not rely on computationally expensive
prune-train iterations and can be drawn initially without significant
computational overhead. We offer a theoretical explanation of how random masks
can approximate arbitrary target networks if they are wider by a logarithmic
factor in the inverse sparsity $1 / \log(1/\text{sparsity})$. This
overparameterization factor is necessary at least for 3-layer random networks,
which elucidates the observed degrading performance of random networks at
higher sparsity. At moderate to high sparsity levels, however, our results
imply that sparser networks are contained within random source networks so that
any dense-to-sparse training scheme can be turned into a computationally more
efficient sparse-to-sparse one by constraining the search to a fixed random
mask. We demonstrate the feasibility of this approach in experiments for
different pruning methods and propose particularly effective choices of initial
layer-wise sparsity ratios of the random source network. As a special case, we
show theoretically and experimentally that random source networks also contain
strong lottery tickets.
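The sparse-to-sparse idea from the abstract can be made concrete in a few lines. The following is a minimal sketch (not the authors' code), assuming PyTorch: a random mask is drawn once at initialization and training is then constrained to that fixed mask. The layer widths, the per-layer density of 0.2, and the uniform layer-wise ratio are illustrative placeholders; the paper proposes particular layer-wise sparsity ratios that are not reproduced here.
```python
import torch
import torch.nn as nn

def random_mask(weight: torch.Tensor, density: float) -> torch.Tensor:
    # Keep each weight independently with probability `density` (Erdos-Renyi-style mask).
    return (torch.rand_like(weight) < density).float()

class MaskedLinear(nn.Linear):
    """Linear layer whose weights outside a fixed random mask stay exactly zero."""

    def __init__(self, in_features: int, out_features: int, density: float):
        super().__init__(in_features, out_features)
        self.register_buffer("mask", random_mask(self.weight, density))
        with torch.no_grad():
            self.weight.mul_(self.mask)                            # start sparse
        self.weight.register_hook(lambda grad: grad * self.mask)   # block off-mask gradients

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

# Illustrative 3-layer MLP that keeps 20% of the weights in every layer.
model = nn.Sequential(
    MaskedLinear(784, 512, density=0.2), nn.ReLU(),
    MaskedLinear(512, 512, density=0.2), nn.ReLU(),
    MaskedLinear(512, 10, density=0.2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# A standard training loop on top of `model` now searches only inside the fixed random mask.
```
Masking both the forward pass and the gradients keeps off-mask weights exactly zero even under momentum or weight decay, so any dense-to-sparse scheme run on top of such a model is effectively restricted to the fixed random source mask.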
Related papers
- Random Search as a Baseline for Sparse Neural Network Architecture Search [0.0]
Sparse neural networks have shown performance similar to or better than their dense counterparts while having higher parameter efficiency.
This has motivated a number of works to learn or search for high performing sparse networks.
We propose Random Search as a baseline algorithm for finding good sparse configurations and study its performance.
We observe that for this sparse architecture search task, sparse networks found by Random Search neither perform better nor converge more efficiently than their random counterparts.
arXiv Detail & Related papers (2024-03-13T05:32:13Z)
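To make the "random search for sparse configurations" in the entry above concrete, here is a generic sketch of a random-search baseline over layer-wise sparsity ratios, assuming PyTorch; it is not necessarily that paper's protocol. The layer shapes, sparsity range, trial count, and the `evaluate` placeholder are illustrative assumptions.
```python
import random
import torch.nn as nn
import torch.nn.utils.prune as prune

LAYER_SHAPES = [(784, 512), (512, 512), (512, 10)]

def build_randomly_pruned_mlp(sparsities):
    # One nn.Linear per shape, each randomly pruned to its sampled layer-wise sparsity.
    layers = []
    for (fan_in, fan_out), s in zip(LAYER_SHAPES, sparsities):
        linear = nn.Linear(fan_in, fan_out)
        prune.random_unstructured(linear, name="weight", amount=s)
        layers += [linear, nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # no activation after the output layer

def evaluate(model) -> float:
    # Placeholder objective; a real run would train briefly and return validation accuracy.
    return random.random()

def random_search(num_trials: int = 20):
    best_score, best_config = float("-inf"), None
    for _ in range(num_trials):
        sparsities = [random.uniform(0.5, 0.95) for _ in LAYER_SHAPES]  # illustrative range
        score = evaluate(build_randomly_pruned_mlp(sparsities))
        if score > best_score:
            best_score, best_config = score, sparsities
    return best_config

print(random_search())
```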
- Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy [52.40331776572531]
We show that learning depth-$3$ ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework.
Our results are under a well-studied assumption on the existence of local pseudorandom generators.
arXiv Detail & Related papers (2023-02-15T02:00:26Z)
- Likelihood-Free Inference with Generative Neural Networks via Scoring Rule Minimization [0.0]
Inference methods yield posterior approximations for simulator models with intractable likelihood.
Many works trained neural networks to approximate either the intractable likelihood or the posterior directly.
Here, we propose to approximate the posterior with generative networks trained by Scoring Rule minimization.
arXiv Detail & Related papers (2022-05-31T13:32:55Z)
- Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork that is in a trainable condition, and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but it has been deemed uncompetitive compared with either post-training pruning or sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning.
arXiv Detail & Related papers (2022-02-05T21:19:41Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network through the geometric structure of its objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Artificial Neural Networks generated by Low Discrepancy Sequences [59.51653996175648]
We generate artificial neural networks as random walks on a dense network graph.
Such networks can be trained sparse from scratch, avoiding the expensive procedure of training a dense network and compressing it afterwards.
We demonstrate that the artificial neural networks generated by low discrepancy sequences can achieve an accuracy within reach of their dense counterparts at a much lower computational complexity.
arXiv Detail & Related papers (2021-03-05T08:45:43Z)
- How Powerful are Shallow Neural Networks with Bandlimited Random Weights? [25.102870584507244]
We investigate the expressive power of depth-2 bandlimited random neural networks.
A random net is a neural network where the hidden layer parameters are frozen with random bandwidth.
arXiv Detail & Related papers (2020-08-19T13:26:12Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
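For contrast with the fixed-random-mask sketch after the abstract above, the following is a generic sketch of dense-to-sparse mask discovery via iterative magnitude pruning, the baseline family that several of the entries above (ESPN, lottery-ticket pruning) build on. It is not the ESPN algorithm itself, and it assumes PyTorch; the commented-out `train_one_epoch`, the number of rounds, and the per-round pruning fraction are illustrative placeholders.
```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def iterative_magnitude_pruning(model: nn.Module, rounds: int = 5,
                                frac_per_round: float = 0.3) -> nn.Module:
    # Alternate (placeholder) training with pruning the smallest surviving weights.
    linear_layers = [m for m in model.modules() if isinstance(m, nn.Linear)]
    for _ in range(rounds):
        # train_one_epoch(model)  # placeholder: supply your own training step
        for layer in linear_layers:
            # Prune `frac_per_round` of the currently unpruned weights by magnitude.
            prune.l1_unstructured(layer, name="weight", amount=frac_per_round)
    return model

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
iterative_magnitude_pruning(model)
# Each pruned layer now carries a binary buffer `weight_mask` describing the discovered subnetwork.
```
Following the main paper above, this kind of dense-to-sparse search can be turned into a sparse-to-sparse one by restricting the candidate weights to a fixed random source mask from the start.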