The Unreasonable Effectiveness of Random Pruning: Return of the Most
Naive Baseline for Sparse Training
- URL: http://arxiv.org/abs/2202.02643v1
- Date: Sat, 5 Feb 2022 21:19:41 GMT
- Title: The Unreasonable Effectiveness of Random Pruning: Return of the Most
Naive Baseline for Sparse Training
- Authors: Shiwei Liu, Tianlong Chen, Xiaohan Chen, Li Shen, Decebal Constantin
Mocanu, Zhangyang Wang, Mykola Pechenizkiy
- Abstract summary: Random pruning is arguably the most naive way to attain sparsity in neural networks, but it has long been deemed uncompetitive compared with either post-training pruning or sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might extend well beyond carefully designed pruning.
- Score: 111.15069968583042
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Random pruning is arguably the most naive way to attain sparsity in neural
networks, but it has long been deemed uncompetitive compared with either post-training
pruning or sparse training. In this paper, we focus on sparse training and highlight a
perhaps counter-intuitive finding: random pruning at initialization can be
quite powerful for the sparse training of modern neural networks. Without any
delicate pruning criteria or carefully pursued sparsity structures, we
empirically demonstrate that sparsely training a randomly pruned network from
scratch can match the performance of its dense equivalent. There are two key
factors that contribute to this revival: (i) the network sizes matter: as the
original dense networks grow wider and deeper, the performance of training a
randomly pruned sparse network quickly grows to match that of its dense
equivalent, even at high sparsity ratios; (ii) appropriate layer-wise sparsity
ratios can be pre-chosen for sparse training, which turns out to be another
important performance booster. Simple as it looks, a randomly pruned subnetwork
of Wide ResNet-50 can be sparsely trained to outperform a dense Wide
ResNet-50 on ImageNet. We also observe that such randomly pruned networks
outperform dense counterparts in other favorable aspects, such as
out-of-distribution detection, uncertainty estimation, and adversarial
robustness. Overall, our results strongly suggest there is larger-than-expected
room for sparse training at scale, and the benefits of sparsity might extend
well beyond carefully designed pruning. Our source code can be found at
https://github.com/VITA-Group/Random_Pruning.
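To make the recipe concrete, below is a minimal sketch (an illustration under simplifying assumptions, not the authors' released code at the repository above) of random pruning at initialization followed by ordinary training of the surviving weights, written in PyTorch. A single uniform layer-wise sparsity ratio is used here for brevity; the paper's point is that good layer-wise ratios can be pre-chosen before training rather than learned.
```python
# A minimal sketch of random pruning at initialization plus sparse training.
# Assumptions: PyTorch, uniform layer-wise sparsity, magnitude is never consulted.
import torch
import torch.nn as nn


def random_prune_at_init(model: nn.Module, sparsity: float = 0.9) -> dict:
    """Create a random binary mask for every Conv/Linear weight tensor."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            w = module.weight
            # Keep a random (1 - sparsity) fraction of weights in this layer.
            mask = (torch.rand_like(w) > sparsity).float()
            masks[name] = mask
            with torch.no_grad():
                w.mul_(mask)  # zero out pruned weights before training starts
    return masks


def apply_masks(model: nn.Module, masks: dict) -> None:
    """Re-apply the fixed masks so pruned weights stay at zero."""
    with torch.no_grad():
        for name, module in model.named_modules():
            if name in masks:
                module.weight.mul_(masks[name])


if __name__ == "__main__":
    # Usage sketch: prune once at initialization, then train as usual,
    # re-masking after every optimizer update.
    model = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))
    masks = random_prune_at_init(model, sparsity=0.9)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    for _ in range(3):
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        apply_masks(model, masks)
```
Re-applying the fixed masks after every optimizer step keeps the pruned weights at zero, so the sparsity pattern chosen at initialization never changes during training.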
Related papers
- Random Search as a Baseline for Sparse Neural Network Architecture Search [0.0]
Sparse neural networks have shown performance similar to or better than their dense counterparts while having higher parameter efficiency.
This has motivated a number of works to learn or search for high performing sparse networks.
We propose Random Search as a baseline algorithm for finding good sparse configurations and study its performance.
We observe that for this sparse architecture search task, sparse networks found by Random Search neither perform better nor converge more efficiently than their random counterparts.
arXiv Detail & Related papers (2024-03-13T05:32:13Z) - Theoretical Characterization of How Neural Network Pruning Affects its
Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - Why Random Pruning Is All We Need to Start Sparse [7.648170881733381]
Random masks define surprisingly effective sparse neural network models.
We show that sparser networks can compete with dense architectures and state-of-the-art lottery ticket pruning algorithms.
arXiv Detail & Related papers (2022-10-05T17:34:04Z) - Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - How much pre-training is enough to discover a good subnetwork? [10.699603774240853]
We mathematically analyze the amount of dense network pre-training needed for a pruned network to perform well.
We find a simple theoretical bound on the number of gradient descent pre-training iterations on a two-layer, fully-connected network.
Experiments with larger datasets require more pre-training for networks obtained via pruning to perform well.
arXiv Detail & Related papers (2021-07-31T15:08:36Z) - FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training
with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network (a minimal N:M masking sketch is given after this list).
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - Greedy Optimization Provably Wins the Lottery: Logarithmic Number of
Winning Tickets is Enough [19.19644194006565]
We show how much we can prune a neural network given a specified tolerance of accuracy drop.
The proposed method guarantees that the discrepancy between the pruned network and the original network decays at an exponentially fast rate.
Empirically, our method improves on prior art for pruning various network architectures, including ResNet and MobileNetV2/V3, on ImageNet.
arXiv Detail & Related papers (2020-10-29T22:06:31Z)
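For readers unfamiliar with the N:M scheme mentioned in the "Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch" entry above, the sketch below builds a hardware-friendly 2:4 magnitude-based mask. It is an illustrative, assumed masking step only, not that paper's full from-scratch training method.
```python
# A minimal sketch of an N:M (here 2:4) fine-grained structured sparsity mask.
import torch


def nm_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Keep the n largest-magnitude weights in every consecutive group of m."""
    flat = weight.reshape(-1, m)             # group weights into blocks of m
    idx = flat.abs().topk(n, dim=1).indices  # indices of the n largest per block
    mask = torch.zeros_like(flat)
    mask.scatter_(1, idx, 1.0)               # mark the survivors in each block
    return mask.reshape(weight.shape)


# Usage: a 2:4 mask keeps exactly half the weights, in a hardware-friendly pattern.
w = torch.randn(8, 16)
mask = nm_mask(w, n=2, m=4)
assert mask.sum() == w.numel() // 2
sparse_w = w * mask
```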