Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
- URL: http://arxiv.org/abs/2003.01794v3
- Date: Mon, 19 Oct 2020 05:35:07 GMT
- Title: Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
- Authors: Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, and Qiang
Liu
- Abstract summary: We propose a simple greedy selection approach for finding good subnetworks in deep neural networks.
Applying the greedy selection strategy to sufficiently large pre-trained networks is guaranteed to find small subnetworks with lower loss than networks directly trained with gradient descent.
- Score: 35.121856435677564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent empirical works show that large deep neural networks are often highly
redundant and one can find much smaller subnetworks without a significant drop
of accuracy. However, most existing methods of network pruning are empirical
and heuristic, leaving it open whether good subnetworks provably exist, how to
find them efficiently, and if network pruning can be provably better than
direct training using gradient descent. We answer these problems positively by
proposing a simple greedy selection approach for finding good subnetworks,
which starts from an empty network and greedily adds important neurons from the
large network. This differs from the existing methods based on backward
elimination, which remove redundant neurons from the large network.
Theoretically, applying the greedy selection strategy to sufficiently large
pre-trained networks is guaranteed to find small subnetworks with lower loss
than networks directly trained with gradient descent. Our results also apply to
pruning randomly weighted networks. Practically, we improve on the prior art of
network pruning for learning compact neural architectures on ImageNet, including
ResNet, MobilenetV2/V3, and ProxylessNet. Our theory and empirical results on
MobileNet suggest that we should fine-tune the pruned subnetworks to leverage
the information from the large model, instead of re-training from new random
initialization as suggested in \citet{liu2018rethinking}.
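The abstract describes the core procedure (start from an empty network and greedily add the neuron that helps most) without implementation detail. Below is a minimal sketch of greedy forward selection for a single hidden layer of a pre-trained two-layer ReLU network; the squared-loss objective, the rescaling of the kept neurons, and the no-repetition simplification are assumptions made for illustration, not the authors' implementation (the paper's selection allows re-picking neurons and carries the loss guarantees quoted above).
```python
# Minimal sketch of greedy forward selection for one hidden layer of a
# pre-trained two-layer ReLU network f(x) = relu(x @ W1) @ W2.
# The squared loss, the rescaling of kept neurons, and selecting each neuron
# at most once are illustrative assumptions, not the paper's exact algorithm.
import numpy as np

def subnet_loss(W1, W2, keep, X, Y):
    """Squared loss of the subnetwork that keeps only the selected hidden neurons."""
    if not keep:
        return float(np.mean(Y ** 2))          # empty subnetwork predicts zero
    idx = np.array(sorted(keep))
    H = np.maximum(X @ W1[:, idx], 0.0)        # activations of the kept neurons
    pred = H @ W2[idx, :] * (W1.shape[1] / len(idx))  # rescale to stand in for the full layer
    return float(np.mean((pred - Y) ** 2))

def greedy_forward_select(W1, W2, X, Y, k):
    """Start from an empty subnetwork; greedily add the neuron that lowers the loss most."""
    keep = set()
    for _ in range(k):
        candidates = [j for j in range(W1.shape[1]) if j not in keep]
        best = min(candidates, key=lambda j: subnet_loss(W1, W2, keep | {j}, X, Y))
        keep.add(best)
    return sorted(keep)

# Toy usage: select 32 of 256 hidden neurons on synthetic data.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(128, 16)), rng.normal(size=(128, 1))
W1, W2 = rng.normal(size=(16, 256)), rng.normal(size=(256, 1)) / 256
print(greedy_forward_select(W1, W2, X, Y, k=32))
```
In the paper, the selected subnetwork is then fine-tuned from the pre-trained weights, which is the point made in the last sentence of the abstract.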
Related papers
- Stitching for Neuroevolution: Recombining Deep Neural Networks without Breaking Them [0.0]
Traditional approaches to neuroevolution often start from scratch.
Recombining trained networks is non-trivial because architectures and feature representations typically differ.
We employ stitching, which merges the networks by introducing new layers at crossover points.
arXiv Detail & Related papers (2024-03-21T08:30:44Z) - You Can Have Better Graph Neural Networks by Not Training Weights at
All: Finding Untrained GNNs Tickets [105.24703398193843]
Untrained subnetworks in graph neural networks (GNNs) remain mysterious.
We show that the found untrained subnetworks can substantially mitigate the GNN over-smoothing problem.
We also observe that such sparse untrained subnetworks have appealing performance in out-of-distribution detection and robustness to input perturbations.
arXiv Detail & Related papers (2022-11-28T14:17:36Z) - Deep Learning without Shortcuts: Shaping the Kernel with Tailored
Rectifiers [83.74380713308605]
We develop a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
arXiv Detail & Related papers (2022-03-15T17:49:08Z) - Extracting Effective Subnetworks with Gumebel-Softmax [9.176056742068813]
We devise an alternative pruning method that allows extracting effective subnetworks from larger untrained ones.
Our method extracts subnetworks by exploring different topologies, which are sampled using Gumbel Softmax (a rough sketch of this kind of mask sampling is given after this list).
The resulting subnetworks are further enhanced using a highly efficient rescaling mechanism that reduces training time and improves performance.
arXiv Detail & Related papers (2022-02-25T21:31:30Z) - The Unreasonable Effectiveness of Random Pruning: Return of the Most
Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but has been deemed uncompetitive compared with either post-training pruning or sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning.
arXiv Detail & Related papers (2022-02-05T21:19:41Z) - Firefly Neural Architecture Descent: a General Approach for Growing
Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z) - Greedy Optimization Provably Wins the Lottery: Logarithmic Number of
Winning Tickets is Enough [19.19644194006565]
We show how much we can prune a neural network given a specified tolerance of accuracy drop.
The proposed method has the guarantee that the discrepancy between the pruned network and the original network decays at an exponentially fast rate.
Empirically, our method improves on the prior art for pruning various network architectures, including ResNet and MobileNetV2/V3, on ImageNet.
arXiv Detail & Related papers (2020-10-29T22:06:31Z) - ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z) - Network Pruning via Annealing and Direct Sparsity Control [4.976007156860966]
We propose a novel efficient network pruning method that is suitable for both non-structured and structured channel-level pruning.
Our proposed method tightens a sparsity constraint by gradually removing network parameters or filter channels based on a criterion and a schedule.
arXiv Detail & Related papers (2020-02-11T10:51:12Z) - A "Network Pruning Network" Approach to Deep Model Compression [62.68120664998911]
We present a filter pruning approach for deep model compression using a multitask network.
Our approach is based on learning a pruner network to prune a pre-trained target network.
The compressed model produced by our approach is generic and does not need any special hardware/software support.
arXiv Detail & Related papers (2020-01-15T20:38:23Z)
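The Gumbel-Softmax entry above mentions sampling subnetwork topologies via Gumbel Softmax; as referenced there, the sketch below shows what such relaxed channel-mask sampling can look like in PyTorch. The per-channel logits, the temperature, and the keep-rate penalty used as a stand-in objective are assumptions for illustration, not the procedure of that paper.
```python
# Rough illustration of sampling a relaxed binary channel mask with the
# Gumbel-Softmax trick (torch.nn.functional.gumbel_softmax). The logits,
# temperature, and keep-rate objective are assumptions for this sketch,
# not the method of the cited paper.
import torch
import torch.nn.functional as F

def sample_channel_mask(logits: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Sample one soft keep/drop decision per channel (last dim = [drop, keep])."""
    relaxed = F.gumbel_softmax(logits, tau=temperature, hard=False, dim=-1)
    return relaxed[..., 1]                      # probability-like "keep" component

# Toy usage: push the expected keep-rate of 64 channels toward 25% by
# optimizing the logits through the sampled (differentiable) mask.
logits = torch.zeros(64, 2, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(200):
    mask = sample_channel_mask(logits)
    loss = (mask.mean() - 0.25) ** 2            # stand-in for a real task + sparsity loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"channels kept at threshold 0.5: {(sample_channel_mask(logits) > 0.5).sum().item()} / 64")
```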
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.