Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
- URL: http://arxiv.org/abs/2003.01794v3
- Date: Mon, 19 Oct 2020 05:35:07 GMT
- Title: Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
- Authors: Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, and Qiang
Liu
- Abstract summary: We propose a simple greedy selection approach for finding good subnetworks in deep neural networks.
Applying the greedy selection strategy to sufficiently large pre-trained networks is guaranteed to find small subnetworks with lower loss than networks directly trained with gradient descent.
- Score: 35.121856435677564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent empirical works show that large deep neural networks are often highly
redundant and one can find much smaller subnetworks without a significant drop
of accuracy. However, most existing methods of network pruning are empirical
and heuristic, leaving it open whether good subnetworks provably exist, how to
find them efficiently, and if network pruning can be provably better than
direct training using gradient descent. We answer these problems positively by
proposing a simple greedy selection approach for finding good subnetworks,
which starts from an empty network and greedily adds important neurons from the
large network. This differs from the existing methods based on backward
elimination, which remove redundant neurons from the large network.
Theoretically, applying the greedy selection strategy to sufficiently large
pre-trained networks is guaranteed to find small subnetworks with lower loss
than networks directly trained with gradient descent. Our results also apply to
pruning randomly weighted networks. Practically, we improve on the prior art of
network pruning for learning compact neural architectures on ImageNet, including
ResNet, MobilenetV2/V3, and ProxylessNet. Our theory and empirical results on
MobileNet suggest that we should fine-tune the pruned subnetworks to leverage
the information from the large model, instead of re-training from new random
initialization as suggested in \citet{liu2018rethinking}.
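The abstract describes the core procedure (start from an empty network and greedily add the neuron that helps most) without implementation detail. Below is a minimal sketch of greedy forward selection for a single hidden layer of a pre-trained two-layer ReLU network; the squared-loss objective, the rescaling of the kept neurons, and the no-repetition simplification are assumptions made for illustration, not the authors' implementation (the paper's selection allows re-picking neurons and carries the loss guarantees quoted above).
```python
# Minimal sketch of greedy forward selection for one hidden layer of a
# pre-trained two-layer ReLU network f(x) = relu(x @ W1) @ W2.
# The squared loss, the rescaling of kept neurons, and selecting each neuron
# at most once are illustrative assumptions, not the paper's exact algorithm.
import numpy as np

def subnet_loss(W1, W2, keep, X, Y):
    """Squared loss of the subnetwork that keeps only the selected hidden neurons."""
    if not keep:
        return float(np.mean(Y ** 2))          # empty subnetwork predicts zero
    idx = np.array(sorted(keep))
    H = np.maximum(X @ W1[:, idx], 0.0)        # activations of the kept neurons
    pred = H @ W2[idx, :] * (W1.shape[1] / len(idx))  # rescale to stand in for the full layer
    return float(np.mean((pred - Y) ** 2))

def greedy_forward_select(W1, W2, X, Y, k):
    """Start from an empty subnetwork; greedily add the neuron that lowers the loss most."""
    keep = set()
    for _ in range(k):
        candidates = [j for j in range(W1.shape[1]) if j not in keep]
        best = min(candidates, key=lambda j: subnet_loss(W1, W2, keep | {j}, X, Y))
        keep.add(best)
    return sorted(keep)

# Toy usage: select 32 of 256 hidden neurons on synthetic data.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(128, 16)), rng.normal(size=(128, 1))
W1, W2 = rng.normal(size=(16, 256)), rng.normal(size=(256, 1)) / 256
print(greedy_forward_select(W1, W2, X, Y, k=32))
```
In the paper, the selected subnetwork is then fine-tuned from the pre-trained weights, which is the point made in the last sentence of the abstract.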
Related papers
- Stitching for Neuroevolution: Recombining Deep Neural Networks without Breaking Them [0.0]
Traditional approaches to neuroevolution often start from scratch.
Recombining trained networks is non-trivial because architectures and feature representations typically differ.
We employ stitching, which merges the networks by introducing new layers at crossover points.
arXiv Detail & Related papers (2024-03-21T08:30:44Z) - You Can Have Better Graph Neural Networks by Not Training Weights at
All: Finding Untrained GNNs Tickets [105.24703398193843]
Untrained subnetworks in graph neural networks (GNNs) remain mysterious.
We show that the found untrained subnetworks can substantially mitigate the GNN over-smoothing problem.
We also observe that such sparse untrained subnetworks have appealing performance in out-of-distribution detection and robustness to input perturbations.
arXiv Detail & Related papers (2022-11-28T14:17:36Z) - Deep Learning without Shortcuts: Shaping the Kernel with Tailored
Rectifiers [83.74380713308605]
We develop a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
arXiv Detail & Related papers (2022-03-15T17:49:08Z) - Extracting Effective Subnetworks with Gumebel-Softmax [9.176056742068813]
We devise an alternative pruning method that allows extracting effective subnetworks from larger untrained ones.
Our method extracts subnetworks by exploring different topologies, which are sampled using Gumbel Softmax (a rough sketch of this kind of mask sampling is given after this list).
The resulting subnetworks are further enhanced using a highly efficient rescaling mechanism that reduces training time and improves performance.
arXiv Detail & Related papers (2022-02-25T21:31:30Z) - The Unreasonable Effectiveness of Random Pruning: Return of the Most
Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but has been deemed uncompetitive compared with either post-training pruning or sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning.
arXiv Detail & Related papers (2022-02-05T21:19:41Z) - Firefly Neural Architecture Descent: a General Approach for Growing
Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z) - Greedy Optimization Provably Wins the Lottery: Logarithmic Number of
Winning Tickets is Enough [19.19644194006565]
We show how much we can prune a neural network given a specified tolerance of accuracy drop.
The proposed method has the guarantee that the discrepancy between the pruned network and the original network decays at an exponentially fast rate.
Empirically, our method improves on the prior art for pruning various network architectures, including ResNet and MobileNetV2/V3, on ImageNet.
arXiv Detail & Related papers (2020-10-29T22:06:31Z) - ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z) - Network Pruning via Annealing and Direct Sparsity Control [4.976007156860966]
We propose a novel efficient network pruning method that is suitable for both non-structured and structured channel-level pruning.
Our proposed method tightens a sparsity constraint by gradually removing network parameters or filter channels based on a criterion and a schedule.
arXiv Detail & Related papers (2020-02-11T10:51:12Z) - A "Network Pruning Network" Approach to Deep Model Compression [62.68120664998911]
We present a filter pruning approach for deep model compression using a multitask network.
Our approach is based on learning a pruner network to prune a pre-trained target network.
The compressed model produced by our approach is generic and does not need any special hardware/software support.
arXiv Detail & Related papers (2020-01-15T20:38:23Z)
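The Gumbel-Softmax entry above mentions sampling subnetwork topologies via Gumbel Softmax; as referenced there, the sketch below shows what such relaxed channel-mask sampling can look like in PyTorch. The per-channel logits, the temperature, and the keep-rate penalty used as a stand-in objective are assumptions for illustration, not the procedure of that paper.
```python
# Rough illustration of sampling a relaxed binary channel mask with the
# Gumbel-Softmax trick (torch.nn.functional.gumbel_softmax). The logits,
# temperature, and keep-rate objective are assumptions for this sketch,
# not the method of the cited paper.
import torch
import torch.nn.functional as F

def sample_channel_mask(logits: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Sample one soft keep/drop decision per channel (last dim = [drop, keep])."""
    relaxed = F.gumbel_softmax(logits, tau=temperature, hard=False, dim=-1)
    return relaxed[..., 1]                      # probability-like "keep" component

# Toy usage: push the expected keep-rate of 64 channels toward 25% by
# optimizing the logits through the sampled (differentiable) mask.
logits = torch.zeros(64, 2, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(200):
    mask = sample_channel_mask(logits)
    loss = (mask.mean() - 0.25) ** 2            # stand-in for a real task + sparsity loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"channels kept at threshold 0.5: {(sample_channel_mask(logits) > 0.5).sum().item()} / 64")
```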
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.