How Sparse Can We Prune A Deep Network: A Fundamental Limit Viewpoint
- URL: http://arxiv.org/abs/2306.05857v2
- Date: Wed, 21 Feb 2024 08:47:06 GMT
- Title: How Sparse Can We Prune A Deep Network: A Fundamental Limit Viewpoint
- Authors: Qiaozhe Zhang, Ruijie Zhang, Jun Sun, Yingzhuang Liu
- Abstract summary: Network pruning is an effective measure to alleviate the storage and computational burden of deep neural networks.
We take a first principles approach, i.e. we impose the sparsity constraint on the original loss function.
We identify two key factors that determine the pruning ratio limit.
- Score: 3.7575861326462845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network pruning is an effective measure to alleviate the storage and
computational burden of deep neural networks arising from their high
overparameterization. This raises a fundamental question: how sparse can we
prune a deep network without sacrificing performance? To address this
problem, in this work we take a first-principles approach, i.e., we directly
impose the sparsity constraint on the original loss function and then
characterize the necessary and sufficient conditions on the sparsity
(which turn out to nearly coincide) by leveraging the notion of
statistical dimension in convex geometry. Through this fundamental
limit, we are able to identify two key factors that determine the pruning ratio
limit, i.e., weight magnitude and network flatness. Generally speaking, the
flatter the loss landscape or the smaller the weight magnitude, the smaller
the pruning ratio. In addition, we provide efficient countermeasures to address the
challenges in computing the pruning limit, which involves accurate spectrum
estimation of a large-scale and non-positive Hessian matrix. Moreover, through
the lens of the pruning ratio threshold, we can provide rigorous
interpretations of several heuristics used in existing pruning algorithms. Extensive
experiments demonstrate that our theoretical pruning ratio threshold coincides
very well with the empirical results. All code is
available at: https://github.com/QiaozheZhang/Global-One-shot-Pruning
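
The main computational challenge the abstract points to is estimating the spectrum of a large-scale, non-positive-definite Hessian. Below is a minimal sketch (not the authors' pipeline; see the linked repository for that) of how extreme Hessian eigenvalues can be probed without ever materializing the Hessian, using Hessian-vector products together with scipy's Lanczos-based eigsh. The toy model, batch, and the choice of k are placeholder assumptions.

```python
# Minimal sketch: extreme eigenvalues of a loss Hessian via Hessian-vector
# products, never forming the full matrix. Model, data, and k are illustrative
# placeholders, not the authors' setup.
import numpy as np
import torch
import torch.nn as nn
from scipy.sparse.linalg import LinearOperator, eigsh

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(128, 20), torch.randint(0, 10, (128,))
loss_fn = nn.CrossEntropyLoss()

params = [p for p in model.parameters() if p.requires_grad]
n = sum(p.numel() for p in params)

def hvp(v):
    """Hessian-vector product H @ v via double backprop."""
    v = torch.as_tensor(np.asarray(v).reshape(-1), dtype=torch.float32)
    chunks, offset = [], 0
    for p in params:
        chunks.append(v[offset:offset + p.numel()].view_as(p))
        offset += p.numel()
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    gv = sum((g * c).sum() for g, c in zip(grads, chunks))
    hv = torch.autograd.grad(gv, params)
    return torch.cat([h.reshape(-1) for h in hv]).detach().numpy().astype(np.float64)

# Lanczos only needs matvecs; 'BE' asks for eigenvalues from both ends of the
# spectrum, which matters because the Hessian of a trained network is not
# positive semidefinite in general.
op = LinearOperator((n, n), matvec=hvp, dtype=np.float64)
eigs = eigsh(op, k=6, which='BE', return_eigenvectors=False)
print("extreme Hessian eigenvalues:", np.sort(eigs))
```
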
Related papers
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - The Unreasonable Effectiveness of Random Pruning: Return of the Most
Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but it has long been deemed uncompetitive compared with post-training pruning or sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and that the benefits of sparsity may extend beyond carefully designed pruning.
arXiv Detail & Related papers (2022-02-05T21:19:41Z) - Connectivity Matters: Neural Network Pruning Through the Lens of
Effective Sparsity [0.0]
Neural network pruning is a fruitful area of research with surging interest in high sparsity regimes.
We show that effective compression of a randomly pruned LeNet-300-100 can be orders of magnitude larger than its direct counterpart.
We develop a low-cost extension to most pruning algorithms that targets effective, rather than direct, sparsity (a minimal sketch of the effective-sparsity computation appears after this list).
arXiv Detail & Related papers (2021-07-05T22:36:57Z) - Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show that it can bring significant accuracy gains (a toy sketch of such a growing-penalty schedule appears after this list).
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z) - Greedy Optimization Provably Wins the Lottery: Logarithmic Number of
Winning Tickets is Enough [19.19644194006565]
We characterize how much a neural network can be pruned given a specified tolerance on the accuracy drop.
The proposed method guarantees that the discrepancy between the pruned network and the original network decays at an exponentially fast rate.
Empirically, our method improves on prior art in pruning various network architectures, including ResNet and MobileNetV2/V3 on ImageNet.
arXiv Detail & Related papers (2020-10-29T22:06:31Z) - ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid between single-shot network pruning methods and Lottery Ticket-style approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z) - On the Predictability of Pruning Across Scales [29.94870276983399]
We show that the error of magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task.
As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning.
arXiv Detail & Related papers (2020-06-18T15:41:46Z) - Shapley Value as Principled Metric for Structured Network Pruning [10.96182578337852]
Structured pruning is a technique to reduce the storage size and inference cost of neural networks.
We show that reducing the harm caused by pruning is crucial to retaining the performance of the network.
We propose Shapley values as a principled ranking metric for this task.
arXiv Detail & Related papers (2020-06-02T17:26:49Z) - Lookahead: A Far-Sighted Alternative of Magnitude-based Pruning [83.99191569112682]
Magnitude-based pruning is one of the simplest methods for pruning neural networks.
We develop a simple pruning method, coined lookahead pruning, by extending the single-layer optimization to a multi-layer optimization (a simplified sketch of this multi-layer scoring appears after this list).
Our experimental results demonstrate that the proposed method consistently outperforms magnitude-based pruning on various networks.
arXiv Detail & Related papers (2020-02-12T05:38:42Z) - A "Network Pruning Network" Approach to Deep Model Compression [62.68120664998911]
We present a filter pruning approach for deep model compression using a multitask network.
Our approach is based on learning a pruner network to prune a pre-trained target network.
The compressed model produced by our approach is generic and does not need any special hardware/software support.
arXiv Detail & Related papers (2020-01-15T20:38:23Z)
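
For the "Connectivity Matters" entry above, here is a hedged sketch of the effective-sparsity idea: after direct pruning, surviving weights that no longer lie on any input-to-output path count as effectively pruned as well. The layer shapes, the random 99% direct pruning, and the fixed-point loop are illustrative assumptions, not the paper's algorithm.

```python
# Hedged sketch: effective vs. direct sparsity of a pruned MLP (illustrative).
import numpy as np

def effective_masks(masks):
    """Given per-layer binary masks of an MLP (mask[l] has shape [out, in]),
    zero out weights whose input neuron has no surviving incoming weight or
    whose output neuron has no surviving outgoing weight, until a fixed point."""
    masks = [m.copy() for m in masks]
    changed = True
    while changed:
        changed = False
        for l in range(len(masks)):
            new = masks[l]
            if l > 0:                                  # neurons feeding layer l
                alive_in = masks[l - 1].sum(axis=1) > 0
                new = new * alive_in[None, :]
            if l < len(masks) - 1:                     # neurons consumed by layer l+1
                alive_out = masks[l + 1].sum(axis=0) > 0
                new = new * alive_out[:, None]
            if new.sum() < masks[l].sum():
                masks[l] = new
                changed = True
    return masks

# Random direct pruning at 99% sparsity on LeNet-300-100-like layer shapes.
rng = np.random.default_rng(0)
shapes = [(300, 784), (100, 300), (10, 100)]
direct = [(rng.random(s) > 0.99).astype(float) for s in shapes]
eff = effective_masks(direct)

total = sum(m.size for m in direct)
print("direct compression:   ", total / sum(m.sum() for m in direct))
print("effective compression:", total / max(sum(m.sum() for m in eff), 1))
```
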
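For the "Neural Pruning via Growing Regularization" entry, a toy sketch of a rising L2 penalty applied to weights slated for removal. The magnitude-based selection rule, the schedule constants, and the training loop are assumptions for illustration, not the paper's exact algorithm.

```python
# Hedged sketch: grow an L2 penalty on the weights marked for removal so they
# drift toward zero before being pruned (illustrative schedule and selection).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(256, 20), torch.randint(0, 10, (256,))

prune_fraction = 0.7
# Mark the smallest-magnitude weights as the "to be removed" set (assumed rule).
with torch.no_grad():
    all_w = torch.cat([p.abs().flatten() for p in model.parameters()])
    threshold = torch.quantile(all_w, prune_fraction)
    masks = [(p.abs() < threshold).float() for p in model.parameters()]

penalty, penalty_step, penalty_max = 0.0, 1e-4, 1e-1  # assumed schedule
for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    # Rising L2 penalty applied only to the weights slated for removal.
    reg = sum((m * p).pow(2).sum() for m, p in zip(masks, model.parameters()))
    (loss + penalty * reg).backward()
    opt.step()
    penalty = min(penalty + penalty_step, penalty_max)

# After the penalty has grown, the marked weights sit near zero and can be
# pruned with little change in the loss.
with torch.no_grad():
    for m, p in zip(masks, model.parameters()):
        p.mul_(1.0 - m)
```
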
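For the "Lookahead" entry, a simplified sketch contrasting plain magnitude scores with a multi-layer score that also weights each connection by the fan-in/fan-out magnitudes of its endpoints in the neighboring layers. The exact scoring rule and the global thresholding used here are an approximation of the paper's lookahead distortion, not its definition.

```python
# Hedged sketch: magnitude-based vs. lookahead-style scoring for MLP pruning.
import numpy as np

def magnitude_scores(weights):
    """Plain magnitude scores: |w| for every weight."""
    return [np.abs(W) for W in weights]

def lookahead_scores(weights):
    """Multi-layer scores: |w| weighted by the norms of the connected fan-in
    (previous layer) and fan-out (next layer) of its two endpoint neurons."""
    scores = []
    for l, W in enumerate(weights):                       # W has shape (out, in)
        s = np.abs(W).astype(float)
        if l > 0:                                         # fan-in of input neurons
            fan_in = np.linalg.norm(weights[l - 1], axis=1)
            s = s * fan_in[None, :]
        if l < len(weights) - 1:                          # fan-out of output neurons
            fan_out = np.linalg.norm(weights[l + 1], axis=0)
            s = s * fan_out[:, None]
        scores.append(s)
    return scores

def global_prune(weights, scores, sparsity):
    """Zero out the fraction `sparsity` of weights with the lowest scores."""
    flat = np.concatenate([s.ravel() for s in scores])
    thresh = np.quantile(flat, sparsity)
    return [W * (s >= thresh) for W, s in zip(weights, scores)]

rng = np.random.default_rng(0)
weights = [rng.normal(size=(300, 784)), rng.normal(size=(100, 300)), rng.normal(size=(10, 100))]
pruned_mp = global_prune(weights, magnitude_scores(weights), sparsity=0.9)
pruned_la = global_prune(weights, lookahead_scores(weights), sparsity=0.9)

# The two criteria keep the same number of weights but different sets of them.
overlap = sum(((A != 0) & (B != 0)).sum() for A, B in zip(pruned_mp, pruned_la))
print("weights kept by both criteria:", int(overlap))
```
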