How Sparse Can We Prune A Deep Network: A Fundamental Limit Viewpoint
- URL: http://arxiv.org/abs/2306.05857v2
- Date: Wed, 21 Feb 2024 08:47:06 GMT
- Title: How Sparse Can We Prune A Deep Network: A Fundamental Limit Viewpoint
- Authors: Qiaozhe Zhang, Ruijie Zhang, Jun Sun, Yingzhuang Liu
- Abstract summary: Network pruning is an effective measure to alleviate the storage and computational burden of deep neural networks.
We take a first principles approach, i.e. we impose the sparsity constraint on the original loss function.
We identify two key factors that determine the pruning ratio limit.
- Score: 3.7575861326462845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network pruning is an effective measure to alleviate the storage and
computational burden of deep neural networks arising from their high
overparameterization. This raises a fundamental question: how sparse can we
prune a deep network without sacrificing performance? To address this
problem, in this work we take a first-principles approach, i.e., we directly
impose the sparsity constraint on the original loss function and then
characterize the necessary and sufficient conditions on the sparsity
(\textit{which turn out to nearly coincide}) by leveraging the notion of
\textit{statistical dimension} in convex geometry. Through this fundamental
limit, we're able to identify two key factors that determine the pruning ratio
limit, i.e., weight magnitude and network flatness. Generally speaking, the
flatter the loss landscape or the smaller the weight magnitude, the smaller
the pruning ratio limit. In addition, we provide efficient countermeasures to address the
challenges in computing the pruning limit, which involves accurate spectrum
estimation of a large-scale, non-positive-definite Hessian matrix. Moreover, through
the lens of the pruning ratio threshold, we can provide rigorous
interpretations of several heuristics in existing pruning algorithms. Extensive
experiments demonstrate that our theoretical pruning ratio threshold matches
the empirical results very well. All code is
available at: https://github.com/QiaozheZhang/Global-One-shot-Pruning
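To make the pruning setup concrete: the repository name suggests global one-shot pruning, in which a single magnitude threshold is applied across all layers at once. The sketch below is an illustrative NumPy implementation of plain global magnitude pruning, not the paper's actual algorithm (which derives the achievable ratio from the statistical dimension of the Hessian); function names and the toy weights are assumptions.

```python
import numpy as np

def global_magnitude_prune(weights, prune_ratio):
    """Zero out the smallest-magnitude fraction of weights, pooled globally.

    weights: list of NumPy arrays, one per layer
    prune_ratio: fraction of parameters to remove, in [0, 1)
    """
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
    k = int(prune_ratio * all_mags.size)
    if k == 0:
        return [w.copy() for w in weights]
    # Global threshold: the k-th smallest magnitude over ALL parameters,
    # so sparsity is allocated across layers automatically.
    threshold = np.partition(all_mags, k - 1)[k - 1]
    return [np.where(np.abs(w) <= threshold, 0.0, w) for w in weights]

# Toy example: prune 50% of a two-layer weight set
layers = [np.array([[0.1, -0.9], [0.4, 0.05]]), np.array([1.2, -0.3])]
pruned = global_magnitude_prune(layers, 0.5)
```

Because the threshold is computed over the pooled magnitudes rather than per layer, layers with many small weights end up sparser than layers with large weights, which is the behavior the paper's weight-magnitude factor speaks to.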
Related papers
- Covering Numbers for Deep ReLU Networks with Applications to Function Approximation and Nonparametric Regression [4.297070083645049]
We develop tight (up to a multiplicative constant) lower and upper bounds on the covering numbers of fully-connected networks.
Thanks to the tightness of the bounds, a fundamental understanding of the impact of sparsity, quantization, bounded vs. unbounded weights, and network output truncation can be developed.
arXiv Detail & Related papers (2024-10-08T21:23:14Z)
- Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure using residual connections around nonlinear network sections that allow the flow of information through the network once a nonlinear section is pruned.
arXiv Detail & Related papers (2024-06-06T23:19:57Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but has been deemed uncompetitive by either post-training pruning or sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning.
arXiv Detail & Related papers (2022-02-05T21:19:41Z)
- The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization [18.613475245655806]
We study ReLU ResNets in the infinite-depth-and-width limit, where both depth and width tend to infinity as their ratio, $d/n$, remains constant.
Using Monte Carlo simulations, we demonstrate that even basic properties of standard ResNet architectures are poorly captured by the Gaussian limit.
arXiv Detail & Related papers (2021-06-07T23:47:37Z)
- BN-invariant sharpness regularizes the training model to better generalization [72.97766238317081]
We propose a measure of sharpness, BN-Sharpness, which gives consistent value for equivalent networks under BN.
We use the BN-sharpness to regularize the training and design an algorithm to minimize the new regularized objective.
arXiv Detail & Related papers (2021-01-08T10:23:24Z)
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
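The "rising penalty factors" idea can be sketched in a few lines. The schedule shape and constants below are assumptions for illustration, not the paper's implementation: the L2 penalty coefficient grows over training steps so that pruning pressure increases gradually.

```python
import numpy as np

def growing_l2_penalty(step, total_steps, lam_max=1e-3):
    """Penalty factor that rises linearly from 0 to lam_max over training.

    (Illustrative linear schedule; the paper's schedule may differ.)
    """
    return lam_max * min(step / total_steps, 1.0)

def regularized_loss(loss, weights, step, total_steps):
    """Task loss plus the growing L2 penalty on all weights."""
    lam = growing_l2_penalty(step, total_steps)
    return loss + lam * sum(np.sum(w ** 2) for w in weights)

# Halfway through training, the penalty factor is half its maximum
halfway = growing_l2_penalty(50, 100)
```

The design intuition is that an initially weak penalty lets weight importance emerge before the regularizer starts pushing unimportant weights toward zero.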
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
- On the Predictability of Pruning Across Scales [29.94870276983399]
We show that the error of magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task.
As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning.
arXiv Detail & Related papers (2020-06-18T15:41:46Z)
- Lookahead: A Far-Sighted Alternative of Magnitude-based Pruning [83.99191569112682]
Magnitude-based pruning is one of the simplest methods for pruning neural networks.
We develop a simple pruning method, coined lookahead pruning, by extending the single layer optimization to a multi-layer optimization.
Our experimental results demonstrate that the proposed method consistently outperforms magnitude-based pruning on various networks.
arXiv Detail & Related papers (2020-02-12T05:38:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.