Why is the State of Neural Network Pruning so Confusing? On the
Fairness, Comparison Setup, and Trainability in Network Pruning
- URL: http://arxiv.org/abs/2301.05219v1
- Date: Thu, 12 Jan 2023 18:58:33 GMT
- Title: Why is the State of Neural Network Pruning so Confusing? On the
Fairness, Comparison Setup, and Trainability in Network Pruning
- Authors: Huan Wang, Can Qin, Yue Bai, Yun Fu
- Abstract summary: The state of neural network pruning has long been observed to be unclear and even confusing.
We first clarify the fairness principle in pruning experiments and summarize the widely-used comparison setups.
We then point out the central role of network trainability, which has not been well recognized so far.
- Score: 58.34310957892895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The state of neural network pruning has long been observed to be
unclear and even confusing, largely due to "a lack of standardized benchmarks
and metrics" [3]. To standardize benchmarks, we first need to answer: what kind
of comparison setup is considered fair? Unfortunately, this basic yet crucial
question has barely been clarified in the community. Meanwhile, we observe that
several papers have used (severely) sub-optimal hyper-parameters in pruning
experiments, while the reasons behind these choices remain elusive. Such
sub-optimal hyper-parameters further distort the benchmarks, rendering the
state of neural network pruning even more obscure.
Two mysteries in pruning exemplify this confusing state: the
performance-boosting effect of a larger finetuning learning rate, and the
argument that inheriting pretrained weights has no value in filter pruning.
In this work, we attempt to explain the confusing state of network pruning by
demystifying the two mysteries. Specifically, (1) we first clarify the fairness
principle in pruning experiments and summarize the widely-used comparison
setups; (2) then we unveil the two pruning mysteries and point out the central
role of network trainability, which has not been well recognized so far; (3)
finally, we conclude the paper and give some concrete suggestions regarding how
to calibrate the pruning benchmarks in the future. Code:
https://github.com/mingsun-tse/why-the-state-of-pruning-so-confusing.
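To make the comparison setups concrete, here is a minimal PyTorch sketch (not the paper's released code) of the standard filter-pruning-then-finetuning pipeline that the fairness discussion revolves around. The tiny model, the synthetic data, and the 50% pruning ratio are illustrative stand-ins; the finetuning learning rate passed to finetune() is the hyper-parameter whose performance-boosting effect is one of the two mysteries.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torch.utils.data import DataLoader, TensorDataset

def build_model():
    # Tiny stand-in for a real CNN such as ResNet-56 on CIFAR-10.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
    )

def prune_filters(model, amount=0.5):
    # L1-norm structured pruning over output channels (dim=0),
    # i.e. whole filters are zeroed out.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            prune.ln_structured(m, name="weight", amount=amount, n=1, dim=0)
    return model

def finetune(model, loader, finetune_lr, epochs=1):
    # The finetuning learning rate is the knob whose "mysterious"
    # performance-boosting effect the paper ties to network trainability.
    opt = torch.optim.SGD(model.parameters(), lr=finetune_lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# Synthetic data so the sketch runs end to end.
x = torch.randn(64, 3, 32, 32)
y = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(x, y), batch_size=16)
for lr in (0.001, 0.01):  # a small vs. a larger finetuning learning rate
    model = prune_filters(build_model(), amount=0.5)
    finetune(model, loader, finetune_lr=lr)

Comparing the accuracies reached with the small and the larger finetuning learning rate under an otherwise identical budget is the kind of controlled comparison the fairness principle asks for.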
Related papers
- Pruning vs Quantization: Which is Better? [25.539649458493614]
We provide an extensive comparison between the two techniques for compressing deep neural networks.
Our results show that in most cases quantization outperforms pruning.
Only in some scenarios with very high compression ratios might pruning be beneficial from an accuracy standpoint.
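As a rough illustration of such a comparison (a toy setup of my own, not the paper's protocol), the sketch below pits magnitude pruning against uniform quantization at a roughly matched storage budget and reports the weight reconstruction error on a random tensor; a real study would compare the task accuracy of trained, compressed networks.

import torch

def prune_magnitude(w, keep_ratio):
    # Keep the largest-magnitude weights; zero out the rest.
    k = max(1, int(keep_ratio * w.numel()))
    thresh = w.abs().flatten().kthvalue(w.numel() - k + 1).values
    return torch.where(w.abs() >= thresh, w, torch.zeros_like(w))

def quantize_uniform(w, bits):
    # Symmetric uniform quantization to 2**bits levels.
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(w / scale) * scale

w = torch.randn(256, 256)
# Rough storage match: keeping 25% of 32-bit weights ~ 8 bits per original
# weight, comparable to 8-bit quantization (index overhead ignored).
err_prune = (w - prune_magnitude(w, keep_ratio=0.25)).pow(2).mean().item()
err_quant = (w - quantize_uniform(w, bits=8)).pow(2).mean().item()
print(f"pruning MSE: {err_prune:.4f}   quantization MSE: {err_quant:.4f}")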
arXiv Detail & Related papers (2023-07-06T13:18:44Z)
- How Sparse Can We Prune A Deep Network: A Fundamental Limit Viewpoint [3.7575861326462845]
Network pruning is an effective measure to alleviate the storage and computational burden of deep neural networks.
We take a first principles approach, i.e. we impose the sparsity constraint on the original loss function.
We identify two key factors that determine the pruning ratio limit.
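One generic way to realize "imposing the sparsity constraint on the original loss" (an illustrative sketch, not necessarily the paper's formulation) is projected gradient descent that hard-thresholds the weights back onto the set {w : ||w||_0 <= k} after every update:

import torch

def project_topk(w, k):
    # Projection onto {w : ||w||_0 <= k}: keep the k largest-magnitude
    # entries and zero out the rest.
    flat = w.flatten()
    idx = flat.abs().topk(k).indices
    mask = torch.zeros_like(flat)
    mask[idx] = 1.0
    return (flat * mask).view_as(w)

torch.manual_seed(0)
X, y = torch.randn(200, 50), torch.randn(200)
w = torch.zeros(50, requires_grad=True)
k = 10  # sparsity budget: at most 10 nonzero weights
for _ in range(200):
    loss = ((X @ w - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        w -= 0.01 * w.grad           # plain gradient step on the original loss
        w.copy_(project_topk(w, k))  # then project back onto the constraint set
        w.grad.zero_()
print("nonzeros:", int((w != 0).sum()), " final loss:", float(loss))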
arXiv Detail & Related papers (2023-06-09T12:39:41Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- Sparse Double Descent: Where Network Pruning Aggravates Overfitting [8.425040193238777]
We report an unexpected sparse double descent phenomenon: as model sparsity is increased via network pruning, test performance first gets worse, then improves, and finally degrades again.
We propose a novel learning-distance interpretation: the curve of the $\ell_2$ learning distance of sparse models may correlate well with the sparse double descent curve.
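A minimal sketch of how such an $\ell_2$ learning distance could be measured (my reading of the term, not the paper's code): record the weights at initialization and take the $\ell_2$ distance to the weights of the sparse model after training.

import torch
import torch.nn as nn

def l2_learning_distance(init_state, final_state):
    # Euclidean distance between the weights at initialization and after
    # training, accumulated over all parameters.
    sq = 0.0
    for name, w0 in init_state.items():
        sq += (final_state[name].float() - w0.float()).pow(2).sum().item()
    return sq ** 0.5

model = nn.Linear(100, 10)
init_state = {k: v.clone() for k, v in model.state_dict().items()}
# ... train the sparse (masked) model here; a random perturbation stands in
# for the training updates so that the sketch runs on its own ...
with torch.no_grad():
    for p in model.parameters():
        p.add_(0.01 * torch.randn_like(p))
print("l2 learning distance:", l2_learning_distance(init_state, model.state_dict()))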
arXiv Detail & Related papers (2022-06-17T11:02:15Z)
- The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but it has been deemed uncompetitive compared with either post-training pruning or sparse training methods.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning.
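The recipe being evaluated is simple enough to sketch (a generic version; the paper's layer-wise sparsity ratios and training schedules differ): draw one random binary mask per weight tensor before training and keep the masked weights at zero throughout.

import torch
import torch.nn as nn
import torch.nn.functional as F

def random_masks(model, sparsity):
    # One fixed random binary mask per weight matrix, drawn before training.
    return {name: (torch.rand_like(p) > sparsity).float()
            for name, p in model.named_parameters() if p.dim() > 1}

def apply_masks(model, masks):
    # Force the pruned weights back to zero.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
masks = random_masks(model, sparsity=0.9)   # ~90% of weights removed at random
apply_masks(model, masks)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
for _ in range(10):
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    opt.step()
    apply_masks(model, masks)  # keep the network sparse after every update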
arXiv Detail & Related papers (2022-02-05T21:19:41Z)
- Emerging Paradigms of Neural Network Pruning [82.9322109208353]
Pruning is adopted as a post-processing solution to over-parameterization, aiming to remove unnecessary parameters from a neural network with little performance compromise.
Recent works challenge this conventional paradigm by discovering random sparse networks that can be trained to match the performance of their dense counterparts.
This survey seeks to bridge the gap by proposing a general pruning framework so that the emerging pruning paradigms can be accommodated well alongside the traditional one.
arXiv Detail & Related papers (2021-03-11T05:01:52Z)
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
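A minimal sketch of the rising-penalty idea (not the released implementation): the L2 penalty applied to the filters scheduled for removal grows during training, pushing them toward zero before they are physically pruned.

import torch
import torch.nn as nn

conv = nn.Conv2d(16, 32, 3, padding=1)
# Suppose the filters with the smallest L1 norms are scheduled for removal.
filter_norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
prune_idx = filter_norms.argsort()[:16]          # half of the 32 filters

opt = torch.optim.SGD(conv.parameters(), lr=0.01)
penalty, penalty_step = 0.0, 1e-4                # the rising penalty factor
x = torch.randn(8, 16, 32, 32)
for step in range(100):
    task_loss = conv(x).pow(2).mean()            # stand-in for a real task loss
    reg = conv.weight[prune_idx].pow(2).sum()    # L2 penalty on the chosen filters only
    loss = task_loss + penalty * reg
    opt.zero_grad()
    loss.backward()
    opt.step()
    penalty += penalty_step                      # grow the penalty over time
# Once the penalty has grown large, the selected filters are close to zero
# and can be removed with little accuracy drop.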
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
- Progressive Skeletonization: Trimming more fat from a network at initialization [76.11947969140608]
We propose an objective to find a skeletonized network with maximum connection sensitivity.
We then propose two approximate procedures to maximize our objective.
Our approach provides remarkably improved performance at higher pruning levels.
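A sketch in the spirit of connection sensitivity (the paper's exact objective and its iterative approximation differ): score each weight by the absolute value of weight times gradient on one mini-batch at initialization and keep only the top-scoring connections.

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))

# Connection sensitivity on one mini-batch at initialization: |weight * gradient|.
weights = [p for p in model.parameters() if p.dim() > 1]
loss = F.cross_entropy(model(x), y)
grads = torch.autograd.grad(loss, weights)
scores = torch.cat([(w * g).abs().flatten() for w, g in zip(weights, grads)])

keep = int(0.05 * scores.numel())                # keep the top 5% of connections
threshold = scores.topk(keep).values.min()
with torch.no_grad():
    for w, g in zip(weights, grads):
        w.mul_(((w * g).abs() >= threshold).float())   # skeletonize at init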
arXiv Detail & Related papers (2020-06-16T11:32:47Z)
- Shapley Value as Principled Metric for Structured Network Pruning [10.96182578337852]
Structured pruning is a technique to reduce the storage size and inference cost of neural networks.
We show that reducing the harm caused by pruning is crucial to retaining the performance of the network.
We propose Shapley values as a principled ranking metric for this task.
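A hedged Monte Carlo sketch of Shapley-value ranking for filters (the paper's estimator and evaluation protocol may differ): a filter's score is its average marginal contribution to "accuracy" over random orderings, estimated here with a toy evaluator in place of a real validation run.

import random

def shapley_filter_scores(eval_acc, n_filters, n_samples=20):
    # eval_acc(active) -> accuracy of the network with only the filters in
    # `active` enabled. The Shapley value is approximated by the average
    # marginal contribution of each filter over random orderings.
    scores = [0.0] * n_filters
    for _ in range(n_samples):
        order = list(range(n_filters))
        random.shuffle(order)
        active, prev_acc = set(), eval_acc(set())
        for f in order:
            active.add(f)
            acc = eval_acc(active)
            scores[f] += (acc - prev_acc) / n_samples
            prev_acc = acc
    return scores

# Toy stand-in evaluator: "accuracy" is the summed importance of the active
# filters; a real evaluator would run the pruned network on a validation set.
importance = [random.random() for _ in range(8)]
scores = shapley_filter_scores(lambda active: sum(importance[i] for i in active), 8)
prune_order = sorted(range(8), key=lambda i: scores[i])  # prune lowest-valued filters first
print(prune_order)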
arXiv Detail & Related papers (2020-06-02T17:26:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.