Exploring Weight Importance and Hessian Bias in Model Pruning
- URL: http://arxiv.org/abs/2006.10903v1
- Date: Fri, 19 Jun 2020 00:15:55 GMT
- Title: Exploring Weight Importance and Hessian Bias in Model Pruning
- Authors: Mingchen Li, Yahya Sattar, Christos Thrampoulidis, Samet Oymak
- Abstract summary: We provide a principled exploration of pruning by building on a natural notion of importance.
For linear models, we show that this notion of importance is captured by covariance scaling, which connects to the well-known Hessian-based pruning algorithm.
We identify settings in which weights become more important despite becoming smaller, which in turn leads to a catastrophic failure of magnitude-based pruning.
- Score: 55.75546858514194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model pruning is an essential procedure for building compact and
computationally-efficient machine learning models. A key feature of a good
pruning algorithm is that it accurately quantifies the relative importance of
the model weights. While model pruning has a rich history, we still don't have
a full grasp of the pruning mechanics even for relatively simple problems
involving linear models or shallow neural nets. In this work, we provide a
principled exploration of pruning by building on a natural notion of
importance. For linear models, we show that this notion of importance is
captured by covariance scaling which connects to the well-known Hessian-based
pruning. We then derive asymptotic formulas that allow us to precisely compare
the performance of different pruning methods. For neural networks, we
demonstrate that the importance can be at odds with larger magnitudes and
proper initialization is critical for magnitude-based pruning. Specifically, we
identify settings in which weights become more important despite becoming
smaller, which in turn leads to a catastrophic failure of magnitude-based
pruning. Our results also elucidate that implicit regularization in the form of
Hessian structure has a catalytic role in identifying the important weights,
which dictate the pruning performance.
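The abstract's claim that importance can be at odds with magnitude is easy to illustrate for a linear model, where the Hessian of the squared loss is the feature covariance, so an Optimal-Brain-Damage-style saliency scales each squared weight by its feature variance. The numbers below are a hypothetical illustration of this "covariance scaling", not figures from the paper:

```python
import numpy as np

# Toy linear model y = x @ w with anisotropic feature covariance.
# The Hessian of the squared loss is the feature covariance Sigma, so an
# OBD-style saliency for weight i is Sigma[i, i] * w[i] ** 2: the loss
# increase from zeroing that weight. Values below are illustrative.
w = np.array([1.0, 0.5, 0.3])            # weight magnitudes, descending
sigma_diag = np.array([0.1, 1.0, 20.0])  # feature variances, ascending

magnitude_rank = np.argsort(-np.abs(w))  # order weights by |w|
saliency = sigma_diag * w ** 2           # Hessian-based importance
hessian_rank = np.argsort(-saliency)     # order weights by saliency

print("magnitude order:", magnitude_rank)  # [0 1 2]
print("hessian order  :", hessian_rank)    # [2 1 0]
```

The smallest weight has the largest saliency because its feature carries the most variance, so magnitude-based pruning would remove exactly the weight that matters most.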
Related papers
- Isomorphic Pruning for Vision Models [56.286064975443026]
Structured pruning reduces the computational overhead of deep neural networks by removing redundant sub-structures.
We present Isomorphic Pruning, a simple approach that demonstrates effectiveness across a range of network architectures.
arXiv Detail & Related papers (2024-07-05T16:14:53Z)
- Geometric Insights into Focal Loss: Reducing Curvature for Enhanced Model Calibration [1.642094639107215]
The confidence level of a model in a classification problem is often given by the output vector of a softmax function for convenience.
This problem is called model calibration and has been studied extensively.
We show that focal loss reduces the curvature of the loss surface in training the model.
arXiv Detail & Related papers (2024-05-01T10:53:54Z)
- Quantifying lottery tickets under label noise: accuracy, calibration, and complexity [6.232071870655069]
Pruning deep neural networks is a widely used strategy to alleviate the computational burden in machine learning.
We use the sparse double descent approach to unambiguously identify and characterise pruned models associated with classification tasks.
arXiv Detail & Related papers (2023-06-21T11:35:59Z)
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance [114.1541203743303]
We propose PLATON, which captures the uncertainty of importance scores by upper confidence bound (UCB) of importance estimation.
We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification.
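An upper-confidence-bound importance score of the kind PLATON describes can be sketched as follows: smooth each weight's instantaneous sensitivity with an exponential moving average, track its local variability as an uncertainty term, and combine both into the final score. The function name, smoothing constants, and toy sensitivity traces are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

# Hedged sketch of UCB-style importance scoring: a smoothed sensitivity
# estimate times an estimate of its own uncertainty, so weights whose
# importance fluctuates are not pruned prematurely. beta1/beta2 are
# illustrative smoothing constants.
def ucb_importance(sens_history, beta1=0.85, beta2=0.95):
    mean = np.zeros_like(sens_history[0])
    unc = np.zeros_like(sens_history[0])
    for s in sens_history:
        mean = beta1 * mean + (1 - beta1) * s               # smoothed sensitivity
        unc = beta2 * unc + (1 - beta2) * np.abs(s - mean)  # local uncertainty
    return mean * unc  # upper-confidence-style combination

# Weight 0 has noisy sensitivity, weight 1 a small stable one.
steps = [np.array([1.0, 0.1]), np.array([0.2, 0.1]), np.array([1.1, 0.1])]
score = ucb_importance(steps)
```

The noisy weight keeps both a larger smoothed sensitivity and a larger uncertainty term, so it outranks the stable low-sensitivity weight.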
arXiv Detail & Related papers (2022-06-25T05:38:39Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight reparameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
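The reparameterisation itself is compact: store latent parameters phi and expose effective weights w = phi * |phi|**(alpha - 1), so the chain rule scales every gradient by alpha * |phi|**(alpha - 1) and small weights receive small updates. The sketch below, with an illustrative alpha and toy gradients, shows this "rich get richer" dynamic:

```python
import numpy as np

# Minimal sketch of the Powerpropagation reparameterisation.
# Effective weights: w = phi * |phi|**(alpha - 1); for alpha = 2 this is
# phi * |phi|, and dw/dphi = alpha * |phi|**(alpha - 1), so gradients on
# small weights are damped and density piles up near zero.
def effective_weights(phi, alpha=2.0):
    return phi * np.abs(phi) ** (alpha - 1)

def grad_wrt_phi(grad_w, phi, alpha=2.0):
    # chain rule: dL/dphi = dL/dw * dw/dphi
    return grad_w * alpha * np.abs(phi) ** (alpha - 1)

phi = np.array([1.0, 0.01])
g_w = np.array([1.0, 1.0])       # identical loss gradient on both weights
g_phi = grad_wrt_phi(g_w, phi)   # [2.0, 0.02]: the small weight barely moves
```

Because near-zero weights stay near zero, magnitude pruning afterwards removes them with little accuracy cost.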
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
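The rising-penalty idea can be sketched as follows: rather than zeroing candidate weights abruptly, keep increasing the L2 penalty factor on them during training so they decay toward zero gradually. The toy quadratic loss, schedule, and learning rate below are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

# Hedged sketch of pruning with a growing L2 penalty: weights flagged by
# candidate_mask feel an ever-increasing pull toward zero, while the rest
# follow the loss gradient undisturbed. The loss 0.5 * (w - 1)^2 per weight
# and the linear penalty schedule are illustrative.
def prune_with_growing_l2(w, candidate_mask, steps=200, lr=0.01, delta=0.1):
    penalty = 0.0
    for _ in range(steps):
        grad_loss = w - 1.0                        # toy loss gradient, optimum at 1.0
        grad_reg = penalty * w * candidate_mask    # L2 pull on flagged weights only
        w = w - lr * (grad_loss + grad_reg)
        penalty += delta                           # rising penalty factor
    return w

w = np.ones(4)
mask = np.array([1.0, 1.0, 0.0, 0.0])  # first two weights slated for pruning
w_final = prune_with_growing_l2(w, mask)
# flagged weights decay toward zero; the others stay at the loss optimum 1.0
```

The gap that opens up between flagged and unflagged weights is what makes the subsequent hard removal nearly free in accuracy.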
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
- Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks [38.153825455980645]
Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists - perhaps counterintuitively - in building lightweight models.
This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional asymptotics of model pruning.
We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning.
arXiv Detail & Related papers (2020-12-16T05:13:30Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but its role in this success is still unclear.
We show that multiplicative noise commonly arises in parameter updates and induces heavy-tailed behavior.
A detailed analysis examines the key factors involved, including step size and data properties, with similar results observed across state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.