Cascade Weight Shedding in Deep Neural Networks: Benefits and Pitfalls
for Network Pruning
- URL: http://arxiv.org/abs/2103.10629v1
- Date: Fri, 19 Mar 2021 04:41:40 GMT
- Title: Cascade Weight Shedding in Deep Neural Networks: Benefits and Pitfalls
for Network Pruning
- Authors: Kambiz Azarian and Fatih Porikli
- Abstract summary: We show that cascade weight shedding, when present, can significantly improve the performance of an otherwise sub-optimal scheme such as random pruning.
We demonstrate cascade weight shedding's potential for improving GMP's accuracy and reducing its computational complexity.
We shed light on weight and learning-rate rewinding methods of re-training, showing their possible connections to cascade weight shedding and the reason for their advantage over fine-tuning.
- Score: 73.79377854107514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We report, for the first time, on the cascade weight shedding phenomenon in
deep neural networks, where, in response to pruning a small percentage of a
network's weights, a large percentage of the remaining weights is shed over a few
epochs during the ensuing fine-tuning phase. We show that cascade weight
shedding, when present, can significantly improve the performance of an
otherwise sub-optimal scheme such as random pruning. This explains why some
pruning methods may perform well under certain circumstances, but poorly under
others, e.g., ResNet50 vs. MobileNetV3. We provide insight into why global
magnitude-based pruning, i.e., GMP, despite its simplicity, provides
competitive performance for a wide range of scenarios. We also demonstrate
cascade weight shedding's potential for improving GMP's accuracy and reducing
its computational complexity. In doing so, we highlight the importance of
pruning and learning-rate schedules. We shed light on weight and learning-rate
rewinding methods of re-training, showing their possible connections to
cascade weight shedding and the reason for their advantage over fine-tuning. We
also investigate cascade weight shedding's effect on the set of kept weights,
and its implications for semi-structured pruning. Finally, we give directions
for future research.
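The abstract centers on global magnitude-based pruning (GMP) followed by fine-tuning, the setting in which cascade weight shedding is observed. The following is a minimal sketch of that pipeline in PyTorch; the sparsity handling, mask re-application, and training-step structure are illustrative assumptions, not the authors' exact protocol.

```python
# Hedged sketch: one-shot global magnitude pruning (GMP) plus a masked
# fine-tuning step. Cascade weight shedding, as described in the abstract,
# would appear during fine-tuning as many of the *kept* weights also
# collapsing toward zero within a few epochs.
import torch
import torch.nn as nn


def global_magnitude_prune(model: nn.Module, sparsity: float) -> dict:
    """Zero out the `sparsity` fraction of weights with the smallest absolute
    value, pooled across all weight tensors (global, not per-layer)."""
    scores = torch.cat([p.detach().abs().flatten()
                        for p in model.parameters() if p.dim() > 1])
    k = int(sparsity * scores.numel())
    if k == 0:
        return {name: torch.ones_like(p)
                for name, p in model.named_parameters() if p.dim() > 1}
    threshold = torch.kthvalue(scores, k).values
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:
            masks[name] = (p.detach().abs() > threshold).float()
            p.data.mul_(masks[name])  # apply the pruning mask in place
    return masks


def finetune_step(model, masks, batch, optimizer, loss_fn):
    """One fine-tuning step; masks are re-applied so pruned weights stay zero."""
    x, y = batch
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
    return loss.item()
```

A random-pruning variant of the same pipeline would simply draw the mask at random instead of thresholding by magnitude, which is the sub-optimal scheme the abstract says cascade weight shedding can rescue.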
Related papers
- Weights Augmentation: it has never ever ever ever let her model down [1.5020330976600735]
This article proposes the concept of weight augmentation, focusing on weight exploration.
The Weight Augmentation Strategy (WAS) trains networks with randomly transformed weight coefficients, named Shadow Weights (SW), which are used to calculate the loss function.
Our experimental results show that convolutional neural networks such as VGG-16, ResNet-18, ResNet-34, GoogLeNet, MobileNetV2, and EfficientNet-Lite can benefit substantially at little or no cost.
arXiv Detail & Related papers (2024-05-30T00:57:06Z) - To prune or not to prune : A chaos-causality approach to principled
pruning of dense neural networks [1.9249287163937978]
We introduce the concept of chaos in learning (Lyapunov exponents) via weight updates and exploit causality to identify the causal weights responsible for misclassification.
Such a pruned network maintains the original performance and retains feature explainability.
arXiv Detail & Related papers (2023-08-19T09:17:33Z) - WeightMom: Learning Sparse Networks using Iterative Momentum-based
pruning [0.0]
We propose a weight-based pruning approach in which weights are pruned gradually based on the momentum accumulated over previous iterations (see the hedged sketch after this list).
We evaluate our approach on networks such as AlexNet, VGG16, and ResNet50 with image classification datasets such as CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2022-08-11T07:13:59Z) - BiTAT: Neural Network Binarization with Task-dependent Aggregated
Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weights/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z) - The Unreasonable Effectiveness of Random Pruning: Return of the Most
Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but has been deemed uncompetitive compared with post-training pruning and sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning.
arXiv Detail & Related papers (2022-02-05T21:19:41Z) - Why is Pruning at Initialization Immune to Reinitializing and Shuffling? [10.196185472801236]
Recent studies assessing the efficacy of neural network pruning methods uncovered a surprising finding.
Under each of the pruning-at-initialization methods, the distribution of unpruned weights changed minimally with randomization operations.
arXiv Detail & Related papers (2021-07-05T06:04:56Z) - Pruning Randomly Initialized Neural Networks with Iterative
Randomization [7.676965708017808]
We introduce a novel framework to prune randomly initialized neural networks by iteratively randomizing weight values (IteRand).
Theoretically, we prove an approximation theorem in our framework, which indicates that the randomizing operations are provably effective in reducing the required number of parameters.
arXiv Detail & Related papers (2021-06-17T06:32:57Z) - Dynamic Probabilistic Pruning: A general framework for
hardware-constrained pruning at different granularities [80.06422693778141]
We propose a flexible new pruning mechanism that facilitates pruning at different granularities (weights, kernels, filters/feature maps).
We refer to this algorithm as Dynamic Probabilistic Pruning (DPP).
We show that DPP achieves competitive compression rates and classification accuracy when pruning common deep learning models trained on different benchmark datasets for image classification.
arXiv Detail & Related papers (2021-05-26T17:01:52Z) - Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
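As flagged in the WeightMom entry above, the following is a minimal sketch of gradual, momentum-based pruning. The exponential-moving-average-of-magnitude criterion, the `beta` value, and the class structure are assumptions made for illustration; the paper's exact pruning rule may differ.

```python
# Hedged sketch: gradual pruning driven by a momentum (EMA) score per weight.
# The score definition is an assumption, not necessarily WeightMom's rule.
import torch
import torch.nn as nn


class MomentumPruner:
    def __init__(self, model: nn.Module, beta: float = 0.9):
        self.model = model
        self.beta = beta
        # EMA scores and binary masks for every weight tensor (biases skipped).
        self.scores = {n: p.detach().abs().clone()
                       for n, p in model.named_parameters() if p.dim() > 1}
        self.masks = {n: torch.ones_like(p)
                      for n, p in model.named_parameters() if p.dim() > 1}

    @torch.no_grad()
    def update_scores(self):
        # Accumulate an EMA of each weight's magnitude over training iterations.
        for n, p in self.model.named_parameters():
            if n in self.scores:
                self.scores[n].mul_(self.beta).add_(p.abs(), alpha=1 - self.beta)

    @torch.no_grad()
    def prune_to(self, target_sparsity: float):
        # Prune globally to the current sparsity target using the EMA scores.
        pooled = torch.cat([s.flatten() for s in self.scores.values()])
        k = int(target_sparsity * pooled.numel())
        if k == 0:
            return
        threshold = torch.kthvalue(pooled, k).values
        for n, p in self.model.named_parameters():
            if n in self.masks:
                self.masks[n] = (self.scores[n] > threshold).float()
                p.mul_(self.masks[n])  # zero out newly pruned weights
```

In use, `update_scores` would be called every iteration and `prune_to` every few hundred iterations with a slowly increasing sparsity target, so that pruning proceeds gradually as the entry describes.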