A Gradient Flow Framework For Analyzing Network Pruning
- URL: http://arxiv.org/abs/2009.11839v4
- Date: Thu, 23 Sep 2021 07:47:56 GMT
- Title: A Gradient Flow Framework For Analyzing Network Pruning
- Authors: Ekdeep Singh Lubana and Robert P. Dick
- Abstract summary: Recent network pruning methods focus on pruning models early in training.
We develop a general framework that uses gradient flow to unify importance measures through the norm of model parameters.
We validate our claims on several VGG-13, MobileNet-V1, and ResNet-56 models trained on CIFAR-10/CIFAR-100.
- Score: 11.247894240593693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent network pruning methods focus on pruning models early in training.
To estimate the impact of removing a parameter, these methods use importance
measures that were originally designed to prune trained models. Despite lacking
justification for their use early in training, such measures result in
surprisingly low accuracy loss. To better explain this behavior, we develop a
general framework that uses gradient flow to unify state-of-the-art importance
measures through the norm of model parameters. We use this framework to
determine the relationship between pruning measures and evolution of model
parameters, establishing several results related to pruning models early in
training: (i) magnitude-based pruning removes parameters that contribute least
to reduction in loss, resulting in models that converge faster than
magnitude-agnostic methods; (ii) loss-preservation based pruning preserves
first-order model evolution dynamics and is therefore appropriate for pruning
minimally trained models; and (iii) gradient-norm based pruning affects
second-order model evolution dynamics, such that increasing gradient norm via
pruning can produce poorly performing models. We validate our claims on several
VGG-13, MobileNet-V1, and ResNet-56 models trained on CIFAR-10/CIFAR-100. Code
available at https://github.com/EkdeepSLubana/flowandprune.
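The three importance measures unified by the framework can be sketched as per-parameter scores. The following is a hypothetical NumPy illustration, not the authors' released code (see the linked repository for that); in particular, the Hessian-gradient product is assumed to be computed elsewhere and supplied by the caller.

```python
import numpy as np

def importance_scores(theta, grad, hess_grad):
    """Per-parameter scores for the three pruning measures in the abstract.

    theta     -- flattened model parameters
    grad      -- gradient of the training loss w.r.t. theta
    hess_grad -- Hessian-gradient product H @ grad (second-order information)
    """
    return {
        # (i) magnitude-based: small-norm parameters contribute least
        # to loss reduction under gradient flow
        "magnitude": theta ** 2,
        # (ii) loss-preservation: first-order Taylor term |theta * grad|;
        # removing low scorers preserves first-order evolution dynamics
        "loss_preservation": np.abs(theta * grad),
        # (iii) gradient-norm based: second-order term |theta * (H @ grad)|,
        # tied to how pruning changes the gradient norm
        "gradient_norm": np.abs(theta * hess_grad),
    }

def prune_mask(scores, sparsity):
    """Boolean mask removing the lowest-scoring `sparsity` fraction of parameters."""
    n_prune = int(len(scores) * sparsity)
    mask = np.ones(len(scores), dtype=bool)
    mask[np.argsort(scores)[:n_prune]] = False
    return mask
```

Note that the same `prune_mask` helper applies to all three score vectors; the measures differ only in how they rank parameters, which is the point of the unified framework.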
Related papers
- Federated Topic Model and Model Pruning Based on Variational Autoencoder [14.737942599204064]
Federated topic modeling allows multiple parties to jointly train models while protecting data privacy.
This paper proposes a method to establish a federated topic model while ensuring the privacy of each node, and uses neural network model pruning to accelerate the model.
Experimental results show that the federated topic model pruning can greatly accelerate the model training speed while ensuring the model's performance.
arXiv Detail & Related papers (2023-11-01T06:00:14Z)
- PELA: Learning Parameter-Efficient Models with Low-Rank Approximation [16.9278983497498]
We propose a novel method for increasing the parameter efficiency of pre-trained models by introducing an intermediate pre-training stage.
This allows for direct and efficient utilization of the low-rank model for downstream fine-tuning tasks.
arXiv Detail & Related papers (2023-10-16T07:17:33Z)
- Structured Model Pruning of Convolutional Networks on Tensor Processing Units [0.0]
Structured model pruning is a promising approach to alleviate these requirements.
We measure the accuracy-efficiency trade-off for various structured model pruning methods and datasets.
We show that structured model pruning can significantly improve model memory usage and speed on TPUs without losing accuracy.
arXiv Detail & Related papers (2021-07-09T03:41:31Z)
- Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters than the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low churn training against a number of recent baselines.
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Movement Pruning: Adaptive Sparsity by Fine-Tuning [115.91907953454034]
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning.
We propose the use of movement pruning, a simple, deterministic first-order weight pruning method.
Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes.
arXiv Detail & Related papers (2020-05-15T17:54:15Z)
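The first-order idea behind movement pruning (weights moving away from zero are important, weights shrinking toward zero are candidates for removal) can be sketched as follows. This is a rough illustration of the scoring rule, not the paper's implementation, which learns the scores end-to-end during fine-tuning; the function name and interface here are hypothetical.

```python
import numpy as np

def movement_score(weight_history, grad_history):
    """First-order movement score: S_i = -sum_t g_{i,t} * w_{i,t}.

    A weight growing in magnitude has its gradient opposing its sign
    (the descent step -lr * g pushes it away from zero), so -g * w > 0
    and the score accumulates; weights shrinking toward zero accumulate
    negative scores and are pruned first.
    """
    w = np.asarray(weight_history)  # shape: (steps, n_params)
    g = np.asarray(grad_history)    # same shape as w
    return -(g * w).sum(axis=0)
```

Unlike magnitude pruning, this score depends on the training trajectory rather than the final weight values, which is why it behaves differently when fine-tuning large pretrained models.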
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.