A Gradient Flow Framework For Analyzing Network Pruning
- URL: http://arxiv.org/abs/2009.11839v4
- Date: Thu, 23 Sep 2021 07:47:56 GMT
- Title: A Gradient Flow Framework For Analyzing Network Pruning
- Authors: Ekdeep Singh Lubana and Robert P. Dick
- Abstract summary: Recent network pruning methods focus on pruning models early in training.
We develop a general framework that uses gradient flow to unify importance measures through the norm of model parameters.
We validate our claims on several VGG-13, MobileNet-V1, and ResNet-56 models trained on CIFAR-10/CIFAR-100.
- Score: 11.247894240593693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent network pruning methods focus on pruning models early in training.
To estimate the impact of removing a parameter, these methods use importance
measures that were originally designed to prune trained models. Despite lacking
justification for their use early in training, such measures result in
surprisingly low accuracy loss. To better explain this behavior, we develop a
general framework that uses gradient flow to unify state-of-the-art importance
measures through the norm of model parameters. We use this framework to
determine the relationship between pruning measures and evolution of model
parameters, establishing several results related to pruning models early in
training: (i) magnitude-based pruning removes parameters that contribute least
to reduction in loss, resulting in models that converge faster than
magnitude-agnostic methods; (ii) loss-preservation based pruning preserves
first-order model evolution dynamics and is therefore appropriate for pruning
minimally trained models; and (iii) gradient-norm based pruning affects
second-order model evolution dynamics, such that increasing gradient norm via
pruning can produce poorly performing models. We validate our claims on several
VGG-13, MobileNet-V1, and ResNet-56 models trained on CIFAR-10/CIFAR-100. Code
available at https://github.com/EkdeepSLubana/flowandprune.
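The three importance measures unified by the framework can be sketched as per-parameter scores. The following is a hypothetical NumPy illustration, not the authors' released code (see the linked repository for that); in particular, the Hessian-gradient product is assumed to be computed elsewhere and supplied by the caller.

```python
import numpy as np

def importance_scores(theta, grad, hess_grad):
    """Per-parameter scores for the three pruning measures in the abstract.

    theta     -- flattened model parameters
    grad      -- gradient of the training loss w.r.t. theta
    hess_grad -- Hessian-gradient product H @ grad (second-order information)
    """
    return {
        # (i) magnitude-based: small-norm parameters contribute least
        # to loss reduction under gradient flow
        "magnitude": theta ** 2,
        # (ii) loss-preservation: first-order Taylor term |theta * grad|;
        # removing low scorers preserves first-order evolution dynamics
        "loss_preservation": np.abs(theta * grad),
        # (iii) gradient-norm based: second-order term |theta * (H @ grad)|,
        # tied to how pruning changes the gradient norm
        "gradient_norm": np.abs(theta * hess_grad),
    }

def prune_mask(scores, sparsity):
    """Boolean mask removing the lowest-scoring `sparsity` fraction of parameters."""
    n_prune = int(len(scores) * sparsity)
    mask = np.ones(len(scores), dtype=bool)
    mask[np.argsort(scores)[:n_prune]] = False
    return mask
```

Note that the same `prune_mask` helper applies to all three score vectors; the measures differ only in how they rank parameters, which is the point of the unified framework.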
Related papers
- Federated Topic Model and Model Pruning Based on Variational Autoencoder [14.737942599204064]
Federated topic modeling allows multiple parties to jointly train models while protecting data privacy.
This paper proposes a method to establish a federated topic model while ensuring the privacy of each node, and uses neural network model pruning to accelerate the model.
Experimental results show that the federated topic model pruning can greatly accelerate the model training speed while ensuring the model's performance.
arXiv Detail & Related papers (2023-11-01T06:00:14Z)
- PELA: Learning Parameter-Efficient Models with Low-Rank Approximation [16.9278983497498]
We propose a novel method for increasing the parameter efficiency of pre-trained models by introducing an intermediate pre-training stage.
This allows for direct and efficient utilization of the low-rank model for downstream fine-tuning tasks.
arXiv Detail & Related papers (2023-10-16T07:17:33Z)
- Structured Model Pruning of Convolutional Networks on Tensor Processing Units [0.0]
Structured model pruning is a promising approach to alleviate these requirements.
We measure the accuracy-efficiency trade-off for various structured model pruning methods and datasets.
We show that structured model pruning can significantly improve model memory usage and speed on TPUs without losing accuracy.
arXiv Detail & Related papers (2021-07-09T03:41:31Z)
- Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters than the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low churn training against a number of recent baselines.
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Movement Pruning: Adaptive Sparsity by Fine-Tuning [115.91907953454034]
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning.
We propose the use of movement pruning, a simple, deterministic first-order weight pruning method.
Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes.
arXiv Detail & Related papers (2020-05-15T17:54:15Z)
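The first-order idea behind movement pruning (weights moving away from zero are important, weights shrinking toward zero are candidates for removal) can be sketched as follows. This is a rough illustration of the scoring rule, not the paper's implementation, which learns the scores end-to-end during fine-tuning; the function name and interface here are hypothetical.

```python
import numpy as np

def movement_score(weight_history, grad_history):
    """First-order movement score: S_i = -sum_t g_{i,t} * w_{i,t}.

    A weight growing in magnitude has its gradient opposing its sign
    (the descent step -lr * g pushes it away from zero), so -g * w > 0
    and the score accumulates; weights shrinking toward zero accumulate
    negative scores and are pruned first.
    """
    w = np.asarray(weight_history)  # shape: (steps, n_params)
    g = np.asarray(grad_history)    # same shape as w
    return -(g * w).sum(axis=0)
```

Unlike magnitude pruning, this score depends on the training trajectory rather than the final weight values, which is why it behaves differently when fine-tuning large pretrained models.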
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.