Network Pruning That Matters: A Case Study on Retraining Variants
- URL: http://arxiv.org/abs/2105.03193v1
- Date: Fri, 7 May 2021 12:03:24 GMT
- Title: Network Pruning That Matters: A Case Study on Retraining Variants
- Authors: Duong H. Le, Binh-Son Hua
- Abstract summary: We study the effectiveness of different retraining mechanisms used when pruning networks.
We demonstrate a counter-intuitive phenomenon: randomly pruned networks can even achieve better performance than methodically pruned networks.
- Score: 11.503165599245467
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Network pruning is an effective method to reduce the computational expense of
over-parameterized neural networks for deployment on low-resource systems.
Recent state-of-the-art techniques for retraining pruned networks such as
weight rewinding and learning rate rewinding have been shown to outperform the
traditional fine-tuning technique in recovering the lost accuracy (Renda et
al., 2020), but so far it is unclear what accounts for such performance. In
this work, we conduct extensive experiments to verify and analyze the uncanny
effectiveness of learning rate rewinding. We find that the reason behind the
success of learning rate rewinding is the usage of a large learning rate.
A similar phenomenon can be observed in other learning rate schedules that
involve large learning rates, e.g., the 1-cycle learning rate schedule (Smith
et al., 2019). By leveraging the right learning rate schedule in retraining, we
demonstrate a counter-intuitive phenomenon: randomly pruned networks
can even achieve better performance than methodically pruned networks
(fine-tuned with the conventional approach). Our results emphasize the
critical role of the learning rate schedule in retraining pruned networks - a
detail often overlooked by practitioners during the implementation of network
pruning. One-sentence Summary: We study the effectiveness of different
retraining mechanisms for pruned networks.
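To make the comparison concrete, here is a minimal sketch (plain Python, not the authors' code) of the three retraining schedules discussed above; the step milestones, epoch counts, and learning rates are illustrative CIFAR-style defaults, not values from the paper.

```python
def original_schedule(epoch, base_lr=0.1, milestones=(80, 120), gamma=0.1):
    """Step schedule assumed for the initial dense training run."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

def finetune_lr(epoch):
    """Conventional fine-tuning: retrain at a small, constant learning rate."""
    return 0.001

def lr_rewinding(epoch, retrain_epochs=100, total_epochs=160):
    """Learning rate rewinding (Renda et al., 2020): replay the last
    `retrain_epochs` of the original schedule, so retraining begins at a
    large learning rate, the factor this paper identifies as key."""
    return original_schedule(total_epochs - retrain_epochs + epoch)

def one_cycle_lr(epoch, retrain_epochs=100, max_lr=0.1, min_lr=1e-4):
    """1-cycle schedule (Smith et al., 2019): linearly ramp up to a large
    learning rate, then anneal back down within the retraining budget."""
    half = retrain_epochs / 2
    if epoch <= half:
        return min_lr + (max_lr - min_lr) * epoch / half
    return max_lr - (max_lr - min_lr) * (epoch - half) / half
```

The paper's point is visible in the sketch: `lr_rewinding` and `one_cycle_lr` both spend part of retraining at a large learning rate, whereas `finetune_lr` never does.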
Related papers
- Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project.
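A small sketch of the effect described above. The `normalize_and_project_step` function is a hypothetical reading of the summary, not the paper's exact procedure; the fixed `target_norm` is an assumed detail.

```python
import numpy as np

def effective_lr(lr, w):
    """For scale-invariant weights (e.g. those feeding a normalization
    layer), a gradient step of size `lr` moves the normalized direction
    as if the step size were lr / ||w||^2, so a growing weight norm
    silently decays the effective learning rate."""
    return lr / (np.linalg.norm(w) ** 2)

def normalize_and_project_step(w, grad, lr, target_norm=1.0):
    """Hypothetical reading of 'Normalize-and-Project': take the usual
    gradient step, then project the weights back onto a fixed-norm
    sphere so the effective learning rate is governed by the explicit
    schedule alone."""
    w = w - lr * grad
    return target_norm * w / np.linalg.norm(w)
```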
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
- Understanding the Generalization Benefits of Late Learning Rate Decay [14.471831651042367]
We show the relation between training and testing loss in neural networks.
We introduce a nonlinear model whose loss landscapes mirror those observed for real neural networks.
We demonstrate that an extended phase with a large learning rate steers our model towards the minimum norm solution of the training loss.
arXiv Detail & Related papers (2024-01-21T21:11:09Z)
- Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks [0.0]
T-batching is a valuable technique for training dynamic network models.
We have identified a limitation in the training loss function used with t-batching.
We propose two alternative loss functions that overcome these issues, resulting in enhanced training performance.
arXiv Detail & Related papers (2023-08-13T23:34:36Z)
- Stimulative Training++: Go Beyond The Performance Limits of Residual Networks [91.5381301894899]
Residual networks have shown great success and become indispensable in recent deep neural network models.
Previous research has suggested that residual networks can be considered as ensembles of shallow networks.
We identify a problem analogous to social loafing, where subnetworks within a residual network are prone to exert less effort when working as part of a group than when working alone.
arXiv Detail & Related papers (2023-05-04T02:38:11Z)
- Detachedly Learn a Classifier for Class-Incremental Learning [11.865788374587734]
We present an analysis showing that the failure of vanilla experience replay (ER) comes from unnecessary re-learning of previous tasks and an inability to distinguish the current task from previous ones.
We propose a novel replay strategy, task-aware experience replay.
Experimental results show our method outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2023-02-23T01:35:44Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking [58.14267480293575]
We propose a simple yet effective online learning approach for few-shot online adaptation without requiring offline training.
It allows an in-built memory retention mechanism for the model to remember the knowledge about the object seen before.
We evaluate our approach based on two networks in the online learning families for tracking, i.e., multi-layer perceptrons in RT-MDNet and convolutional neural networks in DiMP.
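For reference, a minimal recursive least-squares update in plain NumPy; how the paper couples this estimator to the RT-MDNet and DiMP networks is not shown here.

```python
import numpy as np

class RecursiveLeastSquares:
    """Classical RLS estimator: given streaming pairs (x, y), maintain
    weights w minimizing a forgetting-factor-weighted squared error,
    without storing past samples."""

    def __init__(self, dim, lam=0.99, delta=100.0):
        self.w = np.zeros(dim)          # current weight estimate
        self.P = delta * np.eye(dim)    # inverse covariance estimate
        self.lam = lam                  # forgetting factor in (0, 1]

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)              # gain vector
        self.w = self.w + k * (y - x @ self.w)    # correct by prediction error
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return self.w
```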
arXiv Detail & Related papers (2021-12-28T06:51:18Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant (GraNet-ST).
Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
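GraNet builds on the standard gradual magnitude pruning schedule; a sketch of that cubic schedule (Zhu & Gupta, 2017) follows, with GraNet's zero-cost neuroregeneration step (regrowing pruned weights elsewhere) omitted.

```python
def gmp_sparsity(step, s_final=0.9, s_init=0.0, t0=0, n=100):
    """Cubic gradual-pruning schedule: sparsity ramps from s_init to
    s_final over n pruning steps starting at step t0. Defaults are
    illustrative, not values from the paper."""
    t = min(max(step - t0, 0), n)
    return s_final + (s_init - s_final) * (1.0 - t / n) ** 3
```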
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks [78.47459801017959]
Sparsity can reduce the memory footprint of regular networks to fit mobile devices.
We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice.
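As a baseline instance of the "remove elements" strategies the survey covers, here is a one-shot magnitude-pruning sketch; the function name and the tie-handling behavior are illustrative choices.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights (ties may
    prune a few extra elements)."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) > thresh, w, 0.0)
```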
arXiv Detail & Related papers (2021-01-31T22:48:50Z)
- Retrospective Loss: Looking Back to Improve Training of Deep Neural Networks [15.329684157845872]
We introduce a new retrospective loss to improve the training of deep neural network models.
Minimizing the retrospective loss, along with the task-specific loss, pushes the parameter state at the current training step towards the optimal parameter state.
Although the idea is simple, we analyze the method and conduct comprehensive experiments across domains.
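A hedged sketch of one plausible form of the retrospective loss, inferred from the summary: the ground truth acts as an attractor while a past parameter snapshot's predictions act as a repeller. `kappa` and the norm order `p` are hypothetical defaults, not values from the paper.

```python
import torch

def retrospective_loss(pred, target, past_pred, kappa=2.0, p=2):
    """Assumed form: pull current predictions toward the target while
    pushing them away from the (detached) predictions made with a past
    parameter snapshot. Added to the task-specific loss during training."""
    pull = torch.norm(pred - target, p=p)
    push = torch.norm(pred - past_pred.detach(), p=p)
    return kappa * pull - push
```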
arXiv Detail & Related papers (2020-06-24T10:16:36Z)