Optimizing Learning Rate Schedules for Iterative Pruning of Deep Neural Networks
- URL: http://arxiv.org/abs/2212.06144v1
- Date: Fri, 9 Dec 2022 14:39:50 GMT
- Title: Optimizing Learning Rate Schedules for Iterative Pruning of Deep Neural Networks
- Authors: Shiyu Liu, Rohan Ghosh, John Tan Chong Min, Mehul Motani
- Abstract summary: We propose a learning rate (LR) schedule for network pruning called SILO.
SILO has a strong theoretical motivation and dynamically adjusts the LR during pruning to improve generalization.
We find that SILO is able to precisely adjust the value of max_lr to be within the Oracle optimized interval, resulting in performance competitive with the Oracle with significantly lower complexity.
- Score: 25.84452767219292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The importance of learning rate (LR) schedules on network pruning has been
observed in a few recent works. As an example, Frankle and Carbin (2019)
highlighted that winning tickets (i.e., accuracy-preserving subnetworks) cannot
be found without applying an LR warmup schedule, and Renda, Frankle and
Carbin (2020) demonstrated that rewinding the LR to its initial state at the
end of each pruning cycle improves performance. In this paper, we go one step
further by first providing a theoretical justification for the surprising
effect of LR schedules. Next, we propose a LR schedule for network pruning
called SILO, which stands for S-shaped Improved Learning rate Optimization. The
advantages of SILO over existing state-of-the-art (SOTA) LR schedules are
two-fold: (i) SILO has a strong theoretical motivation and dynamically adjusts
the LR during pruning to improve generalization. Specifically, SILO increases
the LR upper bound (max_lr) in an S-shape. This leads to an improvement of 2% -
4% in extensive experiments with various types of networks (e.g., Vision
Transformers, ResNet) on popular datasets such as ImageNet, CIFAR-10/100. (ii)
In addition to the strong theoretical motivation, SILO is empirically optimal
in the sense of matching an Oracle, which exhaustively searches for the optimal
value of max_lr via grid search. We find that SILO is able to precisely adjust
the value of max_lr to be within the Oracle optimized interval, resulting in
performance competitive with the Oracle with significantly lower complexity.
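The abstract describes SILO's mechanism only at a high level: across pruning cycles, the upper bound of the LR (max_lr) is raised following an S-shaped curve, and training within each cycle uses that bound. The sketch below is a minimal, hypothetical illustration of that idea (the same S-shaped growth of max_lr also underlies the S-Cyc schedule listed under related papers); the logistic form, the lr_low/lr_high bounds, the steepness constant, and the triangular within-cycle schedule are assumptions for illustration, not the authors' exact formulation.

```python
import math

def s_shaped_max_lr(pruning_round, num_rounds, lr_low=0.1, lr_high=0.5, steepness=10.0):
    """Hypothetical S-shaped growth of max_lr across pruning rounds.
    A logistic curve is one possible S-shape; the paper's exact
    parameterization may differ."""
    progress = pruning_round / max(num_rounds - 1, 1)           # 0 -> 1 over pruning rounds
    s = 1.0 / (1.0 + math.exp(-steepness * (progress - 0.5)))   # logistic S-curve in [0, 1]
    return lr_low + (lr_high - lr_low) * s

def cyclical_lr(step, total_steps, max_lr, base_lr=1e-3):
    """Simple triangular schedule within one pruning cycle: warm up to max_lr, then decay."""
    half = total_steps // 2
    if step < half:
        return base_lr + (max_lr - base_lr) * step / half        # linear warmup
    return max_lr - (max_lr - base_lr) * (step - half) / half    # linear decay

# Example: five prune-retrain rounds, raising max_lr in an S-shape each round.
for r in range(5):
    max_lr = s_shaped_max_lr(r, num_rounds=5)
    lrs = [cyclical_lr(t, total_steps=1000, max_lr=max_lr) for t in range(1000)]
    print(f"round {r}: max_lr={max_lr:.3f}, peak lr reached={max(lrs):.3f}")
```

In a real training loop, the per-cycle max_lr could instead be passed to an off-the-shelf scheduler such as PyTorch's OneCycleLR before each retraining phase.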
Related papers
- Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but a gap remains between the practical performance of low-rank adaptation and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation, with performance stronger than or similar to PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Run LoRA Run: Faster and Lighter LoRA Implementations [50.347242693025336]
LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers.
This paper presents the RunLoRA framework for efficient implementations of LoRA.
Experiments show up to 28% speedup on language modeling networks.
arXiv Detail & Related papers (2023-12-06T10:54:34Z)
- Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
In this work, a simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model.
It is shown to achieve state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in the low-resolution (LR) outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z)
- S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based Networks [37.64233393273063]
We find that as the ReLU-based network is iteratively pruned, the distribution of weight gradients tends to become narrower.
Motivated by this finding, we propose a novel LR schedule called S-Cyclical (S-Cyc).
S-Cyc adapts the conventional cyclical LR schedule by gradually increasing the LR upper bound (max_lr) in an S-shape as the network is iteratively pruned.
arXiv Detail & Related papers (2021-10-17T08:58:08Z)
- MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks [56.66010634895913]
The learning rate (LR) is one of the most important hyper-parameters in stochastic gradient descent (SGD) training of deep neural networks (DNNs).
In this paper, we propose to learn a proper LR schedule with a meta-learner, MLR-SNet.
The learned MLR-SNet can also transfer to query tasks with different noises, architectures, data modalities, and sizes from the training ones, achieving comparable or even better performance.
arXiv Detail & Related papers (2020-07-29T01:18:58Z)
- Towards Understanding Label Smoothing [36.54164997035046]
Label smoothing regularization (LSR) has achieved great success in training deep neural networks.
We show that an appropriate LSR can help to speed up convergence by reducing the variance.
We propose a simple yet effective strategy, namely the Two-Stage LAbel smoothing algorithm (TSLA); a code sketch of plain label smoothing appears after this list.
arXiv Detail & Related papers (2020-06-20T20:36:17Z)
- Iterative Network for Image Super-Resolution [69.07361550998318]
Single image super-resolution (SISR) has been greatly revitalized by the recent development of convolutional neural networks (CNNs).
This paper provides a new insight into conventional SISR algorithms and proposes a substantially different approach relying on iterative optimization.
A novel iterative super-resolution network (ISRN) is proposed on top of this iterative optimization.
arXiv Detail & Related papers (2020-05-20T11:11:47Z)
- kDecay: Just adding k-decay items on Learning-Rate Schedule to improve Neural Networks [5.541389959719384]
k-decay effectively improves the performance of commonly used and simple LR schedules (a sketch of the idea appears after this list).
We evaluate the k-decay method on the CIFAR and ImageNet datasets with different neural networks.
Accuracy is improved by 1.08% on the CIFAR-10 dataset and by 2.07% on the CIFAR-100 dataset.
arXiv Detail & Related papers (2020-04-13T12:58:45Z)
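The k-decay entry above gives only the headline results. As a rough illustration of the idea, one way to add a k-decay term to a standard polynomial decay schedule is to replace t/T with t^k/T^k; the sketch below uses that form, but the constants, function names, and exact formulation are assumptions rather than the paper's definition.

```python
def poly_decay(step, total_steps, lr0=0.1, lr_end=0.0, power=2.0):
    """Standard polynomial decay: lr(t) = (lr0 - lr_end) * (1 - t/T)^N + lr_end."""
    return (lr0 - lr_end) * (1.0 - step / total_steps) ** power + lr_end

def poly_decay_k(step, total_steps, lr0=0.1, lr_end=0.0, power=2.0, k=3.0):
    """Assumed k-decay variant: replace t/T with t^k / T^k, which keeps the LR
    higher for longer and drops it more sharply near the end of training."""
    return (lr0 - lr_end) * (1.0 - (step ** k) / (total_steps ** k)) ** power + lr_end

# Compare the two schedules at a few points over T = 100 steps.
for t in (0, 25, 50, 75, 100):
    print(t, round(poly_decay(t, 100), 4), round(poly_decay_k(t, 100), 4))
```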
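The "Towards Understanding Label Smoothing" entry above refers to label smoothing regularization (LSR) without defining it. Below is a minimal sketch of plain label smoothing (not the two-stage TSLA variant, whose details are not given here); the function name and the smoothing value of 0.1 are illustrative choices.

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, targets, smoothing=0.1):
    """Cross-entropy with label smoothing: mix the one-hot target with a
    uniform distribution, which reduces the variance of the training signal."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    with torch.no_grad():
        # (1 - eps) on the true class, eps / (K - 1) spread over the others.
        true_dist = torch.full_like(log_probs, smoothing / (num_classes - 1))
        true_dist.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
    return torch.mean(torch.sum(-true_dist * log_probs, dim=-1))

# Example usage with random logits for a 10-class problem.
logits = torch.randn(4, 10)
targets = torch.tensor([1, 3, 5, 7])
print(smoothed_cross_entropy(logits, targets).item())
```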
This list is automatically generated from the titles and abstracts of the papers on this site.