S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based
Networks
- URL: http://arxiv.org/abs/2110.08764v1
- Date: Sun, 17 Oct 2021 08:58:08 GMT
- Title: S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based
Networks
- Authors: Shiyu Liu, Chong Min John Tan, Mehul Motani
- Abstract summary: We find that as the ReLU-based network is iteratively pruned, the distribution of weight gradients tends to become narrower.
Motivated by this finding, we propose a novel LR schedule, called S-Cyclical (S-Cyc).
S-Cyc adapts the conventional cyclical LR schedule by gradually increasing the LR upper bound (max_lr) in an S-shape as the network is iteratively pruned.
- Score: 37.64233393273063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore a new perspective on adapting the learning rate (LR) schedule to
improve the performance of the ReLU-based network as it is iteratively pruned.
Our work and contribution consist of four parts: (i) We find that, as the
ReLU-based network is iteratively pruned, the distribution of weight gradients
tends to become narrower. This suggests that as the network becomes sparser,
a larger value of LR should be used to train the pruned network.
(ii) Motivated by this finding, we propose a novel LR schedule, called
S-Cyclical (S-Cyc), which adapts the conventional cyclical LR schedule by
gradually increasing the LR upper bound (max_lr) in an S-shape as the network
is iteratively pruned. We highlight that S-Cyc is a method-agnostic LR schedule
that applies to many iterative pruning methods. (iii) We evaluate the
performance of the proposed S-Cyc and compare it to four LR schedule
benchmarks. Our experimental results on three state-of-the-art networks (i.e.,
VGG-19, ResNet-20, ResNet-50) and two popular datasets (i.e., CIFAR-10,
ImageNet-200) demonstrate that S-Cyc consistently outperforms the best
performing benchmark with an improvement of 2.1% to 3.4%, without a substantial
increase in complexity. (iv) We evaluate S-Cyc against an oracle and show that
S-Cyc achieves comparable performance to the oracle, which carefully tunes
max_lr via grid search.
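As a rough illustration of finding (i), the sketch below measures the spread (standard deviation) of the weight gradients restricted to the weights that survive pruning. The helper name, the mask bookkeeping, and the choice of standard deviation as the width measure are assumptions made here for illustration, not the paper's exact protocol.

```python
import torch
import torch.nn as nn
from typing import Dict

@torch.no_grad()
def surviving_grad_std(model: nn.Module, masks: Dict[str, torch.Tensor]) -> float:
    """Std of gradients over the surviving (unpruned) weights.

    `masks` maps parameter names to binary tensors (1 = kept, 0 = pruned) and
    stands in for whatever bookkeeping the pruning method uses. A value that
    shrinks across pruning rounds would mirror the reported narrowing of the
    weight-gradient distribution as the network gets sparser.
    """
    grads = []
    for name, param in model.named_parameters():
        if param.grad is None or name not in masks:
            continue
        # keep only gradient entries belonging to surviving weights
        grads.append(param.grad[masks[name].bool()].flatten())
    return torch.cat(grads).std().item() if grads else float("nan")
```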
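A complementary minimal sketch of the scheduling idea in (ii), assuming a logistic curve for the S-shaped growth of max_lr across pruning rounds and a standard triangular cycle within each round; the constants lr_low, lr_high, steepness, and cycle_len are placeholders, not values taken from the paper.

```python
import math

def s_cyc_max_lr(prune_round: int, total_rounds: int,
                 lr_low: float = 0.05, lr_high: float = 0.2,
                 steepness: float = 8.0) -> float:
    """Grow max_lr in an S-shape as pruning proceeds (logistic curve assumed here)."""
    t = prune_round / max(total_rounds - 1, 1)           # pruning progress in [0, 1]
    s = 1.0 / (1.0 + math.exp(-steepness * (t - 0.5)))   # S-shaped ramp from ~0 to ~1
    return lr_low + (lr_high - lr_low) * s

def cyclical_lr(step: int, cycle_len: int, base_lr: float, max_lr: float) -> float:
    """Triangular cyclical LR within one pruning round."""
    pos = (step % cycle_len) / cycle_len                  # position within the cycle
    tri = 1.0 - abs(2.0 * pos - 1.0)                      # rises 0 -> 1, then falls back to 0
    return base_lr + (max_lr - base_lr) * tri

# Example: the LR at training step 120 of pruning round 3 (out of 10 rounds).
max_lr = s_cyc_max_lr(prune_round=3, total_rounds=10)
lr = cyclical_lr(step=120, cycle_len=400, base_lr=0.01, max_lr=max_lr)
```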
Related papers
- Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration [63.37145159948982]
Unsupervised restoration approaches based on generative adversarial networks (GANs) offer a promising solution without requiring paired datasets.
Yet, these GAN-based approaches struggle to surpass the performance of conventional unsupervised GAN-based frameworks.
We propose a self-collaboration (SC) strategy for existing restoration models.
arXiv Detail & Related papers (2024-08-17T16:26:59Z)
- Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning [9.33753001494221]
Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks.
In this paper, we develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian relaxation.
arXiv Detail & Related papers (2023-04-08T22:48:30Z)
- Optimizing Learning Rate Schedules for Iterative Pruning of Deep Neural Networks [25.84452767219292]
We propose a learning rate (LR) schedule for network pruning called SILO.
SILO has a strong theoretical motivation and dynamically adjusts the LR during pruning to improve generalization.
We find that SILO is able to precisely adjust the value of max_lr to be within the Oracle optimized interval, resulting in performance competitive with the Oracle with significantly lower complexity.
arXiv Detail & Related papers (2022-12-09T14:39:50Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise, and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- A New Backbone for Hyperspectral Image Reconstruction [90.48427561874402]
3D hyperspectral image (HSI) reconstruction refers to the inverse process of snapshot compressive imaging.
We propose a Spatial/Spectral Invariant Residual U-Net, namely SSI-ResU-Net.
We show that SSI-ResU-Net achieves competitive performance with over 77.3% reduction in floating-point operations.
arXiv Detail & Related papers (2021-08-17T16:20:51Z)
- Two-Stage Self-Supervised Cycle-Consistency Network for Reconstruction of Thin-Slice MR Images [62.4428833931443]
Thick-slice magnetic resonance (MR) images are often structurally blurred in coronal and sagittal views.
Deep learning has shown great potential to reconstruct high-resolution (HR) thin-slice MR images from those low-resolution (LR) cases.
We propose a novel Two-stage Self-supervised Cycle-consistency Network (TSCNet) for MR slice reconstruction.
arXiv Detail & Related papers (2021-06-29T13:29:18Z)
- Enabling Retrain-free Deep Neural Network Pruning using Surrogate Lagrangian Relaxation [2.691929135895278]
We develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian Relaxation (SLR).
SLR achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement.
Given a limited budget of retraining epochs, our approach quickly recovers the model accuracy.
arXiv Detail & Related papers (2020-12-18T07:17:30Z)
- MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks [56.66010634895913]
The learning rate (LR) is one of the most important hyperparameters in stochastic gradient descent (SGD) training of deep neural networks (DNNs).
In this paper, we propose MLR-SNet to learn a proper LR schedule.
We also transfer MLR-SNet to query tasks that differ from the training tasks in noise, architecture, data modality, and size, and achieve comparable or even better performance.
arXiv Detail & Related papers (2020-07-29T01:18:58Z)
- Towards Understanding Label Smoothing [36.54164997035046]
Label smoothing regularization (LSR) has achieved great success in training deep neural networks.
We show that an appropriate LSR can help to speed up convergence by reducing the variance.
We propose a simple yet effective strategy, namely the Two-Stage LAbel smoothing algorithm (TSLA).
arXiv Detail & Related papers (2020-06-20T20:36:17Z)
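As background for the label smoothing entry above, here is a minimal sketch of standard uniform label smoothing; the function name and the epsilon value are illustrative assumptions, and TSLA's two-stage switching rule is not reproduced here.

```python
import torch
import torch.nn.functional as F

def smooth_labels(targets: torch.Tensor, num_classes: int, eps: float = 0.1) -> torch.Tensor:
    """Uniform label smoothing: (1 - eps) on the true class, eps/K added to every class."""
    one_hot = F.one_hot(targets, num_classes).float()
    return (1.0 - eps) * one_hot + eps / num_classes

# A two-stage scheme such as TSLA would change eps between stages
# (e.g., eps > 0 early in training, eps = 0 later); the exact rule is in the
# linked paper and is not reproduced here.
targets = torch.tensor([2, 0, 1])
soft_targets = smooth_labels(targets, num_classes=3, eps=0.1)
```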