S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based
Networks
- URL: http://arxiv.org/abs/2110.08764v1
- Date: Sun, 17 Oct 2021 08:58:08 GMT
- Title: S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based
Networks
- Authors: Shiyu Liu, Chong Min John Tan, Mehul Motani
- Abstract summary: We find that as the ReLU-based network is iteratively pruned, the distribution of weight gradients tends to become narrower.
Motivated by this finding, we propose a novel LR schedule, called S-Cyclical (S-Cyc).
S-Cyc adapts the conventional cyclical LR schedule by gradually increasing the LR upper bound (max_lr) in an S-shape as the network is iteratively pruned.
- Score: 37.64233393273063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore a new perspective on adapting the learning rate (LR) schedule to
improve the performance of the ReLU-based network as it is iteratively pruned.
Our work and contribution consist of four parts: (i) We find that, as the
ReLU-based network is iteratively pruned, the distribution of weight gradients
tends to become narrower. This suggests that as the network becomes sparser,
a larger value of LR should be used to train the pruned network.
(ii) Motivated by this finding, we propose a novel LR schedule, called
S-Cyclical (S-Cyc), which adapts the conventional cyclical LR schedule by
gradually increasing the LR upper bound (max_lr) in an S-shape as the network
is iteratively pruned. We highlight that S-Cyc is a method-agnostic LR schedule
that applies to many iterative pruning methods. (iii) We evaluate the
performance of the proposed S-Cyc and compare it to four LR schedule
benchmarks. Our experimental results on three state-of-the-art networks (i.e.,
VGG-19, ResNet-20, ResNet-50) and two popular datasets (i.e., CIFAR-10,
ImageNet-200) demonstrate that S-Cyc consistently outperforms the best
performing benchmark with an improvement of 2.1% to 3.4%, without a substantial
increase in complexity. (iv) We evaluate S-Cyc against an oracle and show that
S-Cyc achieves comparable performance to the oracle, which carefully tunes
max_lr via grid search.
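As a rough illustration of finding (i), the sketch below measures the spread (standard deviation) of the weight gradients restricted to the weights that survive pruning. The helper name, the mask bookkeeping, and the choice of standard deviation as the width measure are assumptions made here for illustration, not the paper's exact protocol.

```python
import torch
import torch.nn as nn
from typing import Dict

@torch.no_grad()
def surviving_grad_std(model: nn.Module, masks: Dict[str, torch.Tensor]) -> float:
    """Std of gradients over the surviving (unpruned) weights.

    `masks` maps parameter names to binary tensors (1 = kept, 0 = pruned) and
    stands in for whatever bookkeeping the pruning method uses. A value that
    shrinks across pruning rounds would mirror the reported narrowing of the
    weight-gradient distribution as the network gets sparser.
    """
    grads = []
    for name, param in model.named_parameters():
        if param.grad is None or name not in masks:
            continue
        # keep only gradient entries belonging to surviving weights
        grads.append(param.grad[masks[name].bool()].flatten())
    return torch.cat(grads).std().item() if grads else float("nan")
```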
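A complementary minimal sketch of the scheduling idea in (ii), assuming a logistic curve for the S-shaped growth of max_lr across pruning rounds and a standard triangular cycle within each round; the constants lr_low, lr_high, steepness, and cycle_len are placeholders, not values taken from the paper.

```python
import math

def s_cyc_max_lr(prune_round: int, total_rounds: int,
                 lr_low: float = 0.05, lr_high: float = 0.2,
                 steepness: float = 8.0) -> float:
    """Grow max_lr in an S-shape as pruning proceeds (logistic curve assumed here)."""
    t = prune_round / max(total_rounds - 1, 1)           # pruning progress in [0, 1]
    s = 1.0 / (1.0 + math.exp(-steepness * (t - 0.5)))   # S-shaped ramp from ~0 to ~1
    return lr_low + (lr_high - lr_low) * s

def cyclical_lr(step: int, cycle_len: int, base_lr: float, max_lr: float) -> float:
    """Triangular cyclical LR within one pruning round."""
    pos = (step % cycle_len) / cycle_len                  # position within the cycle
    tri = 1.0 - abs(2.0 * pos - 1.0)                      # rises 0 -> 1, then falls back to 0
    return base_lr + (max_lr - base_lr) * tri

# Example: the LR at training step 120 of pruning round 3 (out of 10 rounds).
max_lr = s_cyc_max_lr(prune_round=3, total_rounds=10)
lr = cyclical_lr(step=120, cycle_len=400, base_lr=0.01, max_lr=max_lr)
```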
Related papers
- Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration [63.37145159948982]
Unsupervised restoration approaches based on generative adversarial networks (GANs) offer a promising solution without requiring paired datasets.
Yet, these GAN-based approaches struggle to surpass the performance of conventional unsupervised GAN-based frameworks.
We propose a self-collaboration (SC) strategy for existing restoration models.
arXiv Detail & Related papers (2024-08-17T16:26:59Z)
- Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning [9.33753001494221]
Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks.
In this paper, we develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian relaxation.
arXiv Detail & Related papers (2023-04-08T22:48:30Z)
- Optimizing Learning Rate Schedules for Iterative Pruning of Deep Neural Networks [25.84452767219292]
We propose a learning rate (LR) schedule for network pruning called SILO.
SILO has a strong theoretical motivation and dynamically adjusts the LR during pruning to improve generalization.
We find that SILO is able to precisely adjust the value of max_lr to be within the Oracle optimized interval, resulting in performance competitive with the Oracle with significantly lower complexity.
arXiv Detail & Related papers (2022-12-09T14:39:50Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise, and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- A New Backbone for Hyperspectral Image Reconstruction [90.48427561874402]
3D hyperspectral image (HSI) reconstruction refers to the inverse process of snapshot compressive imaging.
We propose a Spatial/Spectral Invariant Residual U-Net, namely SSI-ResU-Net.
We show that SSI-ResU-Net achieves competitive performance with over 77.3% reduction in floating-point operations.
arXiv Detail & Related papers (2021-08-17T16:20:51Z)
- Two-Stage Self-Supervised Cycle-Consistency Network for Reconstruction of Thin-Slice MR Images [62.4428833931443]
Thick-slice magnetic resonance (MR) images are often structurally blurred in coronal and sagittal views.
Deep learning has shown great potential to reconstruct high-resolution (HR) thin-slice MR images from those low-resolution (LR) cases.
We propose a novel Two-stage Self-supervised Cycle-consistency Network (TSCNet) for MR slice reconstruction.
arXiv Detail & Related papers (2021-06-29T13:29:18Z)
- Enabling Retrain-free Deep Neural Network Pruning using Surrogate Lagrangian Relaxation [2.691929135895278]
We develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian Relaxation (SLR).
SLR achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement.
Given a limited budget of retraining epochs, our approach quickly recovers the model accuracy.
arXiv Detail & Related papers (2020-12-18T07:17:30Z)
- MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks [56.66010634895913]
The learning rate (LR) is one of the most important hyperparameters in stochastic gradient descent (SGD) training of deep neural networks (DNNs).
In this paper, we propose MLR-SNet to learn a proper LR schedule.
We also transfer MLR-SNet to query tasks that differ from the training tasks in noise, architecture, data modality, and size, and achieve comparable or even better performance.
arXiv Detail & Related papers (2020-07-29T01:18:58Z)
- Towards Understanding Label Smoothing [36.54164997035046]
Label smoothing regularization (LSR) has achieved great success in training deep neural networks.
We show that an appropriate LSR can help to speed up convergence by reducing the variance.
We propose a simple yet effective strategy, namely the Two-Stage LAbel smoothing algorithm (TSLA).
arXiv Detail & Related papers (2020-06-20T20:36:17Z)
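As background for the label smoothing entry above, here is a minimal sketch of standard uniform label smoothing; the function name and the epsilon value are illustrative assumptions, and TSLA's two-stage switching rule is not reproduced here.

```python
import torch
import torch.nn.functional as F

def smooth_labels(targets: torch.Tensor, num_classes: int, eps: float = 0.1) -> torch.Tensor:
    """Uniform label smoothing: (1 - eps) on the true class, eps/K added to every class."""
    one_hot = F.one_hot(targets, num_classes).float()
    return (1.0 - eps) * one_hot + eps / num_classes

# A two-stage scheme such as TSLA would change eps between stages
# (e.g., eps > 0 early in training, eps = 0 later); the exact rule is in the
# linked paper and is not reproduced here.
targets = torch.tensor([2, 0, 1])
soft_targets = smooth_labels(targets, num_classes=3, eps=0.1)
```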