Enabling Retrain-free Deep Neural Network Pruning using Surrogate
Lagrangian Relaxation
- URL: http://arxiv.org/abs/2012.10079v2
- Date: Thu, 25 Mar 2021 04:50:49 GMT
- Title: Enabling Retrain-free Deep Neural Network Pruning using Surrogate
Lagrangian Relaxation
- Authors: Deniz Gurevin, Shanglin Zhou, Lynn Pepin, Bingbing Li, Mikhail Bragin,
Caiwen Ding, Fei Miao
- Abstract summary: We develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian relaxation (SLR).
SLR achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement.
Given a limited budget of retraining epochs, our approach quickly recovers the model accuracy.
- Score: 2.691929135895278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network pruning is a widely used technique to reduce computation cost and
model size for deep neural networks. However, the typical three-stage pipeline,
i.e., training, pruning and retraining (fine-tuning), significantly increases
the overall training time. In this paper, we develop a systematic
weight-pruning optimization approach based on Surrogate Lagrangian relaxation
(SLR), which is tailored to overcome difficulties caused by the discrete nature
of the weight-pruning problem while ensuring fast convergence. We further
accelerate the convergence of the SLR by using quadratic penalties. Model
parameters obtained by SLR during the training phase are much closer to their
optimal values as compared to those obtained by other state-of-the-art methods.
We evaluate the proposed method on image classification tasks, i.e., ResNet-18
and ResNet-50 using ImageNet, and ResNet-18, ResNet-50 and VGG-16 using
CIFAR-10, as well as object detection tasks, i.e., YOLOv3 and YOLOv3-tiny using
COCO 2014 and Ultra-Fast-Lane-Detection using TuSimple lane detection dataset.
Experimental results demonstrate that our SLR-based weight-pruning optimization
approach achieves a higher compression rate than state-of-the-art methods under the same
accuracy requirement. It also achieves high model accuracy even at the
hard-pruning stage without retraining, reducing the traditional three-stage
pruning pipeline to two stages. Given a limited budget of retraining epochs, our
approach quickly recovers the model accuracy.
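To make the optimization structure concrete, below is a minimal, hypothetical sketch of an SLR/ADMM-style pruning loop for the split problem min f(w) subject to w = z with z sparse: a gradient step on the loss plus the quadratic penalty, a magnitude projection for the sparse copy, and a surrogate-subgradient multiplier update with a diminishing step size. The function names, the 1/(1+step) step-size schedule, the keep ratio, and the toy quadratic loss are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def project_to_sparse(w, keep_ratio):
    """Magnitude projection: keep roughly the largest `keep_ratio` fraction of
    entries and zero the rest (ties at the cutoff are kept)."""
    k = max(1, int(keep_ratio * w.size))
    cutoff = np.sort(np.abs(w).ravel())[-k]
    return np.where(np.abs(w) >= cutoff, w, 0.0)

def slr_pruning_step(w, z, lam, grad_loss, rho=1e-3, lr=1e-2, step=1, keep_ratio=0.1):
    """One illustrative SLR-style iteration for min f(w) s.t. w == z, z sparse.
    The 1/(1 + step) multiplier step size and the magnitude projection are
    illustrative choices, not the paper's exact update rules."""
    # 1) Gradient step on the loss plus the Lagrangian and quadratic-penalty
    #    terms, which pull the dense weights w toward the sparse copy z.
    g = grad_loss(w) + lam + rho * (w - z)
    w = w - lr * g

    # 2) Update the auxiliary variable by projecting onto the sparsity
    #    constraint (hard magnitude pruning has a closed-form projection).
    z = project_to_sparse(w + lam / rho, keep_ratio)

    # 3) Surrogate-subgradient multiplier update with a diminishing step size,
    #    driving the constraint violation w - z toward zero.
    lam = lam + (1.0 / (1.0 + step)) * (w - z)
    return w, z, lam

# Toy usage on a quadratic loss f(w) = 0.5 * ||w - w_star||^2 (gradient: w - w_star).
rng = np.random.default_rng(0)
w_star = rng.normal(size=64)
w, z, lam = w_star.copy(), np.zeros(64), np.zeros(64)
for t in range(1, 300):
    w, z, lam = slr_pruning_step(w, z, lam, grad_loss=lambda v: v - w_star, step=t)
```

The quadratic penalty keeps the dense weights close to their hard-pruned counterparts throughout training, which is the property the abstract credits for retaining accuracy at the hard-pruning stage without retraining.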
Related papers
- Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural
Network Pruning [9.33753001494221]
Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks.
In this paper, we develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian relaxation.
arXiv Detail & Related papers (2023-04-08T22:48:30Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly (a rough sketch of this soft-shrinkage idea is given after this list).
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
In this work, a simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model.
It is shown to achieve state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z) - Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to reduce the training-time memory cost by about 70% and accelerate training by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z) - FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose a design paradigm for a cost-effective network with low-resolution (LR) representation for efficient pose estimation, named FasterPose.
We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence.
Compared with the previously dominant network for pose estimation, our method reduces FLOPs by 58% and simultaneously gains a 1.3% improvement in accuracy.
arXiv Detail & Related papers (2021-07-07T13:39:08Z) - Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z) - Pruning Convolutional Filters using Batch Bridgeout [14.677724755838556]
State-of-the-art computer vision models are rapidly increasing in capacity, where the number of parameters far exceeds the number required to fit the training set.
This results in better optimization and generalization performance.
In order to reduce inference costs, convolutional filters in trained neural networks could be pruned to reduce the run-time memory and computational requirements during inference.
We propose the use of Batch Bridgeout, a sparsity inducing regularization scheme, to train neural networks so that they could be pruned efficiently with minimal degradation in performance.
arXiv Detail & Related papers (2020-09-23T01:51:47Z) - Holistic Filter Pruning for Efficient Deep Neural Networks [25.328005340524825]
"Holistic Filter Pruning" (HFP) is a novel approach for common DNN training that is easy to implement and enables to specify accurate pruning rates.
In various experiments, we give insights into the training and achieve state-of-the-art performance on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2020-09-17T09:23:36Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)