Picking Winning Tickets Before Training by Preserving Gradient Flow
- URL: http://arxiv.org/abs/2002.07376v2
- Date: Fri, 7 Aug 2020 00:02:33 GMT
- Title: Picking Winning Tickets Before Training by Preserving Gradient Flow
- Authors: Chaoqi Wang, Guodong Zhang, Roger Grosse
- Abstract summary: We argue that efficient training requires preserving the gradient flow through the network.
We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet.
- Score: 9.67608102763644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparameterization has been shown to benefit both the optimization and
generalization of neural networks, but large networks are resource hungry at
both training and test time. Network pruning can reduce test-time resource
requirements, but is typically applied to trained networks and therefore cannot
avoid the expensive training process. We aim to prune networks at
initialization, thereby saving resources at training time as well.
Specifically, we argue that efficient training requires preserving the gradient
flow through the network. This leads to a simple but effective pruning
criterion we term Gradient Signal Preservation (GraSP). We empirically
investigate the effectiveness of the proposed method with extensive experiments
on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet
architectures. Our method can prune 80% of the weights of a VGG-16 network on
ImageNet at initialization, with only a 1.6% drop in top-1 accuracy. Moreover,
our method achieves significantly better performance than the baseline at
extreme sparsity levels.
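For a concrete picture of the criterion, the sketch below scores weights at initialization with a gradient-flow term of the form -θ ⊙ (Hg), computed via a Hessian-vector product, and keeps the lowest-scoring fraction. This is a minimal PyTorch sketch under that assumption, not the authors' released implementation; the helper name grasp_masks and the keep_ratio argument are illustrative.
```python
import torch
import torch.nn as nn

def grasp_masks(model, inputs, targets, keep_ratio=0.2):
    """Score weights at initialization and return a {weight_tensor: 0/1 mask} dict."""
    weights = [p for p in model.parameters() if p.dim() > 1]  # prune weight tensors, skip biases
    loss = nn.CrossEntropyLoss()(model(inputs), targets)

    # First backward pass: gradient g, with the graph kept for a second pass.
    grads = torch.autograd.grad(loss, weights, create_graph=True)

    # Hessian-vector product Hg obtained by differentiating g . stop_grad(g).
    gTg = sum((g * g.detach()).sum() for g in grads)
    Hg = torch.autograd.grad(gTg, weights)

    # Gradient-flow score: -theta * (Hg). Large scores mark weights whose removal
    # hurts the gradient norm the least, so they are pruned first.
    scores = [-(w.detach() * h) for w, h in zip(weights, Hg)]
    flat = torch.cat([s.flatten() for s in scores])
    num_keep = max(1, int(keep_ratio * flat.numel()))
    threshold = torch.kthvalue(flat, num_keep).values  # keep the lowest-scoring weights

    return {w: (s <= threshold).float() for w, s in zip(weights, scores)}

# Toy usage on random data (illustrative only): keep 20% of the weights.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 10))
x, y = torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))
masks = grasp_masks(model, x, y, keep_ratio=0.2)
```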
Related papers
- UniPTS: A Unified Framework for Proficient Post-Training Sparsity [67.16547529992928]
Post-training Sparsity (PTS) is a recently emerged line of work that pursues efficient network sparsity with only limited data.
In this paper, we attempt to close the performance gap between PTS and conventional sparsity by transposing three cardinal factors that strongly influence conventional sparsity into the PTS setting.
Our framework, termed UniPTS, is shown to substantially outperform existing PTS methods across extensive benchmarks.
arXiv Detail & Related papers (2024-05-29T06:53:18Z)
- Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training [3.2214522506924093]
Pruning schemes introduce extra overhead, either through iterative training and fine-tuning for static pruning or through repeated computation of a dynamic pruning graph.
We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks.
Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy.
arXiv Detail & Related papers (2023-02-17T09:37:17Z)
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
- An Experimental Study of the Impact of Pre-training on the Pruning of a Convolutional Neural Network [0.0]
In recent years, deep neural networks have achieved wide success in various application domains.
Deep neural networks usually involve a large number of parameters, which correspond to the weights of the network.
Pruning methods attempt to reduce the size of this parameter set by identifying and removing irrelevant weights.
arXiv Detail & Related papers (2021-12-15T16:02:15Z)
- BCNet: Searching for Network Width with Bilaterally Coupled Network [56.14248440683152]
We introduce a new supernet called Bilaterally Coupled Network (BCNet) to address this issue.
In BCNet, each channel is fairly trained and responsible for the same number of network widths, so each network width can be evaluated more accurately.
Our method achieves state-of-the-art or competing performance over other baseline methods.
arXiv Detail & Related papers (2021-05-21T18:54:03Z)
- Improving the Speed and Quality of GAN by Adversarial Training [87.70013107142142]
We develop FastGAN to improve the speed and quality of GAN training based on the adversarial training technique.
Our training algorithm brings ImageNet training to the broader public by requiring only 2-4 GPUs.
arXiv Detail & Related papers (2020-08-07T20:21:31Z)
- Go Wide, Then Narrow: Efficient Training of Deep Thin Networks [62.26044348366186]
We propose an efficient method to train a deep thin network with a theoretic guarantee.
By training with our method, ResNet50 can outperform ResNet101, and BERT Base can be comparable with BERT Large.
arXiv Detail & Related papers (2020-07-01T23:34:35Z)
- Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks [6.269700080380206]
Pruning techniques have been proposed that remove less significant weights in deep networks.
We propose a dynamic pruning-while-training procedure, wherein we prune filters of a deep network during training itself.
Results indicate that pruning while training yields a compressed network with almost no accuracy loss after pruning 50% of the filters.
arXiv Detail & Related papers (2020-03-05T18:05:17Z)
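The entry above prunes filters during training, but the summary does not state the selection criterion; the sketch below uses standard L1-norm filter pruning as a stand-in, purely for illustration. The helper prune_filters_l1 and the prune_frac argument are assumptions, not the paper's method.
```python
import torch
import torch.nn as nn

def prune_filters_l1(conv: nn.Conv2d, prune_frac: float = 0.5) -> torch.Tensor:
    """Zero out the prune_frac fraction of filters with the smallest L1 norm.

    Returns a (out_channels,) 0/1 mask that can be reapplied after each
    optimizer step so pruned filters stay at zero during training.
    """
    with torch.no_grad():
        l1 = conv.weight.abs().sum(dim=(1, 2, 3))   # one L1 norm per output filter
        num_prune = int(prune_frac * l1.numel())
        mask = torch.ones_like(l1)
        if num_prune > 0:
            drop = torch.topk(l1, num_prune, largest=False).indices
            mask[drop] = 0.0
        conv.weight.mul_(mask.view(-1, 1, 1, 1))    # zero pruned filters in place
        if conv.bias is not None:
            conv.bias.mul_(mask)
    return mask

# One might call this every few epochs during training, e.g.:
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
mask = prune_filters_l1(conv, prune_frac=0.5)       # removes 16 of the 32 filters
```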
- Gradual Channel Pruning while Training using Feature Relevance Scores for Convolutional Neural Networks [6.534515590778012]
Pruning is one of the predominant approaches used for deep network compression.
We present a simple yet effective gradual channel-pruning-while-training methodology that uses a novel data-driven metric.
We demonstrate the effectiveness of the proposed methodology on architectures such as VGG and ResNet.
arXiv Detail & Related papers (2020-02-23T17:56:18Z)
- Activation Density driven Energy-Efficient Pruning in Training [2.222917681321253]
We propose a novel pruning method that prunes a network real-time during training.
We obtain exceedingly sparse networks with accuracy comparable to the baseline network.
arXiv Detail & Related papers (2020-02-07T18:34:31Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge this gap and illustrate the theoretical insights behind three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
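The last entry names Complete Layer-wise Adaptive Rate Scaling (CLARS) but its summary gives no update rule; as a rough illustration, the sketch below shows a LARS-style layer-wise learning-rate scaling (trust ratio ||w|| / ||∇w||), which is a standard technique and not necessarily the paper's CLARS algorithm. The helper name layerwise_scaled_lr and the trust_coef value are assumptions.
```python
import torch

def layerwise_scaled_lr(param: torch.Tensor, base_lr: float,
                        trust_coef: float = 1e-3, eps: float = 1e-8) -> float:
    """LARS-style layer-wise learning rate: scale base_lr by ||w|| / ||grad w||."""
    w_norm = param.detach().norm()
    g_norm = param.grad.detach().norm()
    if w_norm > 0 and g_norm > 0:
        return base_lr * float(trust_coef * w_norm / (g_norm + eps))
    return base_lr

# Usage inside a plain SGD step: compute a per-layer learning rate before updating.
w = torch.nn.Parameter(torch.randn(256, 128))
loss = (w ** 2).sum()
loss.backward()
lr = layerwise_scaled_lr(w, base_lr=0.1)
with torch.no_grad():
    w -= lr * w.grad
```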
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.