Towards Practical Lottery Ticket Hypothesis for Adversarial Training
- URL: http://arxiv.org/abs/2003.05733v1
- Date: Fri, 6 Mar 2020 03:11:52 GMT
- Title: Towards Practical Lottery Ticket Hypothesis for Adversarial Training
- Authors: Bai Li, Shiqi Wang, Yunhan Jia, Yantao Lu, Zhenyu Zhong, Lawrence
Carin, Suman Jana
- Abstract summary: We show there exists a subset of the aforementioned sub-networks that converge significantly faster during the training process.
As a practical application of our findings, we demonstrate that such sub-networks can help in cutting down the total time of adversarial training.
- Score: 78.30684998080346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research has proposed the lottery ticket hypothesis, suggesting that
for a deep neural network, there exist trainable sub-networks performing as
well as or better than the original model with commensurate training steps.
While this discovery is insightful, finding proper sub-networks requires
iterative training and pruning. The high cost incurred limits the applications
of the lottery ticket hypothesis. We show there exists a subset of the
aforementioned sub-networks that converge significantly faster during the
training process and thus can mitigate the cost issue. We conduct extensive
experiments to show such sub-networks consistently exist across various model
structures for a restrictive setting of hyperparameters (e.g., carefully
selected learning rate, pruning ratio, and model capacity). As a practical
application of our findings, we demonstrate that such sub-networks can help in
cutting down the total time of adversarial training, a standard approach to
improve robustness, by up to 49% on CIFAR-10 to achieve the state-of-the-art
robustness.
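The abstract refers to two procedures: magnitude pruning to expose a sparse sub-network (the "ticket"), and adversarial training of that sub-network to obtain robustness at reduced cost. The sketch below illustrates how the two could be combined in PyTorch; it is a minimal illustration under stated assumptions, not the authors' implementation, and the pruning ratio, learning rate, and PGD hyperparameters (eps, alpha, steps) are placeholder choices.

```python
# Minimal sketch (not the paper's implementation): obtain a sparse sub-network
# by global magnitude pruning, then adversarially train it with PGD.
# All hyperparameters (amount, eps, alpha, steps, lr) are illustrative.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD; assumes inputs are scaled to [0, 1]."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x + torch.clamp(x_adv - x, -eps, eps), 0.0, 1.0)
    return x_adv.detach()

def prune_to_subnetwork(model, amount=0.8):
    """Global magnitude pruning: zero out the smallest `amount` of weights."""
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                              amount=amount)

def adversarial_train(model, loader, epochs=10, lr=0.1, device="cpu"):
    """Train the (pruned) model on PGD adversarial examples."""
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y)
            opt.zero_grad()
            nn.functional.cross_entropy(model(x_adv), y).backward()
            opt.step()

# Usage (assumes `model` and `train_loader` are defined elsewhere):
#   prune_to_subnetwork(model, amount=0.8)   # keep the largest 20% of weights
#   adversarial_train(model, train_loader)
```

Because the sub-network carries far fewer trainable weights and, per the abstract, converges faster, the adversarial training loop needs fewer epochs to reach a target robustness, which is where the reported time savings come from.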
Related papers
- Efficient Stagewise Pretraining via Progressive Subnetworks [53.00045381931778]
The prevailing view suggests that stagewise dropping strategies, such as layer dropping, are ineffective when compared to stacking-based approaches.
This paper challenges this notion by demonstrating that, with proper design, dropping strategies can be competitive with, if not better than, stacking methods.
We propose an instantiation of this framework - Random Part Training (RAPTR) - that selects and trains only a random subnetwork at each step, progressively increasing the size in stages.
arXiv Detail & Related papers (2024-02-08T18:49:09Z) - Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z) - Data-Efficient Double-Win Lottery Tickets from Robust Pre-training [129.85939347733387]
We introduce Double-Win Lottery Tickets, in which a subnetwork from a pre-trained model can be independently transferred on diverse downstream tasks.
We find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts.
arXiv Detail & Related papers (2022-06-09T20:52:50Z) - Superposing Many Tickets into One: A Performance Booster for Sparse
Neural Network Training [32.30355584300427]
We present a novel sparse training approach, termed Sup-tickets, which can satisfy two desiderata concurrently in a single sparse-to-sparse training process.
Across various modern architectures on CIFAR-10/100 and ImageNet, we show that Sup-tickets integrates seamlessly with the existing sparse training methods.
arXiv Detail & Related papers (2022-05-30T16:01:32Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate the Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Sparsity Winning Twice: Better Robust Generalization from More Efficient
Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find both methods to yield win-win: substantially shrinking the robust generalization gap and alleviating the robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z) - Juvenile state hypothesis: What we can learn from lottery ticket
hypothesis researches? [1.701869491238765]
The original lottery ticket hypothesis procedure performs pruning and weight resetting after training converges.
We propose a strategy that combines the idea of neural network structure search with a pruning algorithm to alleviate this problem.
arXiv Detail & Related papers (2021-09-08T18:22:00Z) - How much pre-training is enough to discover a good subnetwork? [10.699603774240853]
We mathematically analyze the amount of dense network pre-training needed for a pruned network to perform well.
We derive a simple theoretical bound on the number of gradient descent pre-training iterations for a two-layer, fully-connected network.
Experiments show that larger datasets require more pre-training for the sub-networks obtained via pruning to perform well.
arXiv Detail & Related papers (2021-07-31T15:08:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.