Data-Efficient Double-Win Lottery Tickets from Robust Pre-training
- URL: http://arxiv.org/abs/2206.04762v1
- Date: Thu, 9 Jun 2022 20:52:50 GMT
- Title: Data-Efficient Double-Win Lottery Tickets from Robust Pre-training
- Authors: Tianlong Chen, Zhenyu Zhang, Sijia Liu, Yang Zhang, Shiyu Chang,
Zhangyang Wang
- Abstract summary: We introduce Double-Win Lottery Tickets, in which a subnetwork from a pre-trained model can be independently transferred to diverse downstream tasks.
We find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over their standard counterparts.
- Score: 129.85939347733387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training serves as a broadly adopted starting point for transfer learning
on various downstream tasks. Recent investigations of the lottery ticket
hypothesis (LTH) demonstrate that such enormous pre-trained models can be
replaced by extremely sparse subnetworks (a.k.a. matching subnetworks) without
sacrificing transferability. However, practical security-critical applications
usually pose more challenging requirements beyond standard transfer, also
demanding that these subnetworks overcome adversarial vulnerability. In this
paper, we formulate a more rigorous concept, Double-Win Lottery Tickets, in
which a subnetwork located in a pre-trained model can be independently
transferred to diverse downstream tasks and reach BOTH the same standard and
robust generalization, under BOTH standard and adversarial training regimes, as
the full pre-trained model. We comprehensively examine various pre-training
mechanisms and find that robust pre-training tends to craft sparser double-win
lottery tickets with superior performance over their standard counterparts. For
example, on the downstream CIFAR-10/100 datasets, we identify double-win
matching subnetworks with standard, fast adversarial, and adversarial
pre-training from ImageNet at 89.26%/73.79%, 89.26%/79.03%, and 91.41%/83.22%
sparsity, respectively. Furthermore, we observe that the obtained double-win
lottery tickets transfer more data-efficiently under practical data-limited
(e.g., 1% and 10%) downstream schemes. Our results show that the benefits of
robust pre-training are amplified by the lottery ticket scheme, as well as by
the data-limited transfer setting. Code is available at
https://github.com/VITA-Group/Double-Win-LTH.
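The recipe the abstract describes combines magnitude-based pruning of a pre-trained backbone with standard or adversarial downstream fine-tuning. Below is a minimal, illustrative PyTorch sketch of the pruning half only; it uses one-shot global magnitude pruning, and the helper names (global_magnitude_mask, apply_mask) are hypothetical rather than taken from the authors' released code at the repository above, which should be consulted for the actual iterative procedure.

```python
# Illustrative sketch only: one-shot global magnitude pruning of a pre-trained
# backbone, producing a sparse subnetwork ("ticket") to be fine-tuned downstream.
# Helper names are hypothetical; see the official repository for the paper's method.
import torch
import torch.nn as nn
from torchvision import models


def global_magnitude_mask(model: nn.Module, sparsity: float) -> dict:
    """Return {param_name: binary mask} that removes the smallest-magnitude weights."""
    scores = {n: p.detach().abs() for n, p in model.named_parameters() if p.dim() > 1}
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = int(sparsity * flat.numel())                   # number of weights to prune away
    threshold = torch.kthvalue(flat, max(k, 1)).values
    return {n: (s > threshold).float() for n, s in scores.items()}


def apply_mask(model: nn.Module, mask: dict) -> None:
    """Zero out pruned weights in place; reapply after every optimizer step."""
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in mask:
                p.mul_(mask[n])


# Example: locate a 90%-sparse subnetwork in an ImageNet-pretrained ResNet-50.
# The "double-win" check would then fine-tune this subnetwork on a downstream task
# under both standard and adversarial training and compare it to the dense model.
backbone = models.resnet50(weights="IMAGENET1K_V1")
mask = global_magnitude_mask(backbone, sparsity=0.90)
apply_mask(backbone, mask)
```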
Related papers
- Robust Tickets Can Transfer Better: Drawing More Transferable
Subnetworks in Transfer Learning [25.310066345466396]
Transfer learning leverages feature representations of deep neural networks (DNNs) pretrained on source tasks with rich data to empower finetuning on downstream tasks.
We propose a new transfer learning pipeline, which leverages our finding that robust tickets can transfer better, i.e., subnetworks drawn with properly induced adversarial robustness achieve better transferability than vanilla lottery ticket subnetworks.
arXiv Detail & Related papers (2023-04-24T05:44:42Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has proven to be an effective approach for improving model robustness (a generic AT sketch appears at the end of this list).
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining model capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: substantially shrinking the robust generalization gap and alleviating robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
- The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models [115.49214555402567]
Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation.
Recent studies suggest that pre-training benefits from gigantic model capacity.
In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH).
arXiv Detail & Related papers (2020-12-12T21:53:55Z)
- The Lottery Ticket Hypothesis for Pre-trained BERT Networks [137.99328302234338]
In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training.
In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy.
We combine these observations to assess whether such trainable, transferable subnetworks exist in pre-trained BERT models.
arXiv Detail & Related papers (2020-07-23T19:35:39Z)
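Adversarial training (AT) is the common ingredient running through the abstract above and several of these related papers. The following is a minimal, generic sketch of one PGD-based AT step in PyTorch; it is not the specific method of any listed paper, and the hyperparameters (eps, alpha, steps) are illustrative defaults only.

```python
# Generic adversarial training (AT) step with an L-inf PGD attack.
# Hyperparameters are illustrative, not taken from any paper listed above.
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft an L-inf bounded adversarial example for inputs x with labels y."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()    # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)        # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1).detach()
    return x_adv


def adversarial_training_step(model, optimizer, x, y):
    """One AT update: train on adversarial examples instead of clean inputs."""
    model.eval()                     # craft the attack with frozen BN statistics
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```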