Sparsity Winning Twice: Better Robust Generalization from More Efficient
Training
- URL: http://arxiv.org/abs/2202.09844v2
- Date: Tue, 22 Feb 2022 07:30:23 GMT
- Title: Sparsity Winning Twice: Better Robust Generalization from More Efficient
Training
- Authors: Tianlong Chen, Zhenyu Zhang, Pengjun Wang, Santosh Balachandra, Haoyu
Ma, Zehao Wang, Zhangyang Wang
- Abstract summary: We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find both methods to yield win-win: substantially shrinking the robust generalization gap and alleviating the robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
- Score: 94.92954973680914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies demonstrate that deep networks, even robustified by the
state-of-the-art adversarial training (AT), still suffer from large robust
generalization gaps, in addition to the much more expensive training costs than
standard training. In this paper, we investigate this intriguing problem from a
new perspective, i.e., injecting appropriate forms of sparsity during
adversarial training. We introduce two alternatives for sparse adversarial
training: (i) static sparsity, by leveraging recent results from the lottery
ticket hypothesis to identify critical sparse subnetworks arising from the
early training; (ii) dynamic sparsity, by allowing the sparse subnetwork to
adaptively adjust its connectivity pattern (while sticking to the same sparsity
ratio) throughout training. We find both static and dynamic sparse methods to
yield win-win: substantially shrinking the robust generalization gap and
alleviating the robust overfitting, meanwhile significantly saving training and
inference FLOPs. Extensive experiments validate our proposals with multiple
network architectures on diverse datasets, including CIFAR-10/100 and
Tiny-ImageNet. For example, our methods reduce robust generalization gap and
overfitting by 34.44% and 4.02%, with comparable robust/standard accuracy
boosts and 87.83%/87.82% training/inference FLOPs savings on CIFAR-100 with
ResNet-18. Besides, our approaches can be organically combined with existing
regularizers, establishing new state-of-the-art results in AT. Code is
available at https://github.com/VITA-Group/Sparsity-Win-Robust-Generalization.
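To make the two sparsity modes concrete, below is a minimal PyTorch-style sketch of the dynamic variant: PGD adversarial training in which a fixed-ratio mask is periodically updated by pruning the smallest-magnitude surviving weights and regrowing the same number of currently inactive connections. The helper names (pgd_attack, init_masks, prune_and_regrow, train), the hyperparameters (sparsity ratio, regrowth fraction, PGD settings), and the random-regrowth criterion are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard l_inf PGD attack: ascend the loss, project back into the eps-ball."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()


def init_masks(model, sparsity=0.9):
    """One binary mask per conv/linear weight tensor, at a fixed sparsity ratio."""
    return {name: (torch.rand_like(p) > sparsity).float()
            for name, p in model.named_parameters() if p.dim() > 1}


def apply_masks(model, masks):
    """Zero out pruned weights (called after every optimizer step)."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])


def prune_and_regrow(model, masks, frac=0.2):
    """Dynamic sparsity: drop the smallest-magnitude surviving weights and regrow
    the same number of previously dead connections, keeping the ratio constant."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            mask, alive = masks[name], masks[name].bool()
            dead_idx = (~alive).view(-1).nonzero(as_tuple=True)[0]
            k = min(int(frac * alive.sum().item()), dead_idx.numel())
            if k == 0:
                continue
            # prune: smallest-magnitude weights among the survivors
            mag = p.abs().masked_fill(~alive, float("inf")).view(-1)
            drop_idx = torch.topk(mag, k, largest=False).indices
            mask.view(-1)[drop_idx] = 0.0
            # regrow: random previously-dead positions, initialized to zero
            grow_idx = dead_idx[torch.randperm(dead_idx.numel())[:k]]
            mask.view(-1)[grow_idx] = 1.0
            p.view(-1)[grow_idx] = 0.0


def train(model, loader, epochs=100, update_every=5):
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    masks = init_masks(model)
    apply_masks(model, masks)
    for epoch in range(epochs):
        for x, y in loader:
            x_adv = pgd_attack(model, x, y)    # adversarial examples
            opt.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()
            apply_masks(model, masks)          # keep pruned weights at zero
        if (epoch + 1) % update_every == 0:
            prune_and_regrow(model, masks)     # dynamic connectivity update
    return model
```

The static-sparsity alternative (i) would use the same training loop but freeze a mask identified early in training (e.g., by magnitude-pruning a briefly trained dense model) and skip the prune_and_regrow step.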
Related papers
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z) - Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Two Heads are Better than One: Robust Learning Meets Multi-branch Models [14.72099568017039]
We propose Branch Orthogonality adveRsarial Training (BORT) to obtain state-of-the-art performance with solely the original dataset for adversarial training.
We evaluate our approach on CIFAR-10, CIFAR-100, and SVHN against ℓ∞ norm-bounded perturbations of size ε = 8/255.
arXiv Detail & Related papers (2022-08-17T05:42:59Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been widely adopted to robustify DNNs.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Data-Efficient Double-Win Lottery Tickets from Robust Pre-training [129.85939347733387]
We introduce Double-Win Lottery Tickets, in which a subnetwork from a pre-trained model can be independently transferred on diverse downstream tasks.
We find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts.
arXiv Detail & Related papers (2022-06-09T20:52:50Z) - Superposing Many Tickets into One: A Performance Booster for Sparse
Neural Network Training [32.30355584300427]
We present a novel sparse training approach, termed Sup-tickets, which can satisfy two desiderata concurrently in a single sparse-to-sparse training process.
Across various modern architectures on CIFAR-10/100 and ImageNet, we show that Sup-tickets integrates seamlessly with the existing sparse training methods.
arXiv Detail & Related papers (2022-05-30T16:01:32Z) - Adversarial Training with Stochastic Weight Average [4.633908654744751]
Adversarial training of deep neural networks often suffers from a serious overfitting problem.
In traditional machine learning, one way to relieve overfitting caused by a lack of data is to use ensemble methods.
In this paper, we propose adversarial training with stochastic weight averaging (SWA).
While performing adversarial training, we aggregate the temporal weight states along the trajectory of training (sketched below).
arXiv Detail & Related papers (2020-09-21T04:47:20Z)
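Since the last entry above describes aggregating temporal weight states during adversarial training, here is a minimal sketch of that idea built on PyTorch's swa_utils. The schedule (when averaging starts, the learning rates) and the attack_fn argument are illustrative assumptions rather than that paper's exact setup.

```python
import torch
import torch.nn.functional as F
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn


def adversarial_training_with_swa(model, train_loader, attack_fn,
                                  epochs=100, swa_start=75, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    swa_model = AveragedModel(model)         # running average of the weights
    swa_scheduler = SWALR(opt, swa_lr=0.01)  # constant LR once averaging starts

    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            x_adv = attack_fn(model, x, y)   # e.g. l_inf PGD with eps = 8/255
            opt.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()
        if epoch >= swa_start:
            swa_model.update_parameters(model)  # aggregate temporal weight states
            swa_scheduler.step()

    update_bn(train_loader, swa_model)  # recompute BatchNorm stats for the average
    return swa_model
```

Comparing the last checkpoint (model) with the averaged weights (swa_model) under the same attack makes the robust-overfitting gap that these papers target directly measurable.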