Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better
- URL: http://arxiv.org/abs/2108.07969v1
- Date: Wed, 18 Aug 2021 04:32:35 GMT
- Title: Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better
- Authors: Bojia Zi, Shihao Zhao, Xingjun Ma, Yu-Gang Jiang
- Abstract summary: We propose a novel adversarial robustness distillation method called Robust Soft Label Adversarial Distillation (RSLAD).
RSLAD fully exploits the robust soft labels produced by a robust (adversarially-trained) large teacher model to guide the student's learning.
We empirically demonstrate the effectiveness of our RSLAD approach over existing adversarial training and distillation methods.
- Score: 66.69777970159558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial training is one effective approach for training robust deep
neural networks against adversarial attacks. While adversarial training (AT)
methods can deliver reliable robustness, they generally favor high-capacity
models, i.e., the larger the model, the better the robustness. This tends to
limit their effectiveness on small models, which are preferable in scenarios
where storage or computing resources are very limited (e.g., mobile
devices). In this paper, we leverage the concept of knowledge distillation to
improve the robustness of small models by distilling from adversarially trained
large models. We first revisit several state-of-the-art AT methods from a
distillation perspective and identify one common technique that can lead to
improved robustness: the use of robust soft labels -- predictions of a robust
model. Following this observation, we propose a novel adversarial robustness
distillation method called Robust Soft Label Adversarial Distillation (RSLAD)
to train robust small student models. RSLAD fully exploits the robust soft
labels produced by a robust (adversarially-trained) large teacher model to
guide the student's learning on both natural and adversarial examples in all
loss terms. We empirically demonstrate the effectiveness of our RSLAD approach
over existing adversarial training and distillation methods in improving the
robustness of small models against state-of-the-art attacks, including
AutoAttack. We also provide a set of insights into RSLAD and the importance of
robust soft labels for adversarial robustness distillation.
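Based on the abstract, a minimal sketch of how a robust soft-label distillation objective of this kind can be set up is given below. It assumes a PyTorch setup with a frozen, adversarially trained teacher; the KL-divergence losses, the PGD-style inner maximization guided by the teacher's soft labels, and the names and values of the hyperparameters (alpha, epsilon, step_size, num_steps) are illustrative assumptions, not the paper's exact formulation or settings.

```python
# Minimal sketch of robust soft-label adversarial distillation (illustrative only).
# Assumptions: `student` and `teacher` are classification networks returning logits,
# the teacher is adversarially pre-trained and frozen, and the hyperparameters
# below are placeholders rather than the paper's settings.
import torch
import torch.nn.functional as F

def kl_to_teacher(student_logits, teacher_probs):
    """KL divergence from the teacher's soft labels to the student's prediction."""
    return F.kl_div(F.log_softmax(student_logits, dim=1), teacher_probs,
                    reduction="batchmean")

def rsl_adversarial_example(student, x, teacher_probs,
                            epsilon=8/255, step_size=2/255, num_steps=10):
    """PGD-style inner maximization guided by the teacher's robust soft labels."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1).detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = kl_to_teacher(student(x_adv), teacher_probs)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()

def rsl_distillation_loss(student, teacher, x, alpha=0.5):
    """Distill from the teacher's soft labels on both natural and adversarial inputs."""
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x), dim=1)  # robust soft labels
    x_adv = rsl_adversarial_example(student, x, teacher_probs)
    natural_term = kl_to_teacher(student(x), teacher_probs)
    robust_term = kl_to_teacher(student(x_adv), teacher_probs)
    return (1 - alpha) * natural_term + alpha * robust_term
```

Training the student then amounts to minimizing this loss over mini-batches with a standard optimizer. Note that no hard ground-truth labels appear in either the inner maximization or the outer loss, which mirrors the abstract's emphasis on using the teacher's robust soft labels in all loss terms.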
Related papers
- Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks [11.389689242531327]
Adversarial training is one of the most effective methods for enhancing model robustness.
Previous approaches primarily use static ground truth for adversarial training, but this often causes robust overfitting.
We propose a dynamic label adversarial training (DYNAT) algorithm that enables the target model to gain robustness from the guide model's decisions.
arXiv Detail & Related papers (2024-08-23T14:25:12Z)
- Annealing Self-Distillation Rectification Improves Adversarial Training [0.10241134756773226]
We analyze the characteristics of robust models and identify that robust models tend to produce smoother and better-calibrated outputs.
We propose Annealing Self-Distillation Rectification (ADR), which generates soft labels as a better guidance mechanism.
We demonstrate the efficacy of ADR through extensive experiments and strong performance across datasets.
arXiv Detail & Related papers (2023-05-20T06:35:43Z)
- Adversarial Contrastive Distillation with Adaptive Denoising [15.119013995045192]
We propose Contrastive Relationship DeNoise Distillation (CRDND) to boost the robustness of small models.
We show that CRDND transfers robust knowledge efficiently and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-02-17T09:00:18Z)
- Mutual Adversarial Training: Learning together is better than going alone [82.78852509965547]
We study how interactions among models affect robustness via knowledge distillation.
We propose mutual adversarial training (MAT) in which multiple models are trained together.
MAT can effectively improve model robustness and outperform state-of-the-art methods under white-box attacks.
arXiv Detail & Related papers (2021-12-09T15:59:42Z)
- Analysis and Applications of Class-wise Robustness in Adversarial Training [92.08430396614273]
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.
Previous works mainly focus on the overall robustness of the model, while an in-depth analysis of the role each class plays in adversarial training is still missing.
We provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet.
We observe that stronger attack methods in adversarial training improve performance mainly by attacking the vulnerable classes more successfully.
arXiv Detail & Related papers (2021-05-29T07:28:35Z)
- Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose SPROUT, a new self-progressing robust training framework.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
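As a rough illustration of the last entry's idea of attacking instance-level identities without class labels, the sketch below perturbs an image to maximize a simplified NT-Xent-style contrastive loss against an augmented view of itself. The encoder, the augment function, the helper names, and all hyperparameters are placeholder assumptions; this is not the paper's exact algorithm.

```python
# Rough, illustrative sketch of an instance-wise adversarial perturbation for
# self-supervised contrastive learning (not the paper's exact algorithm).
# Assumptions: `encoder` maps image tensors to embeddings and `augment` is a
# stochastic tensor-level data augmentation; both are supplied by the caller.
import torch
import torch.nn.functional as F

def instance_contrastive_loss(z1, z2, temperature=0.5):
    """Simplified NT-Xent-style loss: pull together embeddings of the same
    instance and push apart embeddings of different instances in the batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature             # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)         # diagonal entries are positives

def instance_adversarial_example(encoder, x, augment,
                                 epsilon=8/255, step_size=2/255, num_steps=5):
    """Perturb x to confuse its instance-level identity: maximize the contrastive
    loss between the perturbed image and an augmented view of the same image."""
    x_view = augment(x).detach()
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1).detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = instance_contrastive_loss(encoder(x_adv), encoder(x_view))
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()
```

A self-supervised adversarial training loop would then minimize the same contrastive loss on these perturbed examples, so the encoder learns instance discrimination that remains stable under perturbation, without ever using labels.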
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.