Analysis and Applications of Class-wise Robustness in Adversarial
Training
- URL: http://arxiv.org/abs/2105.14240v1
- Date: Sat, 29 May 2021 07:28:35 GMT
- Title: Analysis and Applications of Class-wise Robustness in Adversarial
Training
- Authors: Qi Tian, Kun Kuang, Kelu Jiang, Fei Wu, Yisen Wang
- Abstract summary: Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.
Previous works mainly focus on the overall robustness of the model, and an in-depth analysis of the role of each class involved in adversarial training is still missing.
We provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet.
We observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes.
- Score: 92.08430396614273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial training is one of the most effective approaches to improve model
robustness against adversarial examples. However, previous works mainly focus
on the overall robustness of the model, and an in-depth analysis of the role
of each class involved in adversarial training is still missing. In this paper,
we propose to analyze the class-wise robustness in adversarial training. First,
we provide a detailed diagnosis of adversarial training on six benchmark
datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet.
Surprisingly, we find that there are remarkable robustness discrepancies among
classes, leading to unbalanced/unfair class-wise robustness in the robust
models. Furthermore, we investigate the relations between classes and find
that this unbalanced class-wise robustness is highly consistent across
different attack and defense methods. Moreover, we observe that the stronger
attack methods in adversarial learning achieve performance improvement mainly
from a more successful attack on the vulnerable classes (i.e., classes with
less robustness). Inspired by these interesting findings, we design a simple
but effective attack method based on the traditional PGD attack, named
Temperature-PGD attack, which enlarges the robustness disparity among classes
by applying a temperature factor to the confidence distribution of each image.
Experiments demonstrate that our method achieves a higher attack success rate than
the PGD attack. Furthermore, from the defense perspective, we also make some
modifications in the training and inference phases to improve the robustness of
the most vulnerable class, so as to mitigate the large difference in class-wise
robustness. We believe our work can contribute to a more comprehensive
understanding of adversarial training as well as rethinking the class-wise
properties in robust models.
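The abstract describes the Temperature-PGD attack only at a high level: standard PGD, but with the cross-entropy loss computed on temperature-scaled logits so the confidence distribution is sharpened or flattened. A minimal sketch of that idea, assuming a plain linear softmax classifier and an l_inf threat model (the model, step size, and temperature value here are illustrative choices, not the paper's configuration):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over a 1-d logit vector
    e = np.exp(z - z.max())
    return e / e.sum()

def temperature_pgd(W, x0, y, eps=0.3, alpha=0.05, steps=20, T=0.5):
    """PGD on a linear softmax classifier, with the cross-entropy loss
    computed on temperature-scaled logits z / T (a sketch of the
    Temperature-PGD idea; T < 1 sharpens the confidence distribution).
    Maximizes the loss of the true class y within an l_inf ball of
    radius eps around the clean input x0."""
    x = x0.copy()
    for _ in range(steps):
        p = softmax((W @ x) / T)
        # gradient of CE(softmax(z / T), y) w.r.t. the logits z
        dz = p.copy()
        dz[y] -= 1.0
        dz /= T
        g = W.T @ dz                      # chain rule: dL/dx = W^T dL/dz
        x = x + alpha * np.sign(g)        # ascent step on the loss
        x = np.clip(x, x0 - eps, x0 + eps)  # project back into the l_inf ball
    return x
```

With a neural network one would obtain `g` by backpropagation instead of the closed-form linear-model gradient, but the temperature enters the loop in the same place: inside the softmax of the attacked loss.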
Related papers
- Revisiting and Advancing Adversarial Training Through A Simple Baseline [7.226961695849204]
We introduce a simple baseline approach, termed SimpleAT, that performs competitively with recent methods and mitigates robust overfitting.
We conduct extensive experiments on CIFAR-10/100 and Tiny-ImageNet, which validate the robustness of SimpleAT against state-of-the-art adversarial attackers.
Our results also reveal the connections between SimpleAT and many advanced state-of-the-art adversarial defense methods.
arXiv Detail & Related papers (2023-06-13T08:12:52Z)
- Improving Adversarial Robustness with Self-Paced Hard-Class Pair Reweighting [5.084323778393556]
Adversarial training with untargeted attacks is one of the most widely recognized defense methods.
We find that naturally imbalanced inter-class semantic similarity makes hard-class pairs become virtual targets of each other.
We propose to upweight the hard-class pair loss in model optimization, which encourages the model to learn discriminative features for hard classes.
arXiv Detail & Related papers (2022-10-26T22:51:36Z)
- Enhancing Adversarial Training with Feature Separability [52.39305978984573]
We introduce a new concept of an adversarial training graph (ATG), with which the proposed adversarial training with feature separability (ATFS) boosts intra-class feature similarity and increases inter-class feature variance.
Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
arXiv Detail & Related papers (2022-05-02T04:04:23Z)
- Mutual Adversarial Training: Learning together is better than going alone [82.78852509965547]
We study how interactions among models affect robustness via knowledge distillation.
We propose mutual adversarial training (MAT) in which multiple models are trained together.
MAT can effectively improve model robustness and outperform state-of-the-art methods under white-box attacks.
arXiv Detail & Related papers (2021-12-09T15:59:42Z)
- Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
- Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We propose an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that there exists inter-class discrepancy for accuracy and robustness even when the training dataset has an equal number of samples for each class.
arXiv Detail & Related papers (2020-10-26T06:32:32Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
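Several of the papers above report robustness per class rather than in aggregate. A minimal sketch of that evaluation, computing per-class accuracy on adversarial examples (the function name and interface are my own, not from any of the papers):

```python
import numpy as np

def classwise_robust_accuracy(y_true, y_pred_adv, num_classes):
    """Per-class accuracy on adversarial examples: acc[c] is the fraction
    of class-c samples still classified correctly under attack. The gap
    between max(acc) and min(acc) is one simple measure of the class-wise
    robustness disparity discussed above."""
    y_true = np.asarray(y_true)
    y_pred_adv = np.asarray(y_pred_adv)
    acc = np.zeros(num_classes)
    for c in range(num_classes):
        mask = (y_true == c)
        # NaN marks classes with no samples in the evaluation set
        acc[c] = (y_pred_adv[mask] == c).mean() if mask.any() else np.nan
    return acc
```

Here `y_pred_adv` would come from running the model on adversarially perturbed inputs (e.g. the output of a PGD attack); the same function applied to clean predictions gives per-class standard accuracy for comparison.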
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.