Robustness May Be at Odds with Fairness: An Empirical Study on
Class-wise Accuracy
- URL: http://arxiv.org/abs/2010.13365v2
- Date: Sun, 10 Oct 2021 18:23:13 GMT
- Title: Robustness May Be at Odds with Fairness: An Empirical Study on
Class-wise Accuracy
- Authors: Philipp Benz, Chaoning Zhang, Adil Karjauv, In So Kweon
- Abstract summary: CNNs are widely known to be vulnerable to adversarial attacks.
We propose an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that an inter-class discrepancy in accuracy and robustness exists even when the training dataset has an equal number of samples for each class.
- Score: 85.20742045853738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) have made significant advances;
however, they are widely known to be vulnerable to adversarial attacks.
Adversarial training is the most widely used technique for improving
adversarial robustness against strong white-box attacks. Prior works have
evaluated and improved the model's average robustness without class-wise
evaluation. The average evaluation alone might provide a false sense of
robustness. For example, an attacker can focus on attacking the most
vulnerable class, which can be dangerous, especially when that class is a
critical one, such as "human" in autonomous driving. We propose an empirical
study on the class-wise accuracy and robustness of adversarially trained
models. We find that an inter-class discrepancy in accuracy and robustness
exists even when the training dataset has an equal number of samples for each
class. For example, in CIFAR10, "cat" is much more vulnerable than other
classes. Moreover, this inter-class discrepancy also exists for normally
trained models, and adversarial training tends to further increase it. Our
work aims to investigate the following questions: (a) Is the phenomenon of
inter-class discrepancy universal regardless of datasets, model architectures,
and optimization hyper-parameters? (b) If so, what are possible explanations
for the inter-class discrepancy? (c) Can the techniques proposed for
long-tailed classification be readily extended to adversarial training to
address the inter-class discrepancy?
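To make the class-wise evaluation described in the abstract concrete, below is a
minimal PyTorch sketch, assuming a trained classifier and inputs normalized to
[0, 1]; the single-step FGSM attack here is only a stand-in for the stronger
white-box attacks the paper actually evaluates, and `model`, `test_loader`, and
the hyper-parameters are hypothetical.

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps=8 / 255):
    """Single-step FGSM perturbation (a stand-in for stronger white-box attacks)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    # Assumes inputs are normalized to [0, 1].
    return (x + eps * grad.sign()).clamp(0, 1).detach()


@torch.no_grad()
def per_class_accuracy(model, x, y, num_classes=10):
    """Accuracy broken down by ground-truth class instead of averaged."""
    preds = model(x).argmax(dim=1)
    return {c: (preds[y == c] == c).float().mean().item()
            for c in range(num_classes) if (y == c).any()}


# Hypothetical usage on one evaluation batch (model and test_loader assumed):
# x, y = next(iter(test_loader))
# clean = per_class_accuracy(model, x, y)
# robust = per_class_accuracy(model, fgsm(model, x, y), y)
```

Comparing the per-class dictionaries for clean and perturbed inputs surfaces the
kind of inter-class gap the paper reports (e.g., a weak "cat" class in CIFAR10),
which an averaged accuracy would hide.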
Related papers
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust, instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- WAT: Improve the Worst-class Robustness in Adversarial Training [11.872656386839436]
Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples, and adversarial training is a popular strategy to defend against such attacks.
This paper proposes a novel framework of worst-class adversarial training (a rough class-reweighting sketch appears after this list).
arXiv Detail & Related papers (2023-02-08T12:54:19Z)
- Improving Adversarial Robustness with Self-Paced Hard-Class Pair Reweighting [5.084323778393556]
Adversarial training with untargeted attacks is one of the most widely recognized defense methods.
We find that the naturally imbalanced inter-class semantic similarity causes hard-class pairs to become virtual targets of each other.
We propose to upweight the hard-class pair loss in model optimization, which encourages learning discriminative features for hard classes.
arXiv Detail & Related papers (2022-10-26T22:51:36Z)
- Detection and Mitigation of Byzantine Attacks in Distributed Training [24.951227624475443]
Abnormal Byzantine behavior of worker nodes can derail training and compromise the quality of inference.
Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients.
In this work, we consider attack models ranging from strong ones ($q$ omniscient adversaries that have full knowledge of the defense protocol and can change from iteration to iteration) to weak ones ($q$ randomly chosen adversaries with limited collusion abilities).
arXiv Detail & Related papers (2022-08-17T05:49:52Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features across arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Analysis and Applications of Class-wise Robustness in Adversarial Training [92.08430396614273]
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.
Previous works mainly focus on the overall robustness of the model, and the in-depth analysis on the role of each class involved in adversarial training is still missing.
We provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet.
We observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes.
arXiv Detail & Related papers (2021-05-29T07:28:35Z)
- Universal Adversarial Training with Class-Wise Perturbations [78.05383266222285]
Adversarial training is the most widely used method for defending against adversarial attacks.
In this work, we find that a universal adversarial perturbation (UAP) does not attack all classes equally.
We improve state-of-the-art universal adversarial training (UAT) by proposing to utilize class-wise UAPs during adversarial training.
arXiv Detail & Related papers (2021-04-07T09:05:49Z)
- Optimal Transport as a Defense Against Adversarial Attacks [4.6193503399184275]
Adversarial attacks can find a human-imperceptible perturbation for a given image that will mislead a trained model.
Previous work aimed to align original and adversarial image representations in the same way as domain adaptation to improve robustness.
We propose to use a loss between distributions that faithfully reflects the ground distance.
This leads to SAT (Sinkhorn Adversarial Training), a more robust defense against adversarial attacks.
arXiv Detail & Related papers (2021-02-05T13:24:36Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
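The worst-class and hard-class entries above describe class-reweighting ideas
only at a high level; as a hedged illustration (not the actual WAT or
hard-class pair objectives), a per-class weighted adversarial loss might look
like the sketch below, where the weight-update rule is an assumption.

```python
import torch
import torch.nn.functional as F


def class_weighted_adv_loss(model, x_adv, y, class_weights):
    """Cross-entropy on adversarial examples, reweighted per class so that
    classes with poor robustness contribute more to the objective."""
    per_sample = F.cross_entropy(model(x_adv), y, reduction="none")
    return (class_weights[y] * per_sample).mean()


def update_class_weights(per_class_robust_acc, floor=0.1):
    """Hypothetical update: upweight classes with low robust accuracy.
    `per_class_robust_acc` is a float tensor of shape (num_classes,)."""
    w = (1.0 - per_class_robust_acc).clamp(min=floor)
    return w / w.mean()
```

Methods such as WAT or hard-class pair reweighting define their own objectives
and update rules; this sketch only shows the general mechanism of shifting
training emphasis toward the weakest classes.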
This list is automatically generated from the titles and abstracts of the papers on this site.