Robustness May Be at Odds with Fairness: An Empirical Study on
Class-wise Accuracy
- URL: http://arxiv.org/abs/2010.13365v2
- Date: Sun, 10 Oct 2021 18:23:13 GMT
- Title: Robustness May Be at Odds with Fairness: An Empirical Study on
Class-wise Accuracy
- Authors: Philipp Benz, Chaoning Zhang, Adil Karjauv, In So Kweon
- Abstract summary: CNNs are widely known to be vulnerable to adversarial attacks.
We propose an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that an inter-class discrepancy in accuracy and robustness exists even when the training dataset has an equal number of samples for each class.
- Score: 85.20742045853738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) have made significant advances;
however, they are widely known to be vulnerable to adversarial attacks.
Adversarial training is the most widely used technique for improving
adversarial robustness against strong white-box attacks. Prior works have
evaluated and improved the model's average robustness without class-wise
evaluation. The average evaluation alone might provide a false sense of
robustness. For example, an attacker can focus on attacking the most
vulnerable class, which can be dangerous, especially when that class is a
critical one, such as "human" in autonomous driving. We propose an empirical
study on the class-wise accuracy and robustness of adversarially trained
models. We find that an inter-class discrepancy in accuracy and robustness
exists even when the training dataset has an equal number of samples for each
class. For example, in CIFAR10, "cat" is much more vulnerable than other
classes. Moreover, this inter-class discrepancy also exists for normally
trained models, and adversarial training tends to further increase it. Our
work aims to investigate the following questions: (a) Is the phenomenon of
inter-class discrepancy universal regardless of datasets, model architectures,
and optimization hyper-parameters? (b) If so, what are possible explanations
for the inter-class discrepancy? (c) Can the techniques proposed for
long-tailed classification be readily extended to adversarial training to
address the inter-class discrepancy?
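To make the class-wise evaluation described in the abstract concrete, below is a
minimal PyTorch sketch, assuming a trained classifier and inputs normalized to
[0, 1]; the single-step FGSM attack here is only a stand-in for the stronger
white-box attacks the paper actually evaluates, and `model`, `test_loader`, and
the hyper-parameters are hypothetical.

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps=8 / 255):
    """Single-step FGSM perturbation (a stand-in for stronger white-box attacks)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    # Assumes inputs are normalized to [0, 1].
    return (x + eps * grad.sign()).clamp(0, 1).detach()


@torch.no_grad()
def per_class_accuracy(model, x, y, num_classes=10):
    """Accuracy broken down by ground-truth class instead of averaged."""
    preds = model(x).argmax(dim=1)
    return {c: (preds[y == c] == c).float().mean().item()
            for c in range(num_classes) if (y == c).any()}


# Hypothetical usage on one evaluation batch (model and test_loader assumed):
# x, y = next(iter(test_loader))
# clean = per_class_accuracy(model, x, y)
# robust = per_class_accuracy(model, fgsm(model, x, y), y)
```

Comparing the per-class dictionaries for clean and perturbed inputs surfaces the
kind of inter-class gap the paper reports (e.g., a weak "cat" class in CIFAR10),
which an averaged accuracy would hide.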
Related papers
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust, instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- WAT: Improve the Worst-class Robustness in Adversarial Training [11.872656386839436]
Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples, and adversarial training is a popular strategy to defend against such attacks.
This paper proposes a novel framework of worst-class adversarial training (a rough class-reweighting sketch appears after this list).
arXiv Detail & Related papers (2023-02-08T12:54:19Z)
- Improving Adversarial Robustness with Self-Paced Hard-Class Pair Reweighting [5.084323778393556]
Adversarial training with untargeted attacks is one of the most widely recognized defense methods.
We find that the naturally imbalanced inter-class semantic similarity causes hard-class pairs to become virtual targets of each other.
We propose to upweight the hard-class pair loss in model optimization, which encourages learning discriminative features for hard classes.
arXiv Detail & Related papers (2022-10-26T22:51:36Z)
- Detection and Mitigation of Byzantine Attacks in Distributed Training [24.951227624475443]
Abnormal Byzantine behavior of worker nodes can derail training and compromise the quality of inference.
Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients.
In this work, we consider attack models ranging from strong ones ($q$ omniscient adversaries that have full knowledge of the defense protocol and can change from iteration to iteration) to weak ones ($q$ randomly chosen adversaries with limited collusion abilities).
arXiv Detail & Related papers (2022-08-17T05:49:52Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features across arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Analysis and Applications of Class-wise Robustness in Adversarial Training [92.08430396614273]
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.
Previous works mainly focus on the overall robustness of the model, and the in-depth analysis on the role of each class involved in adversarial training is still missing.
We provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet.
We observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes.
arXiv Detail & Related papers (2021-05-29T07:28:35Z)
- Universal Adversarial Training with Class-Wise Perturbations [78.05383266222285]
Adversarial training is the most widely used method for defending against adversarial attacks.
In this work, we find that a universal adversarial perturbation (UAP) does not attack all classes equally.
We improve state-of-the-art universal adversarial training (UAT) by proposing to utilize class-wise UAPs during adversarial training.
arXiv Detail & Related papers (2021-04-07T09:05:49Z)
- Optimal Transport as a Defense Against Adversarial Attacks [4.6193503399184275]
Adversarial attacks can find a human-imperceptible perturbation for a given image that will mislead a trained model.
Previous work aimed to align original and adversarial image representations in the same way as domain adaptation to improve robustness.
We propose to use a loss between distributions that faithfully reflects the ground distance.
This leads to SAT (Sinkhorn Adversarial Training), a more robust defense against adversarial attacks.
arXiv Detail & Related papers (2021-02-05T13:24:36Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
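The worst-class and hard-class entries above describe class-reweighting ideas
only at a high level; as a hedged illustration (not the actual WAT or
hard-class pair objectives), a per-class weighted adversarial loss might look
like the sketch below, where the weight-update rule is an assumption.

```python
import torch
import torch.nn.functional as F


def class_weighted_adv_loss(model, x_adv, y, class_weights):
    """Cross-entropy on adversarial examples, reweighted per class so that
    classes with poor robustness contribute more to the objective."""
    per_sample = F.cross_entropy(model(x_adv), y, reduction="none")
    return (class_weights[y] * per_sample).mean()


def update_class_weights(per_class_robust_acc, floor=0.1):
    """Hypothetical update: upweight classes with low robust accuracy.
    `per_class_robust_acc` is a float tensor of shape (num_classes,)."""
    w = (1.0 - per_class_robust_acc).clamp(min=floor)
    return w / w.mean()
```

Methods such as WAT or hard-class pair reweighting define their own objectives
and update rules; this sketch only shows the general mechanism of shifting
training emphasis toward the weakest classes.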
This list is automatically generated from the titles and abstracts of the papers on this site.