Why adversarial training can hurt robust accuracy
- URL: http://arxiv.org/abs/2203.02006v1
- Date: Thu, 3 Mar 2022 20:41:38 GMT
- Title: Why adversarial training can hurt robust accuracy
- Authors: Jacob Clarysse, Julia Hörmann, and Fanny Yang
- Abstract summary: While adversarial training helps when enough data is available, it may hurt robust generalization in the small sample size regime.
Our proof provides explanatory insights that may also transfer to feature learning models.
- Score: 7.906608953906889
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning classifiers with high test accuracy often perform poorly
under adversarial attacks. It is commonly believed that adversarial training
alleviates this issue. In this paper, we demonstrate that, surprisingly, the
opposite may be true -- Even though adversarial training helps when enough data
is available, it may hurt robust generalization in the small sample size
regime. We first prove this phenomenon for a high-dimensional linear
classification setting with noiseless observations. Our proof provides
explanatory insights that may also transfer to feature learning models.
Further, we observe in experiments on standard image datasets that the same
behavior occurs for perceptible attacks that effectively reduce class
information such as mask attacks and object corruptions.
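The claim about the high-dimensional linear setting can be illustrated with a small simulation. The sketch below is a hedged toy example, not the paper's data model or code: it assumes Gaussian inputs in d = 1000 dimensions with noiseless labels given by a single signal coordinate, n = 50 training points, and an l_inf radius eps = 0.1, all arbitrary choices. For a linear classifier the worst-case l_inf perturbation of radius eps simply reduces the margin y * (w . x) by eps * ||w||_1, so adversarial training can be done exactly without an inner attack loop. The snippet compares the robust test accuracy of a standard and an adversarially trained linear model; whether adversarial training actually hurts here depends on the chosen constants and data model, so treat it only as a template for the comparison.

```python
# Minimal illustrative sketch (assumed toy setting, not the paper's exact data
# model): compare standard vs. exact l_inf adversarial training of a linear
# classifier when the dimension d greatly exceeds the sample size n.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, eps = 1000, 50, 5000, 0.1    # assumed constants

theta = np.zeros(d)
theta[0] = 1.0                                   # ground-truth direction (assumption)

def sample(n):
    X = rng.normal(size=(n, d))
    return X, np.sign(X @ theta)                 # noiseless labels

def train(X, y, eps, lr=0.1, steps=2000):
    """Gradient descent on the robust logistic loss.
    For a linear model, the worst-case l_inf perturbation of radius eps
    shrinks the margin by eps * ||w||_1, so the robust loss is closed form."""
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w) - eps * np.abs(w).sum()
        p = 1.0 / (1.0 + np.exp(np.clip(margins, -30, 30)))  # sigmoid(-margin)
        grad = -(p * y) @ X / len(y) + eps * p.mean() * np.sign(w)
        w -= lr * grad
    return w

def robust_acc(w, X, y, eps):
    # Fraction of points classified correctly under the worst-case l_inf attack.
    return float(np.mean(y * (X @ w) - eps * np.abs(w).sum() > 0))

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)
w_std = train(X_tr, y_tr, eps=0.0)   # standard training
w_adv = train(X_tr, y_tr, eps=eps)   # adversarial training
print("robust accuracy, standard training   :", robust_acc(w_std, X_te, y_te, eps))
print("robust accuracy, adversarial training:", robust_acc(w_adv, X_te, y_te, eps))
```

The closed-form robust margin is what makes the linear case convenient to study; for nonlinear feature-learning models the inner maximization has to be approximated, which is where the abstract's explanatory insights may also transfer.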
Related papers
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic and generalisable framework for which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z) - Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning Few-Shot Meta-Learners [28.468089304148453]
We attack amortized meta-learners, which allows us to craft colluding sets of inputs that fool the system's learning algorithm.
We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance.
We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred.
arXiv Detail & Related papers (2022-11-23T14:55:44Z) - Btech thesis report on adversarial attack detection and purification of adverserially attacked images [0.0]
This thesis report is on the detection and purification of adversarially attacked images.
A deep learning model is trained on certain training examples for various tasks such as classification and regression.
arXiv Detail & Related papers (2022-05-09T09:24:11Z) - Benign Overfitting in Adversarially Robust Linear Classification [91.42259226639837]
"Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community.
We show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples.
arXiv Detail & Related papers (2021-12-31T00:27:31Z) - Indiscriminate Poisoning Attacks Are Shortcuts [77.38947817228656]
We find that the perturbations of advanced poisoning attacks are almost linearly separable when assigned the target labels of the corresponding samples.
We show that such synthetic perturbations are as powerful as the deliberately crafted attacks.
Our finding suggests that the shortcut learning problem is more serious than previously believed.
arXiv Detail & Related papers (2021-11-01T12:44:26Z) - Simulated Adversarial Testing of Face Recognition Models [53.10078734154151]
We propose a framework for learning how to test machine learning algorithms using simulators in an adversarial manner.
We are the first to show that weaknesses of models trained on real data can be discovered using simulated samples.
arXiv Detail & Related papers (2021-06-08T17:58:10Z) - Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We propose an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that there exist inter-class discrepancies in accuracy and robustness even when the training dataset has an equal number of samples for each class.
arXiv Detail & Related papers (2020-10-26T06:32:32Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z) - The Curious Case of Adversarially Robust Models: More Data Can Help, Double Descend, or Hurt Generalization [36.87923859576768]
Adversarial training has shown its ability to produce models that are robust to perturbations of the input data, but usually at the expense of a decrease in standard accuracy.
In this paper, we show that more training data can hurt the generalization of adversarially robust models in the classification problems.
arXiv Detail & Related papers (2020-02-25T18:25:28Z) - A Bayes-Optimal View on Adversarial Examples [9.51828574518325]
We argue for examining adversarial examples from the perspective of Bayes-optimal classification.
Our results show that even when these "gold standard" optimal classifiers are robust, CNNs trained on the same datasets consistently learn a vulnerable classifier.
arXiv Detail & Related papers (2020-02-20T16:43:47Z)