Why adversarial training can hurt robust accuracy
- URL: http://arxiv.org/abs/2203.02006v1
- Date: Thu, 3 Mar 2022 20:41:38 GMT
- Title: Why adversarial training can hurt robust accuracy
- Authors: Jacob Clarysse, Julia Hörmann, and Fanny Yang
- Abstract summary: While adversarial training helps when enough data is available, it may hurt robust generalization in the small sample size regime.
Our proof provides explanatory insights that may also transfer to feature learning models.
- Score: 7.906608953906889
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning classifiers with high test accuracy often perform poorly
under adversarial attacks. It is commonly believed that adversarial training
alleviates this issue. In this paper, we demonstrate that, surprisingly, the
opposite may be true -- Even though adversarial training helps when enough data
is available, it may hurt robust generalization in the small sample size
regime. We first prove this phenomenon for a high-dimensional linear
classification setting with noiseless observations. Our proof provides
explanatory insights that may also transfer to feature learning models.
Further, we observe in experiments on standard image datasets that the same
behavior occurs for perceptible attacks that effectively reduce class
information such as mask attacks and object corruptions.
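The claim about the high-dimensional linear setting can be illustrated with a small simulation. The sketch below is a hedged toy example, not the paper's data model or code: it assumes Gaussian inputs in d = 1000 dimensions with noiseless labels given by a single signal coordinate, n = 50 training points, and an l_inf radius eps = 0.1, all arbitrary choices. For a linear classifier the worst-case l_inf perturbation of radius eps simply reduces the margin y * (w . x) by eps * ||w||_1, so adversarial training can be done exactly without an inner attack loop. The snippet compares the robust test accuracy of a standard and an adversarially trained linear model; whether adversarial training actually hurts here depends on the chosen constants and data model, so treat it only as a template for the comparison.

```python
# Minimal illustrative sketch (assumed toy setting, not the paper's exact data
# model): compare standard vs. exact l_inf adversarial training of a linear
# classifier when the dimension d greatly exceeds the sample size n.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, eps = 1000, 50, 5000, 0.1    # assumed constants

theta = np.zeros(d)
theta[0] = 1.0                                   # ground-truth direction (assumption)

def sample(n):
    X = rng.normal(size=(n, d))
    return X, np.sign(X @ theta)                 # noiseless labels

def train(X, y, eps, lr=0.1, steps=2000):
    """Gradient descent on the robust logistic loss.
    For a linear model, the worst-case l_inf perturbation of radius eps
    shrinks the margin by eps * ||w||_1, so the robust loss is closed form."""
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w) - eps * np.abs(w).sum()
        p = 1.0 / (1.0 + np.exp(np.clip(margins, -30, 30)))  # sigmoid(-margin)
        grad = -(p * y) @ X / len(y) + eps * p.mean() * np.sign(w)
        w -= lr * grad
    return w

def robust_acc(w, X, y, eps):
    # Fraction of points classified correctly under the worst-case l_inf attack.
    return float(np.mean(y * (X @ w) - eps * np.abs(w).sum() > 0))

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)
w_std = train(X_tr, y_tr, eps=0.0)   # standard training
w_adv = train(X_tr, y_tr, eps=eps)   # adversarial training
print("robust accuracy, standard training   :", robust_acc(w_std, X_te, y_te, eps))
print("robust accuracy, adversarial training:", robust_acc(w_adv, X_te, y_te, eps))
```

The closed-form robust margin is what makes the linear case convenient to study; for nonlinear feature-learning models the inner maximization has to be approximated, which is where the abstract's explanatory insights may also transfer.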
Related papers
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic and generalisable framework for which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z) - Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning Few-Shot Meta-Learners [28.468089304148453]
We attack amortized meta-learners, which allows us to craft colluding sets of inputs that fool the system's learning algorithm.
We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance.
We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred.
arXiv Detail & Related papers (2022-11-23T14:55:44Z) - Btech thesis report on adversarial attack detection and purification of adverserially attacked images [0.0]
This thesis report is on the detection and purification of adversarially attacked images.
A deep learning model is trained on certain training examples for various tasks such as classification and regression.
arXiv Detail & Related papers (2022-05-09T09:24:11Z) - Benign Overfitting in Adversarially Robust Linear Classification [91.42259226639837]
"Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community.
We show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples.
arXiv Detail & Related papers (2021-12-31T00:27:31Z) - Indiscriminate Poisoning Attacks Are Shortcuts [77.38947817228656]
We find that the perturbations of advanced poisoning attacks are almost linearly separable when assigned the target labels of the corresponding samples.
We show that such synthetic perturbations are as powerful as the deliberately crafted attacks.
Our finding suggests that the shortcut learning problem is more serious than previously believed.
arXiv Detail & Related papers (2021-11-01T12:44:26Z) - Simulated Adversarial Testing of Face Recognition Models [53.10078734154151]
We propose a framework for learning how to test machine learning algorithms using simulators in an adversarial manner.
We are the first to show that weaknesses of models trained on real data can be discovered using simulated samples.
arXiv Detail & Related papers (2021-06-08T17:58:10Z) - Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We propose an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that there exist inter-class discrepancies in accuracy and robustness even when the training dataset has an equal number of samples for each class.
arXiv Detail & Related papers (2020-10-26T06:32:32Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z) - The Curious Case of Adversarially Robust Models: More Data Can Help, Double Descend, or Hurt Generalization [36.87923859576768]
Adversarial training has shown its ability to produce models that are robust to perturbations of the input data, but usually at the expense of a decrease in standard accuracy.
In this paper, we show that more training data can hurt the generalization of adversarially robust models in the classification problems.
arXiv Detail & Related papers (2020-02-25T18:25:28Z) - A Bayes-Optimal View on Adversarial Examples [9.51828574518325]
We argue for examining adversarial examples from the perspective of Bayes-optimal classification.
Our results show that even when these "gold standard" optimal classifiers are robust, CNNs trained on the same datasets consistently learn a vulnerable classifier.
arXiv Detail & Related papers (2020-02-20T16:43:47Z)