Generating Less Certain Adversarial Examples Improves Robust Generalization
- URL: http://arxiv.org/abs/2310.04539v3
- Date: Mon, 21 Oct 2024 11:12:02 GMT
- Title: Generating Less Certain Adversarial Examples Improves Robust Generalization
- Authors: Minxing Zhang, Michael Backes, Xiao Zhang
- Abstract summary: This paper revisits the robust overfitting phenomenon of adversarial training.
We argue that overconfidence in predicting adversarial examples is a potential cause.
We propose a formal definition of adversarial certainty that captures the variance of the model's predicted logits on adversarial examples.
- Score: 22.00283527210342
- Abstract: This paper revisits the robust overfitting phenomenon of adversarial training. Observing that models with better robust generalization performance are less certain in predicting adversarially generated training inputs, we argue that overconfidence in predicting adversarial examples is a potential cause. Therefore, we hypothesize that generating less certain adversarial examples improves robust generalization, and propose a formal definition of adversarial certainty that captures the variance of the model's predicted logits on adversarial examples. Our theoretical analysis of synthetic distributions characterizes the connection between adversarial certainty and robust generalization. Accordingly, built upon the notion of adversarial certainty, we develop a general method to search for models that can generate training-time adversarial inputs with reduced certainty, while maintaining the model's capability in distinguishing adversarial examples. Extensive experiments on image benchmarks demonstrate that our method effectively learns models with consistently improved robustness and mitigates robust overfitting, confirming the importance of generating less certain adversarial examples for robust generalization.
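To make the definition concrete, here is a minimal PyTorch sketch, assuming a standard L-infinity PGD attack and reading adversarial certainty as the variance of the predicted logits across classes, averaged over a batch. All names and hyperparameters are illustrative, not the authors' reference implementation.

```python
# Minimal sketch: generate L-infinity PGD adversarial examples, then
# measure adversarial certainty as the variance of the model's logits
# across classes (one plausible reading of the paper's definition).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-infinity PGD attack with a random start."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1).detach()
    return x_adv

@torch.no_grad()
def adversarial_certainty(model, x_adv):
    """Variance of predicted logits across classes, batch-averaged;
    lower values mean less certain adversarial predictions."""
    logits = model(x_adv)          # shape: (batch_size, num_classes)
    return logits.var(dim=1).mean().item()
```

In the paper's framing, a training procedure that lowers this quantity on the generated examples, while keeping them correctly separable, should generalize more robustly.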
Related papers
- Enhancing Adversarial Robustness via Uncertainty-Aware Distributional Adversarial Training [43.766504246864045]
We propose a novel uncertainty-aware distributional adversarial training method.
Our approach achieves state-of-the-art adversarial robustness and maintains natural performance.
arXiv Detail & Related papers (2024-11-05T07:26:24Z)
- Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment [24.577363665112706]
Recent adversarial training techniques have utilized inverse adversarial attacks to generate high-confidence examples.
Our investigation reveals that high-confidence outputs under inverse adversarial attacks are correlated with biased feature activation.
We propose Debiased High-Confidence Adversarial Training (DHAT) to address this bias.
DHAT achieves state-of-the-art performance and exhibits robust generalization capabilities across various vision datasets.
arXiv Detail & Related papers (2024-08-12T11:56:06Z)
- Constructing Semantics-Aware Adversarial Examples with Probabilistic Perspective [4.685487217906502]
We present a method for creating semantics-aware adversarial examples.
Our method produces adversarial perturbations that maintain the original image's semantics.
It offers users the flexibility to inject their own understanding of semantics into the adversarial examples.
arXiv Detail & Related papers (2023-06-01T05:16:44Z)
- The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training [72.39526433794707]
Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples.
We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its "inverse adversarial" counterpart.
Our training method achieves state-of-the-art robustness as well as natural accuracy.
arXiv Detail & Related papers (2022-11-01T15:24:26Z)
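The consistency idea in "The Enemy of My Enemy is My Friend" above can be rendered as a toy PyTorch loss: one perturbation ascends the loss (a standard attack) while the inverse one descends it, and a KL term penalizes disagreement between the model's outputs on the two inputs. This is our own hedged sketch, not the authors' code.

```python
# Toy sketch of training with inverse adversaries: ascend=True yields a
# standard PGD adversarial example, ascend=False its "inverse" that
# descends the loss; a KL term aligns the two predictions.
import torch
import torch.nn.functional as F

def perturb(model, x, y, eps, alpha, steps, ascend=True):
    sign = 1.0 if ascend else -1.0
    x_new = x.detach().clone()
    for _ in range(steps):
        x_new.requires_grad_(True)
        loss = F.cross_entropy(model(x_new), y)
        grad = torch.autograd.grad(loss, x_new)[0]
        x_new = x_new.detach() + sign * alpha * grad.sign()
        x_new = (x + (x_new - x).clamp(-eps, eps)).clamp(0, 1).detach()
    return x_new

def inverse_adversary_loss(model, x, y, eps=8 / 255, alpha=2 / 255,
                           steps=10, lam=1.0):
    x_adv = perturb(model, x, y, eps, alpha, steps, ascend=True)
    x_inv = perturb(model, x, y, eps, alpha, steps, ascend=False)
    ce = F.cross_entropy(model(x_adv), y)
    kl = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                  F.softmax(model(x_inv), dim=1), reduction="batchmean")
    return ce + lam * kl
```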
- Balanced Adversarial Training: Balancing Tradeoffs between Fickleness and Obstinacy in NLP Models [21.06607915149245]
We show that standard adversarial training methods may make a model more vulnerable to fickle adversarial examples.
We introduce Balanced Adversarial Training, which incorporates contrastive learning to increase robustness against both fickle and obstinate adversarial examples.
arXiv Detail & Related papers (2022-10-20T18:02:07Z)
- On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training [72.95029777394186]
Adversarial training is a popular method to robustify models against adversarial attacks.
We investigate this phenomenon from the perspective of training instances.
We show that the decay in generalization performance of adversarial training is a result of the model's attempt to fit hard adversarial instances.
arXiv Detail & Related papers (2021-12-14T12:19:24Z)
- A Frequency Perspective of Adversarial Robustness [72.48178241090149]
We present a frequency-based understanding of adversarial examples, supported by theoretical and empirical findings.
Our analysis shows that adversarial perturbations are confined neither to high-frequency nor to low-frequency components; where they concentrate is dataset-dependent.
We propose a frequency-based explanation for the commonly observed accuracy vs. robustness trade-off.
arXiv Detail & Related papers (2021-10-26T19:12:34Z)
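The dataset-dependence claim in "A Frequency Perspective of Adversarial Robustness" above can be probed with a simple band split: decompose a perturbation into low- and high-frequency components via a centered FFT mask and compare their energy. A minimal PyTorch sketch under our own naming, with an assumed circular low-pass mask:

```python
# Split a perturbation delta of shape (C, H, W) into low- and
# high-frequency parts with a circular low-pass FFT mask; comparing
# the energy of the two bands per dataset probes the frequency claim.
import torch

def frequency_split(delta, radius=8):
    _, H, W = delta.shape
    spec = torch.fft.fftshift(torch.fft.fft2(delta), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    dist = ((yy - H // 2) ** 2 + (xx - W // 2) ** 2).float().sqrt()
    mask = (dist <= radius).to(spec.dtype)       # low-pass mask
    low = torch.fft.ifft2(
        torch.fft.ifftshift(spec * mask, dim=(-2, -1))).real
    high = delta - low                           # residual high band
    return low, high

# Example: energy in each band of a random perturbation
delta = 0.03 * torch.randn(3, 32, 32)
low, high = frequency_split(delta)
print(low.pow(2).sum().item(), high.pow(2).sum().item())
```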
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
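The "Model-Agnostic Meta-Attack" entry above describes learning the attack optimizer itself. Below is a minimal sketch of that general idea, assuming a coordinate-wise LSTM that maps input gradients to perturbation updates; the architecture and loop are our own guesses, not the MAMA reference code.

```python
# Sketch of a learned attack optimizer: a coordinate-wise LSTM maps the
# current input-gradient to the next perturbation update.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNAttacker(nn.Module):
    def __init__(self, hidden=20):
        super().__init__()
        self.cell = nn.LSTMCell(1, hidden)   # one coordinate at a time
        self.head = nn.Linear(hidden, 1)

    def forward(self, grad, state):
        g = grad.reshape(-1, 1)              # flatten to coordinates
        h, c = self.cell(g, state)
        update = self.head(h).reshape(grad.shape)
        return update, (h, c)

def meta_attack(model, attacker, x, y, eps=8 / 255, steps=10):
    x_adv = x.clone().detach()
    state = None
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        update, state = attacker(grad, state)
        # Meta-training the attacker would keep this graph; for plain
        # attack evaluation we detach at every step.
        x_adv = x_adv.detach() + update.detach()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1).detach()
    return x_adv
```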
- Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise.
Pre-processing methods may suffer from the robustness degradation effect.
A potential cause of this negative effect is that adversarial training examples are static and independent of the pre-processing model.
We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
arXiv Detail & Related papers (2021-06-10T01:45:32Z)
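The JATP entry above attributes the degradation to adversarial examples that ignore the pre-processor. A hedged sketch of the joint idea, crafting examples through the composed pipeline so they track the current pre-processing model; function names are ours, not the paper's:

```python
# Sketch of joint generation: attack the composition of a trainable
# pre-processor and a fixed classifier, so adversarial examples adapt
# to the pre-processing model instead of staying static.
import torch
import torch.nn.functional as F

def joint_pgd(preprocessor, classifier, x, y,
              eps=8 / 255, alpha=2 / 255, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = classifier(preprocessor(x_adv))   # attack the pipeline
        grad = torch.autograd.grad(F.cross_entropy(logits, y), x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1).detach()
    return x_adv

def preprocessor_step(preprocessor, classifier, optimizer, x, y):
    """One step for the pre-processor (optimizer holds its parameters)
    on freshly generated joint adversarial examples."""
    x_adv = joint_pgd(preprocessor, classifier, x, y)
    loss = F.cross_entropy(classifier(preprocessor(x_adv)), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```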
- Adversarially Robust Estimate and Risk Analysis in Linear Regression [17.931533943788335]
Adversarially robust learning aims to design algorithms that are robust to small adversarial perturbations on input variables.
By discovering the statistical minimax rate of convergence of adversarially robust estimators, we emphasize the importance of incorporating model information.
We propose a straightforward two-stage adversarial learning framework, which facilitates the use of model structure information to improve adversarial robustness.
arXiv Detail & Related papers (2020-12-18T14:55:55Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)