Related papers: Generating Less Certain Adversarial Examples Improves Robust Generalization

Generating Less Certain Adversarial Examples Improves Robust Generalization

URL: http://arxiv.org/abs/2310.04539v4
Date: Tue, 14 Jan 2025 06:42:51 GMT
Title: Generating Less Certain Adversarial Examples Improves Robust Generalization
Authors: Minxing Zhang, Michael Backes, Xiao Zhang,
Abstract summary: This paper revisits the robust overfitting phenomenon of adversarial training.<n>We argue that overconfidence in predicting adversarial examples is a potential cause.<n>We propose a formal definition of adversarial certainty that captures the variance of the model's predicted logits on adversarial examples.
Score: 22.00283527210342
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper revisits the robust overfitting phenomenon of adversarial training. Observing that models with better robust generalization performance are less certain in predicting adversarially generated training inputs, we argue that overconfidence in predicting adversarial examples is a potential cause. Therefore, we hypothesize that generating less certain adversarial examples improves robust generalization, and propose a formal definition of adversarial certainty that captures the variance of the model's predicted logits on adversarial examples. Our theoretical analysis of synthetic distributions characterizes the connection between adversarial certainty and robust generalization. Accordingly, built upon the notion of adversarial certainty, we develop a general method to search for models that can generate training-time adversarial inputs with reduced certainty, while maintaining the model's capability in distinguishing adversarial examples. Extensive experiments on image benchmarks demonstrate that our method effectively learns models with consistently improved robustness and mitigates robust overfitting, confirming the importance of generating less certain adversarial examples for robust generalization. Our implementations are available as open-source code at: https://github.com/TrustMLRG/AdvCertainty.

Related papers

A Robust Adversarial Ensemble with Causal (Feature Interaction) Interpretations for Image Classification [9.945272787814941]
We present a deep ensemble model that combines discriminative features with generative models to achieve both high accuracy and adversarial robustness.<n>Our approach integrates a bottom-level pre-trained discriminative network for feature extraction with a top-level generative classification network that models adversarial input distributions.
arXiv Detail & Related papers (2024-12-28T05:06:20Z)
Enhancing Adversarial Robustness via Uncertainty-Aware Distributional Adversarial Training [43.766504246864045]
We propose a novel uncertainty-aware distributional adversarial training method. Our approach achieves state-of-the-art adversarial robustness and maintains natural performance.
arXiv Detail & Related papers (2024-11-05T07:26:24Z)
Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment [24.577363665112706]
Recent adversarial training techniques have utilized inverse adversarial attacks to generate high-confidence examples. Our investigation reveals that high-confidence outputs under inverse adversarial attacks are correlated with biased feature activation. We propose Debiased High-Confidence Adversarial Training (DHAT) to address this bias. DHAT achieves state-of-the-art performance and exhibits robust generalization capabilities across various vision datasets.
arXiv Detail & Related papers (2024-08-12T11:56:06Z)
Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression [32.727673706238086]
We propose a way of delving into the unexpected vulnerability in adversarially trained networks from a causal perspective. By deploying it, we estimate the causal relation of adversarial prediction under an unbiased environment. We demonstrate that the estimated causal features are highly related to the correct prediction for adversarial robustness.
arXiv Detail & Related papers (2023-03-02T08:18:22Z)
The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training [72.39526433794707]
Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples. We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its inverse adversarial'' counterpart. Our training method achieves state-of-the-art robustness as well as natural accuracy.
arXiv Detail & Related papers (2022-11-01T15:24:26Z)
Balanced Adversarial Training: Balancing Tradeoffs between Fickleness and Obstinacy in NLP Models [21.06607915149245]
We show that standard adversarial training methods may make a model more vulnerable to fickle adversarial examples. We introduce Balanced Adversarial Training, which incorporates contrastive learning to increase robustness against both fickle and obstinate adversarial examples.
arXiv Detail & Related papers (2022-10-20T18:02:07Z)
Latent Boundary-guided Adversarial Training [61.43040235982727]
Adrial training is proved to be the most effective strategy that injects adversarial examples into model training. We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
Self-Ensemble Adversarial Training for Improved Robustness [14.244311026737666]
Adversarial training is the strongest strategy against various adversarial attacks among all sorts of defense methods. Recent works mainly focus on developing new loss functions or regularizers, attempting to find the unique optimal point in the weight space. We devise a simple but powerful emphSelf-Ensemble Adversarial Training (SEAT) method for yielding a robust classifier by averaging weights of history models.
arXiv Detail & Related papers (2022-03-18T01:12:18Z)
On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training [72.95029777394186]
Adversarial training is a popular method to robustify models against adversarial attacks. We investigate this phenomenon from the perspective of training instances. We show that the decay in generalization performance of adversarial training is a result of the model's attempt to fit hard adversarial instances.
arXiv Detail & Related papers (2021-12-14T12:19:24Z)
A Frequency Perspective of Adversarial Robustness [72.48178241090149]
We present a frequency-based understanding of adversarial examples, supported by theoretical and empirical findings. Our analysis shows that adversarial examples are neither in high-frequency nor in low-frequency components, but are simply dataset dependent. We propose a frequency-based explanation for the commonly observed accuracy vs. robustness trade-off.
arXiv Detail & Related papers (2021-10-26T19:12:34Z)
Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically. Our method learns the in adversarial attacks parameterized by a recurrent neural network. We develop a model-agnostic training algorithm to improve the ability of the learned when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise. Pre-processing methods may suffer from the robustness degradation effect. A potential cause of this negative effect is that adversarial training examples are static and independent to the pre-processing model. We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
arXiv Detail & Related papers (2021-06-10T01:45:32Z)
Adversarially Robust Estimate and Risk Analysis in Linear Regression [17.931533943788335]
Adversarially robust learning aims to design algorithms that are robust to small adversarial perturbations on input variables. By discovering the statistical minimax rate of convergence of adversarially robust estimators, we emphasize the importance of incorporating model information. We propose a straightforward two-stage adversarial learning framework, which facilitates to utilize model structure information to improve adversarial robustness.
arXiv Detail & Related papers (2020-12-18T14:55:55Z)
Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning. These measures should account for the wide variety of models used in practice. The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
Provably robust deep generative models [1.52292571922932]
We propose a method for training provably robust generative models, specifically a provably robust version of the variational auto-encoder (VAE) We show that it is able to produce generative models that are substantially more robust to adversarial attacks.
arXiv Detail & Related papers (2020-04-22T14:47:41Z)
Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples. Most existing AT methods adopt a specific attack to craft adversarial examples, leading to the unreliable robustness against other unseen attacks. In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification. This paper studies a complementary failure mode, invariance-based adversarial examples. We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.