Fundamental Tradeoffs in Distributionally Adversarial Training
- URL: http://arxiv.org/abs/2101.06309v1
- Date: Fri, 15 Jan 2021 21:59:18 GMT
- Title: Fundamental Tradeoffs in Distributionally Adversarial Training
- Authors: Mohammad Mehrabi, Adel Javanmard, Ryan A. Rossi, Anup Rao and Tung Mai
- Abstract summary: Adversarial training is one of the most effective techniques to improve the robustness of models against adversarial perturbations.
In this paper, we study the tradeoff between standard risk and adversarial risk.
We show that a tradeoff between standard and adversarial risk is manifested in all three settings.
- Score: 21.6024500220438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training is among the most effective techniques to improve the
robustness of models against adversarial perturbations. However, the full
effect of this approach on models is not well understood. For example, while
adversarial training can reduce the adversarial risk (prediction error against
an adversary), it sometimes increases standard risk (generalization error when
there is no adversary). Moreover, such behavior is affected by various
elements of the learning problem, including the size and quality of training
data, specific forms of adversarial perturbations in the input, model
overparameterization, and the adversary's power, among others. In this paper, we
focus on the \emph{distribution perturbing} adversary framework, wherein the
adversary can change the test distribution within a neighborhood of the
training data distribution. The neighborhood is defined via the Wasserstein
distance between distributions, and its radius is a measure
of the adversary's manipulative power. We study the tradeoff between standard risk
and adversarial risk and derive the Pareto-optimal tradeoff, achievable over
specific classes of models, in the infinite-data limit with the feature dimension
kept fixed. We consider three learning settings: 1) Regression with the class
of linear models; 2) Binary classification under the Gaussian mixture data
model, with the class of linear classifiers; 3) Regression with the class of
random features models (which can be equivalently represented as two-layer
neural networks with random first-layer weights). We show that a tradeoff
between standard and adversarial risk is manifested in all three settings. We
further characterize the Pareto-optimal tradeoff curves and discuss how a
variety of factors, such as feature correlation, the adversary's power, and the
width of the two-layer neural network, affect this tradeoff.
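To make the setup concrete, one minimal formalization consistent with the abstract (the notation below is ours, not necessarily the paper's) defines the standard and adversarial risks of a model $f_\theta$ with loss $\ell$ as
\[
\mathrm{SR}(\theta) = \mathbb{E}_{(x,y)\sim P}\big[\ell(f_\theta(x),y)\big],
\qquad
\mathrm{AR}(\theta) = \sup_{Q \,:\, W(Q,P)\le \varepsilon} \mathbb{E}_{(x,y)\sim Q}\big[\ell(f_\theta(x),y)\big],
\]
where $P$ is the training data distribution, $W$ is a Wasserstein distance, and the radius $\varepsilon$ quantifies the adversary's manipulative power. The Pareto-optimal tradeoff curve then traces the jointly achievable pairs $(\mathrm{SR},\mathrm{AR})$ that no model in the class can improve in both coordinates at once.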
Related papers
- Utilizing Adversarial Examples for Bias Mitigation and Accuracy Enhancement [3.0820287240219795]
We propose a novel approach to mitigate biases in computer vision models by utilizing counterfactual generation and fine-tuning.
Our approach leverages a curriculum learning framework combined with a fine-grained adversarial loss to fine-tune the model using adversarial examples.
We validate our approach through both qualitative and quantitative assessments, demonstrating improved bias mitigation and accuracy compared to existing methods.
arXiv Detail & Related papers (2024-04-18T00:41:32Z)
- Benign Overfitting in Adversarially Robust Linear Classification [91.42259226639837]
"Benign overfitting", where classifiers memorize noisy training data yet still achieve good generalization performance, has drawn great attention in the machine learning community.
We show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples.
arXiv Detail & Related papers (2021-12-31T00:27:31Z)
- Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks [6.439477789066243]
Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks.
Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) models and standard models are still poorly understood.
We identify three logit characteristics essential to learning adversarial robustness.
arXiv Detail & Related papers (2021-08-26T19:09:15Z)
- Adversarial Robustness through the Lens of Causality [105.51753064807014]
The adversarial vulnerability of deep neural networks has attracted significant attention in machine learning.
We propose to incorporate causality into mitigating adversarial vulnerability.
Our method can be seen as the first attempt to leverage causality for mitigating adversarial vulnerability.
arXiv Detail & Related papers (2021-06-11T06:55:02Z)
- Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Asymptotic Behavior of Adversarial Training in Binary Classification [41.7567932118769]
Adversarial training is considered to be the state-of-the-art method for defense against adversarial attacks.
Despite being successful in practice, several problems in understanding the performance of adversarial training remain open.
We derive precise theoretical predictions for the performance of adversarial training in binary classification.
arXiv Detail & Related papers (2020-10-26T01:44:20Z)
- On the Generalization Properties of Adversarial Training [21.79888306754263]
This paper studies the generalization performance of a generic adversarial training algorithm.
A series of numerical studies are conducted to demonstrate how the smoothness and L1 penalization help improve the adversarial robustness of models.
arXiv Detail & Related papers (2020-08-15T02:32:09Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)
- Precise Tradeoffs in Adversarial Training for Linear Regression [55.764306209771405]
We provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features.
We precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary mini-max adversarial training approach.
Our theory for adversarial training algorithms also facilitates the rigorous study of how a variety of factors (size and quality of training data, model overparametrization, etc.) affect the tradeoff between these two competing accuracies; a minimal sketch of this mini-max formulation appears below.
arXiv Detail & Related papers (2020-02-24T19:01:47Z)
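As a rough illustration of the mini-max formulation just mentioned, the following is a hypothetical sketch (our own code, not taken from any of the papers above) of adversarial training for linear regression against per-sample $\ell_2$-bounded input perturbations of size eps; for a linear model the inner maximization has a closed form, so the outer minimization can be run directly by subgradient descent. All names and parameter values are our assumptions.

    # Hypothetical sketch (setup and names are ours): mini-max adversarial
    # training for linear regression with an l2-bounded per-sample adversary.
    # Closed-form inner maximization for a linear model:
    #   max_{||delta||_2 <= eps} (y - theta^T (x + delta))^2
    #     = (|y - theta^T x| + eps * ||theta||_2)^2
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, eps = 1000, 20, 0.5                  # samples, feature dim, adversary's power
    theta_star = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))
    y = X @ theta_star + 0.1 * rng.normal(size=n)

    theta = np.zeros(d)
    lr = 0.05
    for _ in range(2000):
        r = y - X @ theta                      # per-sample residuals
        norm = np.linalg.norm(theta) + 1e-12   # avoid division by zero at theta = 0
        worst = np.abs(r) + eps * norm         # worst-case absolute error per sample
        # subgradient of (1/n) * sum(worst**2) with respect to theta
        grad = (-2.0 / n) * (X.T @ (worst * np.sign(r))) \
               + 2.0 * eps * worst.mean() * theta / norm
        theta -= lr * grad

    standard_risk = np.mean((y - X @ theta) ** 2)
    adversarial_risk = np.mean((np.abs(y - X @ theta) + eps * np.linalg.norm(theta)) ** 2)
    print(f"standard risk {standard_risk:.4f}  adversarial risk {adversarial_risk:.4f}")

Sweeping eps from 0 (ordinary least squares) upward shrinks theta and trades standard risk for adversarial risk, tracing an empirical curve of the kind the tradeoff results above characterize exactly.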
This list is automatically generated from the titles and abstracts of the papers on this site.