Explicit Tradeoffs between Adversarial and Natural Distributional
Robustness
- URL: http://arxiv.org/abs/2209.07592v1
- Date: Thu, 15 Sep 2022 19:58:01 GMT
- Title: Explicit Tradeoffs between Adversarial and Natural Distributional
Robustness
- Authors: Mazda Moayeri, Kiarash Banihashem, Soheil Feizi
- Abstract summary: In practice, models need to enjoy both types of robustness to ensure reliability.
In this work, we show that in fact, explicit tradeoffs exist between adversarial and natural distributional robustness.
- Score: 48.44639585732391
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several existing works study either adversarial or natural distributional
robustness of deep neural networks separately. In practice, however, models
need to enjoy both types of robustness to ensure reliability. In this work, we
bridge this gap and show that in fact, explicit tradeoffs exist between
adversarial and natural distributional robustness. We first consider a simple
linear regression setting on Gaussian data with disjoint sets of core and
spurious features. In this setting, through theoretical and empirical analysis,
we show that (i) adversarial training with $\ell_1$ and $\ell_2$ norms
increases the model reliance on spurious features; (ii) For $\ell_\infty$
adversarial training, spurious reliance only occurs when the scale of the
spurious features is larger than that of the core features; (iii) adversarial
training can have an unintended consequence in reducing distributional
robustness, specifically when spurious correlations are changed in the new test
domain. Next, we present extensive empirical evidence, using a test suite of
twenty adversarially trained models evaluated on five benchmark datasets
(ObjectNet, RIVAL10, Salient ImageNet-1M, ImageNet-9, Waterbirds), that
adversarially trained classifiers rely on backgrounds more than their
standardly trained counterparts, validating our theoretical results. We also
show that spurious correlations in training data (when preserved in the test
domain) can improve adversarial robustness, revealing that previous claims that
adversarial vulnerability is rooted in spurious correlations are incomplete.
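As a rough illustration of the linear setting described above, the following numpy sketch fits a standard least-squares model and an $\ell_2$ adversarially trained model on synthetic Gaussian data with disjoint core and spurious features, then compares how much weight each places on the spurious coordinates. The robust objective uses the closed-form worst case for an $\ell_2$-bounded perturbation of a linear model, $(|y - w^\top x| + \epsilon\|w\|_2)^2$; the data scales, perturbation budget, and optimization details are illustrative placeholders rather than the paper's exact construction.

```python
# Illustrative sketch only: compares reliance on spurious features under standard
# vs l2 adversarial training of a linear regressor; not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)
n, d_core, d_spu, eps = 2000, 5, 5, 0.5

w_core = np.ones(d_core)                                    # core (causal) signal
x_core = rng.normal(size=(n, d_core))
y = x_core @ w_core + 0.1 * rng.normal(size=n)
x_spu = 0.3 * y[:, None] + 0.5 * rng.normal(size=(n, d_spu))  # spuriously correlated
X = np.hstack([x_core, x_spu])

# standard training: ordinary least squares
w_std, *_ = np.linalg.lstsq(X, y, rcond=None)

# l2 adversarial training: minimize the worst-case loss (|y - w.x| + eps*||w||_2)^2
w_adv = w_std.copy()
for _ in range(2000):
    r = y - X @ w_adv
    norm = np.linalg.norm(w_adv) + 1e-12
    grad = 2 * ((np.abs(r) + eps * norm)[:, None]
                * (-np.sign(r)[:, None] * X + eps * w_adv / norm)).mean(axis=0)
    w_adv -= 0.05 * grad

def spurious_share(w):
    # fraction of the weight norm placed on the spurious coordinates
    return np.linalg.norm(w[d_core:]) / np.linalg.norm(w)

print("spurious weight share, standard   :", spurious_share(w_std))
print("spurious weight share, adversarial:", spurious_share(w_adv))
```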
Related papers
- Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data [38.44734564565478]
We provide a theoretical understanding of adversarial examples and adversarial training algorithms from the perspective of feature learning theory.
We show that the adversarial training method can provably strengthen the robust feature learning and suppress the non-robust feature learning.
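For context, adversarial training here refers to minimizing the loss on worst-case perturbed inputs; a minimal, generic PGD-based training step in PyTorch might look like the sketch below, with illustrative hyperparameters and a toy linear model rather than the structured-data setup analyzed in the paper.

```python
# A minimal, generic PGD adversarial-training step (PyTorch); illustrative only.
import torch
import torch.nn as nn

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft l_inf-bounded adversarial examples with projected gradient descent."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()   # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)       # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1).detach()
    return x_adv

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))   # stand-in batch

x_adv = pgd_attack(model, x, y)
opt.zero_grad()
nn.functional.cross_entropy(model(x_adv), y).backward()       # train on adversarial examples
opt.step()
```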
arXiv Detail & Related papers (2024-10-11T03:59:49Z)
- Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study [61.65123150513683]
Multimodal foundation models, such as CLIP, produce state-of-the-art zero-shot results.
It is reported that these models close the robustness gap by matching the performance of supervised models trained on ImageNet.
We show that CLIP leads to a significant robustness drop compared to supervised ImageNet models on our benchmark.
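A minimal zero-shot CLIP classification sketch with the Hugging Face transformers API is given below; the checkpoint, label set, prompt template, and stand-in image are placeholder choices, and the paper's benchmark datasets are not reproduced here.

```python
# Zero-shot classification with CLIP (Hugging Face transformers); running this
# downloads the pretrained weights.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

labels = ["dog", "cat", "car"]                      # stand-in class names
image = Image.new("RGB", (224, 224))                # stand-in for a real test image
inputs = processor(text=[f"a photo of a {c}" for c in labels],
                   images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits_per_image       # image-to-text similarity scores
print("predicted label:", labels[int(logits.argmax(dim=-1))])
```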
arXiv Detail & Related papers (2024-03-15T17:33:49Z)
- The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness [13.120373493503772]
We prove a surprising result: even if the ground truth itself is robust to adversarial examples and the benignly overfitted model is benign in terms of the "standard" out-of-sample risk objective, this benign overfitting can be harmful for the adversarial out-of-sample risk.
Our finding provides theoretical insight into the puzzling phenomenon observed in practice, where the true target function (e.g., a human) is robust against adversarial attacks, while benignly overfitted neural networks are not.
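To make the two risk notions concrete, the toy numpy sketch below compares the clean and worst-case $\ell_2$-perturbed test risks of a minimum-norm interpolator against those of the ground-truth predictor; the dimensions, noise level, and perturbation budget are illustrative, not the regime analyzed in the paper.

```python
# Toy comparison of standard vs adversarial out-of-sample risk for linear predictors.
# For an l2-bounded input perturbation, the worst-case squared error of w has the
# closed form (|y - w.x| + eps*||w||_2)^2, so a larger weight norm inflates the
# adversarial risk even when the clean residuals are small.
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 50, 400, 0.5                       # heavily overparameterized regime
w_star = np.zeros(d); w_star[:5] = 1.0         # ground-truth (low-norm) predictor

X = rng.normal(size=(n, d))
y = X @ w_star + 0.5 * rng.normal(size=n)      # noisy labels get interpolated
w_hat = np.linalg.pinv(X) @ y                  # minimum-l2-norm interpolator

X_te = rng.normal(size=(5000, d))
y_te = X_te @ w_star                           # clean test labels

def standard_risk(w):
    return np.mean((y_te - X_te @ w) ** 2)

def adversarial_risk(w):
    return np.mean((np.abs(y_te - X_te @ w) + eps * np.linalg.norm(w)) ** 2)

for name, w in [("ground truth", w_star), ("min-norm interpolator", w_hat)]:
    print(name, "standard:", standard_risk(w), "adversarial:", adversarial_risk(w))
```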
arXiv Detail & Related papers (2024-01-19T15:40:46Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based fine-tuning framework, Two-WIng NormliSation (TWINS).
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks [6.439477789066243]
Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks.
Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) models and standard models are still poorly understood.
We identify three logit characteristics essential to learning adversarial robustness.
arXiv Detail & Related papers (2021-08-26T19:09:15Z)
- Adversarial Robustness under Long-Tailed Distribution [93.50792075460336]
Adversarial robustness has recently attracted extensive study, revealing the vulnerability and intrinsic characteristics of deep networks.
In this work we investigate the adversarial vulnerability as well as defense under long-tailed distributions.
We propose a clean yet effective framework, RoBal, which consists of two dedicated modules: a scale-invariant classifier and data re-balancing.
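As an illustration of the first module, the sketch below implements a generic cosine (scale-invariant) classification head in PyTorch; RoBal's actual module and its data re-balancing component may differ in detail.

```python
# A minimal cosine (scale-invariant) classification head: logits depend on the angle
# between features and class weights, not on their norms. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    def __init__(self, in_dim, num_classes, scale=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, in_dim))
        self.scale = scale  # temperature; logits live on a fixed-radius sphere

    def forward(self, features):
        # normalize both features and class weights, so logits = scale * cos(theta)
        return self.scale * F.linear(F.normalize(features, dim=-1),
                                     F.normalize(self.weight, dim=-1))

head = CosineClassifier(in_dim=512, num_classes=10)
logits = head(torch.randn(4, 512))   # shape (4, 10), invariant to feature rescaling
```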
arXiv Detail & Related papers (2021-04-06T17:53:08Z)
- Non-Singular Adversarial Robustness of Neural Networks [58.731070632586594]
Adversarial robustness has become an emerging challenge for neural networks owing to their over-sensitivity to small input perturbations.
We formalize the notion of non-singular adversarial robustness for neural networks through the lens of joint perturbations to data inputs as well as model weights.
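A hedged sketch of evaluating a loss under joint perturbations is given below: one sign-gradient step is applied to both the inputs and the weights before re-evaluating the loss. The paper's formal notion of non-singular robustness is more general than this one-step approximation.

```python
# Evaluate a loss under joint, one-step sign-gradient perturbations of the input
# and the weights; toy model and budgets, illustrative only.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(16, 1, 28, 28, requires_grad=True)
y = torch.randint(0, 10, (16,))
eps_x, eps_w = 0.05, 0.01

loss = nn.functional.cross_entropy(model(x), y)
grads = torch.autograd.grad(loss, [x] + list(model.parameters()))
grad_x, grads_w = grads[0], grads[1:]

with torch.no_grad():
    x_adv = (x + eps_x * grad_x.sign()).clamp(0, 1)
    for p, g in zip(model.parameters(), grads_w):
        p.add_(eps_w * g.sign())               # perturb the weights in place
    joint_loss = nn.functional.cross_entropy(model(x_adv), y)
    for p, g in zip(model.parameters(), grads_w):
        p.sub_(eps_w * g.sign())               # restore the original weights

print("clean loss:", float(loss), "jointly perturbed loss:", float(joint_loss))
```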
arXiv Detail & Related papers (2021-02-23T20:59:30Z)
- Bridging the Gap Between Adversarial Robustness and Optimization Bias [28.56135898767349]
Adversarial robustness is an open challenge in deep learning, most often tackled using adversarial training.
We show that it is possible to achieve both perfect standard accuracy and a certain degree of robustness without a trade-off.
In particular, we characterize the robustness of linear convolutional models, showing that they resist attacks subject to a constraint on the Fourier-$\ell_\infty$ norm.
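The sketch below computes the Fourier-$\ell_\infty$ norm of a perturbation as the peak magnitude of its 2-D DFT and contrasts it with the pixel-space $\ell_\infty$ norm; the DFT normalization convention is an assumption and may differ from the paper's definition.

```python
# Fourier-l_inf norm of an image-space perturbation: the largest magnitude among
# its 2-D DFT coefficients (unnormalized numpy FFT convention assumed).
import numpy as np

def fourier_linf_norm(delta):
    return np.abs(np.fft.fft2(delta)).max()

one_pixel = np.zeros((32, 32)); one_pixel[0, 0] = 1.0   # spatially sparse perturbation
constant = np.full((32, 32), 1.0 / 32)                  # spatially smooth, tiny per pixel

for name, d in [("one-pixel", one_pixel), ("constant", constant)]:
    print(name,
          "pixel l_inf =", np.abs(d).max(),
          "Fourier l_inf =", fourier_linf_norm(d))
```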
arXiv Detail & Related papers (2021-02-17T16:58:04Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Adversarial Robustness of Supervised Sparse Coding [34.94566482399662]
We consider a model that involves learning a representation while at the same time giving a precise generalization bound and a robustness certificate.
We focus on the hypothesis class obtained by coupling a sparsity-promoting encoder with a linear classifier.
We provide a robustness certificate for end-to-end classification.
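A minimal numpy sketch of such a hypothesis class is given below: a sparsity-promoting encoder (a few ISTA iterations with soft thresholding) followed by a linear classifier. The dictionary, sizes, and classifier weights are random placeholders rather than quantities learned as in the paper.

```python
# Sparse-coding encoder (ISTA) followed by a linear classifier; illustrative only.
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_encode(x, D, lam=0.1, n_iter=50):
    """ISTA for min_z 0.5*||x - D z||^2 + lam*||z||_1."""
    step = 1.0 / np.linalg.norm(D, ord=2) ** 2   # 1 / Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z + step * D.T @ (x - D @ z), step * lam)
    return z

rng = np.random.default_rng(0)
d, k, n_classes = 64, 128, 10
D = rng.normal(size=(d, k)) / np.sqrt(d)         # dictionary (placeholder)
W = rng.normal(size=(n_classes, k))              # linear classifier (placeholder)

x = rng.normal(size=d)                           # stand-in input
z = sparse_encode(x, D)                          # sparse representation
pred = int(np.argmax(W @ z))                     # end-to-end prediction
print("nonzeros in code:", int(np.count_nonzero(z)), "predicted class:", pred)
```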
arXiv Detail & Related papers (2020-10-22T22:05:21Z)