The Surprising Harmfulness of Benign Overfitting for Adversarial
Robustness
- URL: http://arxiv.org/abs/2401.12236v2
- Date: Thu, 25 Jan 2024 14:57:48 GMT
- Title: The Surprising Harmfulness of Benign Overfitting for Adversarial
Robustness
- Authors: Yifan Hao, Tong Zhang
- Abstract summary: We prove a surprising result: even if the ground truth itself is robust to adversarial examples and the benignly overfitted model is benign in terms of the ``standard'' out-of-sample risk objective, the benign overfitting process can be harmful when out-of-sample data are subject to adversarial manipulation.
Our finding provides theoretical insights into the puzzling phenomenon observed in practice, where the true target function (e.g., a human) is robust against adversarial attacks, while benignly overfitted neural networks lead to models that are not robust.
- Score: 13.120373493503772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent empirical and theoretical studies have established the generalization
capabilities of large machine learning models that are trained to
(approximately or exactly) fit noisy data. In this work, we prove a surprising
result that even if the ground truth itself is robust to adversarial examples,
and the benignly overfitted model is benign in terms of the ``standard''
out-of-sample risk objective, this benign overfitting process can be harmful
when out-of-sample data are subject to adversarial manipulation. More
specifically, our main results contain two parts: (i) the min-norm estimator in
the overparameterized linear model always leads to adversarial vulnerability in the
``benign overfitting'' setting; (ii) we verify an asymptotic trade-off result
between the standard risk and the ``adversarial'' risk of every ridge
regression estimator, implying that under suitable conditions these two quantities
cannot both be small at the same time by any single choice of the ridge
regularization parameter. Furthermore, under the lazy training regime, we
demonstrate parallel results for the two-layer neural tangent kernel (NTK) model,
which align with empirical observations in deep neural networks. Our finding
provides theoretical insights into the puzzling phenomenon observed in
practice, where the true target function (e.g., a human) is robust against
adversarial attacks, while benignly overfitted neural networks lead to models
that are not robust.
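To make the two linear-model results concrete, below is a minimal simulation sketch (our illustration, not the authors' code): it fits the min-norm interpolator and a family of ridge estimators in an overparameterized linear model with a spiked covariance, then estimates both the standard risk and an l2 adversarial risk. For a linear predictor theta, a worst-case l2 perturbation of size eps inflates the absolute prediction error by eps * ||theta||_2, so both risks are easy to evaluate; the dimensions, noise level, covariance, and perturbation budget are illustrative assumptions.

```python
# Minimal sketch: benign overfitting vs. adversarial risk in an
# overparameterized linear model. All constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma, eps = 100, 2000, 0.5, 0.5  # samples, features, noise, l2 budget

# Spiked covariance: a few strong directions plus many weak ones, a typical
# setting in which benign overfitting of the min-norm interpolator occurs.
eigs = np.concatenate([np.full(10, 10.0), np.full(d - 10, 0.05)])
X = rng.standard_normal((n, d)) * np.sqrt(eigs)

theta_star = np.zeros(d)
theta_star[:10] = 1.0 / np.sqrt(10.0)  # low-norm (hence robust) ground truth
y = X @ theta_star + sigma * rng.standard_normal(n)

def risks(theta, m=5000):
    """Standard risk and l2 adversarial risk on fresh, noiseless test data.
    For a linear model, the worst perturbation of size eps adds
    eps * ||theta||_2 to the absolute prediction error."""
    Xt = rng.standard_normal((m, d)) * np.sqrt(eigs)
    err = Xt @ (theta - theta_star)
    std = np.mean(err ** 2)
    adv = np.mean((np.abs(err) + eps * np.linalg.norm(theta)) ** 2)
    return std, adv

# (i) Min-norm interpolator: fits the noisy labels exactly, yet its norm
# stays large enough for the adversarial term to dominate.
theta_mn = X.T @ np.linalg.solve(X @ X.T, y)
print("min-norm        std/adv risk:", risks(theta_mn))

# (ii) Ridge sweep (dual form): shrinking the norm lowers the adversarial
# term but raises the bias, so no single lambda makes both risks small.
for lam in [1e-4, 1e-2, 1.0, 100.0]:
    theta_r = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    print(f"ridge lam={lam:<6g} std/adv risk:", risks(theta_r))
```

The sweep is only meant to show the trade-off qualitatively: small lambda tracks the min-norm interpolator (low standard risk, large norm), while large lambda shrinks the norm at the cost of bias, in line with result (ii); the paper's statements are exact asymptotic results rather than finite simulations.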
Related papers
- Explicit Tradeoffs between Adversarial and Natural Distributional
Robustness [48.44639585732391]
In practice, models need to enjoy both types of robustness to ensure reliability.
In this work, we show that, in fact, explicit tradeoffs exist between adversarial and natural distributional robustness.
arXiv Detail & Related papers (2022-09-15T19:58:01Z)
- The curse of overparametrization in adversarial training: Precise analysis of robust generalization for random features regression [34.35440701530876]
Our developed theory reveals the nontrivial effect of overparametrization on robustness: for adversarially trained random features models, high overparametrization can hurt robust generalization.
arXiv Detail & Related papers (2022-01-13T18:57:30Z)
- Benign Overfitting in Adversarially Robust Linear Classification [91.42259226639837]
"Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community.
We show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples.
arXiv Detail & Related papers (2021-12-31T00:27:31Z)
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
- Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge that limits the widespread adoption of deep learning has been the fragility of deep neural networks to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results on image classification demonstrate the effectiveness of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z)
- Non-Singular Adversarial Robustness of Neural Networks [58.731070632586594]
Adversarial robustness has become an emerging challenge for neural networks owing to their over-sensitivity to small input perturbations.
We formalize the notion of non-singular adversarial robustness for neural networks through the lens of joint perturbations to data inputs as well as model weights.
arXiv Detail & Related papers (2021-02-23T20:59:30Z)
- Closeness and Uncertainty Aware Adversarial Examples Detection in Adversarial Machine Learning [0.7734726150561088]
We explore and assess the use of two different groups of metrics for detecting adversarial samples.
We introduce a new feature for adversarial detection, and we show that the performance of all these metrics heavily depends on the strength of the attack being used.
arXiv Detail & Related papers (2020-12-11T14:44:59Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Asymptotic Behavior of Adversarial Training in Binary Classification [41.7567932118769]
Adversarial training is considered to be the state-of-the-art method for defense against adversarial attacks.
Although adversarial training is successful in practice, several problems in understanding its performance remain open.
We derive precise theoretical predictions for the minimizers of adversarial training in binary classification.
arXiv Detail & Related papers (2020-10-26T01:44:20Z)
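To make concrete what adversarial training optimizes in this linear setting, here is a generic sketch for a binary classifier (our illustration, not the code of any paper above). For a linear model under an l2-bounded perturbation, the inner maximization has a closed form: the worst perturbation of size eps reduces every margin y * <theta, x> by exactly eps * ||theta||_2. The data model, budget, and step size below are illustrative assumptions.

```python
# Generic sketch of adversarial training for a linear binary classifier with
# logistic loss and an l2 perturbation budget. Constants are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, d, eps, lr, steps = 500, 50, 0.3, 0.5, 1000

theta_true = rng.standard_normal(d)
theta_true /= np.linalg.norm(theta_true)
X = rng.standard_normal((n, d))
y = np.sign(X @ theta_true + 0.1 * rng.standard_normal(n))

theta = np.zeros(d)
for _ in range(steps):
    norm = np.linalg.norm(theta)
    # Closed-form inner max: the worst l2 perturbation of size eps reduces
    # every margin by exactly eps * ||theta||_2.
    margins = y * (X @ theta) - eps * norm
    # -d/dm log(1 + exp(-m)) = 1 / (1 + exp(m)); clip to avoid overflow.
    p = 1.0 / (1.0 + np.exp(np.clip(margins, -60.0, 60.0)))
    unit = theta / norm if norm > 0 else np.zeros(d)
    # Gradient of the average robust logistic loss w.r.t. theta.
    grad = -(p * y) @ X / n + eps * np.mean(p) * unit
    theta -= lr * grad

robust_acc = np.mean(y * (X @ theta) - eps * np.linalg.norm(theta) > 0)
print("robust training accuracy:", robust_acc)
```

This closed form is what makes precise analyses of adversarial training tractable for linear models; for nonlinear models the inner maximization is typically approximated instead, e.g. with projected gradient ascent.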
This list is automatically generated from the titles and abstracts of the papers on this site.