Generalization Properties of Adversarial Training for $\ell_0$-Bounded
Adversarial Attacks
- URL: http://arxiv.org/abs/2402.03576v1
- Date: Mon, 5 Feb 2024 22:57:33 GMT
- Title: Generalization Properties of Adversarial Training for $\ell_0$-Bounded
Adversarial Attacks
- Authors: Payam Delgosha, Hamed Hassani, Ramtin Pedarsani
- Abstract summary: In this paper, we aim to theoretically characterize the performance of adversarial training for an important class of neural networks.
Deriving a generalization bound in this setting has two main challenges.
- Score: 47.22918498465056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been widely observed that neural networks are vulnerable to small additive input perturbations that cause misclassification. In this paper, we focus on $\ell_0$-bounded adversarial attacks, and aim to theoretically
characterize the performance of adversarial training for an important class of
truncated classifiers. Such classifiers are shown to have strong performance
empirically, as well as theoretically in the Gaussian mixture model, in the
$\ell_0$-adversarial setting. The main contribution of this paper is to prove a
novel generalization bound for the binary classification setting with
$\ell_0$-bounded adversarial perturbation that is distribution-independent.
Deriving a generalization bound in this setting has two main challenges: (i) the truncated inner product, which is highly non-linear; and (ii) the maximization over the $\ell_0$ ball arising from adversarial training, which is non-convex and highly non-smooth. To tackle these challenges, we develop new coding techniques for
bounding the combinatorial dimension of the truncated hypothesis class.
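To make the two challenges concrete, here is a minimal toy sketch (our own illustration, not the authors' construction): a truncated linear classifier that discards the $k$ largest-magnitude coordinate products, attacked by a greedy heuristic adversary restricted to the $\ell_0$ ball of radius $s$. The exact truncation operator and attack in the paper differ; the names and constants below are placeholders.

```python
# Toy sketch (not the paper's exact construction): a "truncated" linear
# classifier that drops the k largest-magnitude coordinate products, and a
# greedy \ell_0 adversary allowed to change at most s input coordinates.
import numpy as np

def truncated_inner_product(w, x, k):
    """Sum of w_i * x_i after discarding the k largest-magnitude terms.

    Discarding the dominant terms is what damps sparse perturbations,
    and also what makes the classifier highly non-linear (challenge (i)).
    """
    prods = w * x
    keep = np.argsort(np.abs(prods))[: len(prods) - k]  # smallest-magnitude terms
    return prods[keep].sum()

def l0_adversarial_margin(w, x, y, k, s, big=10.0):
    """Approximate worst-case margin y * <w, x'>_trunc over ||x' - x||_0 <= s.

    The true maximization over the \ell_0 ball is combinatorial, non-convex,
    and non-smooth (challenge (ii)); this greedy heuristic just pushes the s
    currently most helpful coordinates against the label.
    """
    x_adv = x.copy()
    influence = y * (w * x_adv)
    targets = np.argsort(influence)[-s:]
    x_adv[targets] = -big * np.sign(w[targets]) * y  # push against the label
    return y * truncated_inner_product(w, x_adv, k)

rng = np.random.default_rng(0)
w, x, y = rng.normal(size=20), rng.normal(size=20), 1
print("clean margin    :", y * truncated_inner_product(w, x, k=0))
print("truncated margin:", y * truncated_inner_product(w, x, k=3))
print("adversarial     :", l0_adversarial_margin(w, x, y, k=3, s=3))
```

When $k \ge s$, truncation discards whichever coordinates the sparse adversary corrupted most aggressively, which is the intuition behind the robustness of truncated classifiers; the same discarding step is what makes the capacity of the hypothesis class hard to bound and motivates the paper's coding arguments.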
Related papers
- On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds [11.30047438005394]
This work investigates the question of how to choose the regularization norm $\lVert \cdot \rVert$ in the context of high-dimensional adversarial training for binary classification.
We quantitatively characterize the relationship between perturbation size and the optimal choice of $\lVert \cdot \rVert$, confirming the intuition that, in the data-scarce regime, the type of regularization becomes increasingly important for adversarial training as perturbations grow in size.
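For linear classifiers the role of the norm is transparent: the worst-case margin over an $\ell_p$ ball of radius $\varepsilon$ collapses in closed form to a dual-norm penalty, $\min_{\|\delta\|_p \le \varepsilon} y\,w^\top(x+\delta) = y\,w^\top x - \varepsilon \lVert w \rVert_q$ with $1/p + 1/q = 1$. This is a standard fact rather than a result of this paper; a minimal numerical check for $p=\infty$:

```python
# Numerical check (standard fact for linear models, not from the paper):
# a worst-case l_inf perturbation of size eps shifts the margin by eps*||w||_1.
import numpy as np

rng = np.random.default_rng(1)
w, x, y, eps = rng.normal(size=5), rng.normal(size=5), 1.0, 0.1

# Closed form: the adversary subtracts eps * ||w||_1 (dual norm) from the margin.
closed_form = y * w @ x - eps * np.abs(w).sum()

# Direct maximization: for p = inf the optimal delta is -eps * y * sign(w).
delta = -eps * y * np.sign(w)
direct = y * w @ (x + delta)

print(np.isclose(closed_form, direct))  # True
```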
arXiv Detail & Related papers (2024-10-21T14:53:12Z)
- Certified Robustness against Sparse Adversarial Perturbations via Data Localization [39.883465335244594]
We show that a simple classifier, dubbed Box-NN, emerges from our theory; it naturally incorporates the geometry of the problem and improves upon the current state of the art in certified robustness against sparse attacks on the MNIST and Fashion-MNIST datasets.
arXiv Detail & Related papers (2024-05-23T05:02:00Z)
- An Intermediate-level Attack Framework on The Basis of Linear Regression [89.85593878754571]
This paper substantially extends our work published at ECCV, in which an intermediate-level attack was proposed to improve the transferability of some baseline adversarial examples.
We advocate establishing a direct linear mapping from the intermediate-level discrepancies (between adversarial features and benign features) to the classification prediction loss of the adversarial example.
We show that 1) a variety of linear regression models can be used to establish the mapping, 2) the magnitude of the resulting intermediate-level discrepancy is linearly correlated with adversarial transferability, and 3) performance can be boosted further by performing multiple runs of the baseline attack.
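A rough sketch of that pipeline (the synthetic data, dimensions, and least-squares fit below are our stand-ins, not the paper's code): collect intermediate-level feature discrepancies and losses from baseline-attack runs, fit a linear map, and rank fresh candidates by predicted loss.

```python
# Sketch of the linear-regression idea (synthetic stand-in, not the paper's code):
# regress the adversarial loss on intermediate-level feature discrepancies,
# then score candidate perturbations with the learned linear map.
import numpy as np

rng = np.random.default_rng(2)
n_runs, feat_dim = 50, 128

# Pretend these came from n_runs baseline-attack runs:
# discrepancy = (adversarial features - benign features) at a middle layer.
discrepancies = rng.normal(size=(n_runs, feat_dim))
losses = discrepancies @ rng.normal(size=feat_dim) + 0.1 * rng.normal(size=n_runs)

# Least-squares fit: a direct linear mapping from discrepancy to prediction loss.
v, *_ = np.linalg.lstsq(discrepancies, losses, rcond=None)

# Rank fresh candidate discrepancies by predicted loss; per the paper's claim,
# a larger projection onto v should correlate with better transferability.
candidates = rng.normal(size=(10, feat_dim))
best = candidates[np.argmax(candidates @ v)]
print("predicted loss of best candidate:", float(best @ v))
```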
arXiv Detail & Related papers (2022-03-21T03:54:53Z)
- Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations [70.05004034081377]
We first propose a novel method for generating composite adversarial examples.
Our method can find the optimal attack composition by utilizing component-wise projected gradient descent.
We then propose generalized adversarial training (GAT) to extend model robustness from the $\ell_p$-ball to composite semantic perturbations.
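A toy sketch of component-wise projected gradient descent (the components, their ranges, and the linear-model loss are illustrative placeholders, not the paper's GAT implementation): each semantic component keeps its own parameter, takes its own gradient step, and is projected back into its own feasible interval.

```python
# Toy component-wise PGD sketch (illustrative, not the paper's GAT code):
# each semantic component (here brightness b and contrast c) has its own
# parameter, gradient step, and projection onto its own feasible interval.
import numpy as np

rng = np.random.default_rng(3)
w, x, y = rng.normal(size=16), rng.normal(size=16), 1.0
ranges = {"b": (-0.2, 0.2), "c": (0.8, 1.2)}
params = {"b": 0.0, "c": 1.0}
lr = 0.05

def margin(p):
    # Composite transform: contrast scaling then brightness shift.
    return y * w @ (p["c"] * x + p["b"])

for _ in range(20):
    # Analytic gradients of the margin w.r.t. each component's parameter.
    grads = {"b": y * w.sum(), "c": y * w @ x}
    for name in params:
        params[name] -= lr * grads[name]                    # attack lowers the margin
        lo, hi = ranges[name]
        params[name] = float(np.clip(params[name], lo, hi)) # per-component projection

print("params:", params, "margin:", float(margin(params)))
```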
arXiv Detail & Related papers (2022-02-09T02:41:56Z)
- Benign Overfitting in Adversarially Robust Linear Classification [91.42259226639837]
"Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community.
We show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples.
arXiv Detail & Related papers (2021-12-31T00:27:31Z)
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
- Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator in high dimensions via empirical risk minimisation.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z)
- Robust Classification Under $\ell_0$ Attack for the Gaussian Mixture Model [39.414875342234204]
We develop a novel classification algorithm called FilTrun that has two main modules: filtration and truncation.
We discuss several examples that illustrate interesting behaviors, such as a phase transition in the adversary's budget that determines whether the effect of the adversarial perturbation can be fully neutralized.
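A schematic of the two-module pipeline (the selection rules here are simplified stand-ins for FilTrun's actual filtration and truncation steps): filtration first removes coordinates judged uninformative, then truncation discards the largest-magnitude surviving terms that a sparse adversary could have planted.

```python
# Simplified FilTrun-style pipeline (stand-in rules, not the paper's exact ones):
# 1) filtration drops coordinates deemed uninformative,
# 2) truncation discards the largest-magnitude remaining terms an \ell_0
#    adversary could have corrupted, and the rest is summed into a decision.
import numpy as np

def filtrun_decision(x, w, keep_mask, k):
    prods = (w * x)[keep_mask]                        # filtration
    keep = np.argsort(np.abs(prods))[: max(len(prods) - k, 0)]
    return np.sign(prods[keep].sum())                 # truncation + sign rule

rng = np.random.default_rng(4)
w, x = rng.normal(size=10), rng.normal(size=10)
mask = np.abs(w) > 0.3    # toy filtration rule: keep informative coordinates
print(filtrun_decision(x, w, mask, k=2))
```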
arXiv Detail & Related papers (2021-04-05T23:31:25Z)
- Asymptotic Behavior of Adversarial Training in Binary Classification [41.7567932118769]
Adversarial training is considered to be the state-of-the-art method for defense against adversarial attacks.
Although adversarial training is successful in practice, several problems in understanding its performance remain open.
We derive precise theoretical predictions for the performance of adversarial training in binary classification.
arXiv Detail & Related papers (2020-10-26T01:44:20Z)