Adversarial Robustness: What fools you makes you stronger
- URL: http://arxiv.org/abs/2102.05475v1
- Date: Wed, 10 Feb 2021 15:00:24 GMT
- Title: Adversarial Robustness: What fools you makes you stronger
- Authors: Grzegorz Głuch, Rüdiger Urbanke
- Abstract summary: We prove an exponential separation for the sample complexity between the PAC-learning model and a version of the Equivalence-Query-learning model.
We show that this separation has interesting implications for adversarial robustness.
We explore a vision of designing an adaptive defense that in the presence of an attacker computes a model that is provably robust.
- Score: 1.14219428942199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove an exponential separation for the sample complexity between the
standard PAC-learning model and a version of the Equivalence-Query-learning
model. We then show that this separation has interesting implications for
adversarial robustness. We explore a vision of designing an adaptive defense
that in the presence of an attacker computes a model that is provably robust.
In particular, we show how to realize this vision in a simplified setting.
In order to do so, we introduce the notion of a strong adversary: such an adversary is not
limited in the type of perturbations it can apply and, when presented with a
classifier, can repeatedly generate different adversarial examples. We explain
why this notion is interesting to study and use it to prove the following.
There exists an efficient adversarial-learning-like scheme such that for every
strong adversary $\mathbf{A}$ it outputs a classifier that (a) cannot be
strongly attacked by $\mathbf{A}$, or (b) has error at most $\epsilon$. In both
cases our scheme uses exponentially (in $\epsilon$) fewer samples than what the
PAC bound requires.
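To make the interaction the abstract describes concrete, below is a minimal toy sketch in Python of the general phenomenon, not the paper's construction: a learner that is fed adversarially chosen counterexamples (misclassified points together with their true labels), in the spirit of equivalence queries, learns a 1-D threshold from about $\log_2(1/\epsilon)$ counterexamples, whereas passive PAC learning of the same class needs on the order of $1/\epsilon$ random samples. The threshold class, the `strong_adversary` oracle, and all names below are illustrative assumptions.

```python
# Toy illustration, not the paper's construction: learning a 1-D threshold on
# [0, 1] under the uniform distribution. A passive PAC learner needs on the
# order of 1/eps random samples, while a learner handed a counterexample on
# every round (a point its current hypothesis misclassifies, with its true
# label) halves its version space each time and reaches error < eps after
# only about log2(1/eps) counterexamples.

import random
from typing import Optional, Tuple


def true_label(x: float, t: float) -> int:
    """Target concept: label 1 iff x >= t."""
    return int(x >= t)


def strong_adversary(mid: float, t: float) -> Optional[float]:
    """Stand-in for a strong adversary: returns some point on which the current
    hypothesis (threshold mid) disagrees with the target threshold t, or None
    if no such point remains (up to numerical resolution). It cheats by knowing
    t; it only models 'can always produce a misclassified point if one exists'."""
    lo, hi = sorted((mid, t))
    if hi - lo <= 1e-12:
        return None
    return random.uniform(lo, hi)


def learn_from_counterexamples(t: float, eps: float) -> Tuple[float, int]:
    """Version-space learner driven only by the adversary's counterexamples."""
    lo, hi = 0.0, 1.0              # interval known to contain the true threshold
    rounds = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2.0      # current hypothesis: predict 1 iff x >= mid
        x = strong_adversary(mid, t)
        if x is None:              # (a) the adversary can no longer attack
            break
        rounds += 1
        if true_label(x, t) == 1:  # x >= t but predicted 0, so t <= x < mid
            hi = x
        else:                      # x < t but predicted 1, so mid <= x < t
            lo = x
    return (lo + hi) / 2.0, rounds  # (b) error at most eps under the uniform law


if __name__ == "__main__":
    estimate, rounds = learn_from_counterexamples(t=0.37, eps=1e-6)
    print(f"threshold ~ {estimate:.7f} recovered from {rounds} counterexamples")
```

The point of the sketch is only the shape of the guarantee: either the adversary runs out of attacks, or every attack it produces shrinks the learner's uncertainty, which is the "what fools you makes you stronger" dichotomy stated above.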
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose an algorithm, robust contextual dueling bandit, based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Sample Complexity of Robust Learning against Evasion Attacks [3.8888996044605855]
We study the feasibility of adversarially robust learning from the perspective of learning theory, considering sample complexity.
We show that, under the uniform distribution, an exponential dependence on the adversary's budget is unavoidable for robustly learning conjunctions.
We show that if the query radius is equal to the adversary's budget, we can develop robust empirical risk algorithms in the distribution-free setting.
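For orientation, one common formalisation of this evasion-attack setting (stated here as an assumption, not quoted from the paper) measures the exact-in-the-ball robust risk of a hypothesis $h$ against a target concept $c$ and an adversary of budget $\rho$ as $\mathrm{R}_\rho(h, c) = \Pr_{x \sim D}\big[\exists\, z \in B_\rho(x) : h(z) \neq c(z)\big]$; the sample-complexity statements above ask how many labelled examples suffice to drive this quantity below $\epsilon$.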
arXiv Detail & Related papers (2023-08-23T10:51:33Z) - How many dimensions are required to find an adversarial example? [0.0]
We investigate how adversarial vulnerability depends on $\dim(V)$.
In particular, we show that the adversarial success of standard PGD attacks with $\ell_p$-norm constraints behaves like a monotonically increasing function of $\epsilon$.
arXiv Detail & Related papers (2023-03-24T17:36:15Z) - The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for
Improving Adversarial Training [72.39526433794707]
Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples.
We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its "inverse adversarial" counterpart.
Our training method achieves state-of-the-art robustness as well as natural accuracy.
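As a concrete reading of that objective, here is a hedged PyTorch-style sketch: the adversarial example maximizes the loss, the "inverse adversarial" example minimizes it within the same perturbation ball, and a consistency term pulls the two outputs together. This is an illustrative instantiation, not the authors' code; the helper names, the KL-based consistency term, and all hyperparameters are assumptions.

```python
# Hedged PyTorch-style sketch, not the authors' code.

import torch
import torch.nn.functional as F


def perturb(model, x, y, eps, steps, step_size, maximize=True):
    """PGD-style perturbation that either maximizes (adversarial) or minimizes
    (inverse adversarial) the cross-entropy loss inside an L-infinity ball.
    Input-range clamping (e.g. to [0, 1]) is omitted for brevity."""
    sign = 1.0 if maximize else -1.0
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + sign * step_size * grad.sign()).clamp(-eps, eps).detach()
        delta.requires_grad_(True)
    return (x + delta).detach()


def inverse_adversarial_loss(model, x, y, eps=8 / 255, steps=10, step_size=2 / 255, lam=1.0):
    """Robust loss on the adversarial example plus a consistency term that pulls
    its output towards the output on the inverse adversarial example."""
    x_adv = perturb(model, x, y, eps, steps, step_size, maximize=True)
    x_inv = perturb(model, x, y, eps, steps, step_size, maximize=False)
    logits_adv, logits_inv = model(x_adv), model(x_inv)
    ce = F.cross_entropy(logits_adv, y)           # robust classification term
    consistency = F.kl_div(                       # align adversarial and inverse outputs
        F.log_softmax(logits_adv, dim=1),
        F.softmax(logits_inv.detach(), dim=1),
        reduction="batchmean",
    )
    return ce + lam * consistency
```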
arXiv Detail & Related papers (2022-11-01T15:24:26Z) - Unrestricted Adversarial Samples Based on Non-semantic Feature Clusters
Substitution [1.8782750537161608]
We introduce "unrestricted" perturbations that create adversarial samples by using spurious relations learned by model training.
Specifically, we find feature clusters in non-semantic features that are strongly correlated with model judgment results.
We create adversarial samples by using them to replace the corresponding feature clusters in the target image.
arXiv Detail & Related papers (2022-08-31T07:42:36Z) - Sample Complexity Bounds for Robustly Learning Decision Lists against
Evasion Attacks [25.832511407411637]
A fundamental problem in adversarial machine learning is to quantify how much training data is needed in the presence of evasion attacks.
We work with probability distributions on the input data that satisfy a Lipschitz condition: nearby points have similar probability.
For every fixed $k$, the class of $k$-decision lists has polynomial sample complexity against a $\log(n)$-bounded adversary.
arXiv Detail & Related papers (2022-05-12T14:40:18Z) - Chaos is a Ladder: A New Theoretical Understanding of Contrastive
Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z) - Online Selective Classification with Limited Feedback [82.68009460301585]
We study selective classification in the online learning model, wherein a predictor may abstain from classifying an instance.
A salient aspect of the setting we consider is that the data may be non-realisable, so abstention can be a valid long-term action.
We construct simple versioning-based schemes for any $\mu \in (0,1]$ that make at most $T^\mu$ mistakes while incurring $\tilde{O}(T^{1-\mu})$ excess abstention against adaptive adversaries.
arXiv Detail & Related papers (2021-10-27T08:00:53Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Robust learning under clean-label attack [26.323812778809913]
We study the problem of robust learning under clean-label data-poisoning attacks.
The learning goal is to minimize the attackable rate, which is a more demanding objective than optimal PAC learning.
arXiv Detail & Related papers (2021-03-01T00:21:15Z) - Towards Defending Multiple $\ell_p$-norm Bounded Adversarial
Perturbations via Gated Batch Normalization [120.99395850108422]
Existing adversarial defenses typically improve model robustness against individual specific perturbations.
Some recent methods improve model robustness against adversarial attacks in multiple $\ell_p$ balls, but their performance against each perturbation type is still far from satisfactory.
We propose Gated Batch Normalization (GBN) to adversarially train a perturbation-invariant predictor for defending against multiple $\ell_p$-bounded adversarial perturbations (a minimal sketch of the idea follows this entry).
arXiv Detail & Related papers (2020-12-03T02:26:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.