Adversarial Robustness: What fools you makes you stronger
- URL: http://arxiv.org/abs/2102.05475v1
- Date: Wed, 10 Feb 2021 15:00:24 GMT
- Title: Adversarial Robustness: What fools you makes you stronger
- Authors: Grzegorz Głuch, Rüdiger Urbanke
- Abstract summary: We prove an exponential separation for the sample complexity between the PAC-learning model and a version of the Equivalence-Query-learning model.
We show that this separation has interesting implications for adversarial robustness.
We explore a vision of designing an adaptive defense that in the presence of an attacker computes a model that is provably robust.
- Score: 1.14219428942199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove an exponential separation for the sample complexity between the
standard PAC-learning model and a version of the Equivalence-Query-learning
model. We then show that this separation has interesting implications for
adversarial robustness. We explore a vision of designing an adaptive defense
that in the presence of an attacker computes a model that is provably robust.
In particular, we show how to realize this vision in a simplified setting.
In order to do so, we introduce the notion of a strong adversary: such an adversary is not
limited in the type of perturbations it can apply and, when presented with a
classifier, can repeatedly generate different adversarial examples. We explain
why this notion is interesting to study and use it to prove the following.
There exists an efficient adversarial-learning-like scheme such that for every
strong adversary $\mathbf{A}$ it outputs a classifier that (a) cannot be
strongly attacked by $\mathbf{A}$, or (b) has error at most $\epsilon$. In both
cases our scheme uses exponentially (in $\epsilon$) fewer samples than what the
PAC bound requires.
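To make the interaction the abstract describes concrete, below is a minimal toy sketch in Python of the general phenomenon, not the paper's construction: a learner that is fed adversarially chosen counterexamples (misclassified points together with their true labels), in the spirit of equivalence queries, learns a 1-D threshold from about $\log_2(1/\epsilon)$ counterexamples, whereas passive PAC learning of the same class needs on the order of $1/\epsilon$ random samples. The threshold class, the `strong_adversary` oracle, and all names below are illustrative assumptions.

```python
# Toy illustration, not the paper's construction: learning a 1-D threshold on
# [0, 1] under the uniform distribution. A passive PAC learner needs on the
# order of 1/eps random samples, while a learner handed a counterexample on
# every round (a point its current hypothesis misclassifies, with its true
# label) halves its version space each time and reaches error < eps after
# only about log2(1/eps) counterexamples.

import random
from typing import Optional, Tuple


def true_label(x: float, t: float) -> int:
    """Target concept: label 1 iff x >= t."""
    return int(x >= t)


def strong_adversary(mid: float, t: float) -> Optional[float]:
    """Stand-in for a strong adversary: returns some point on which the current
    hypothesis (threshold mid) disagrees with the target threshold t, or None
    if no such point remains (up to numerical resolution). It cheats by knowing
    t; it only models 'can always produce a misclassified point if one exists'."""
    lo, hi = sorted((mid, t))
    if hi - lo <= 1e-12:
        return None
    return random.uniform(lo, hi)


def learn_from_counterexamples(t: float, eps: float) -> Tuple[float, int]:
    """Version-space learner driven only by the adversary's counterexamples."""
    lo, hi = 0.0, 1.0              # interval known to contain the true threshold
    rounds = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2.0      # current hypothesis: predict 1 iff x >= mid
        x = strong_adversary(mid, t)
        if x is None:              # (a) the adversary can no longer attack
            break
        rounds += 1
        if true_label(x, t) == 1:  # x >= t but predicted 0, so t <= x < mid
            hi = x
        else:                      # x < t but predicted 1, so mid <= x < t
            lo = x
    return (lo + hi) / 2.0, rounds  # (b) error at most eps under the uniform law


if __name__ == "__main__":
    estimate, rounds = learn_from_counterexamples(t=0.37, eps=1e-6)
    print(f"threshold ~ {estimate:.7f} recovered from {rounds} counterexamples")
```

The point of the sketch is only the shape of the guarantee: either the adversary runs out of attacks, or every attack it produces shrinks the learner's uncertainty, which is the "what fools you makes you stronger" dichotomy stated above.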
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose an algorithm, robust contextual dueling bandit, based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Sample Complexity of Robust Learning against Evasion Attacks [3.8888996044605855]
We study the feasibility of adversarially robust learning from the perspective of learning theory, considering sample complexity.
We show that, under the uniform distribution, an exponential dependence on the adversary's budget is unavoidable for robustly learning conjunctions.
We show that if the query radius is equal to the adversary's budget, we can develop robust empirical risk algorithms in the distribution-free setting.
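For orientation, one common formalisation of this evasion-attack setting (stated here as an assumption, not quoted from the paper) measures the exact-in-the-ball robust risk of a hypothesis $h$ against a target concept $c$ and an adversary of budget $\rho$ as $\mathrm{R}_\rho(h, c) = \Pr_{x \sim D}\big[\exists\, z \in B_\rho(x) : h(z) \neq c(z)\big]$; the sample-complexity statements above ask how many labelled examples suffice to drive this quantity below $\epsilon$.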
arXiv Detail & Related papers (2023-08-23T10:51:33Z) - How many dimensions are required to find an adversarial example? [0.0]
We investigate how adversarial vulnerability depends on $\dim(V)$.
In particular, we show that the adversarial success of standard PGD attacks with $\ell_p$-norm constraints behaves like a monotonically increasing function of $\epsilon$.
arXiv Detail & Related papers (2023-03-24T17:36:15Z) - The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for
Improving Adversarial Training [72.39526433794707]
Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples.
We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its "inverse adversarial" counterpart.
Our training method achieves state-of-the-art robustness as well as natural accuracy.
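As a concrete reading of that objective, here is a hedged PyTorch-style sketch: the adversarial example maximizes the loss, the "inverse adversarial" example minimizes it within the same perturbation ball, and a consistency term pulls the two outputs together. This is an illustrative instantiation, not the authors' code; the helper names, the KL-based consistency term, and all hyperparameters are assumptions.

```python
# Hedged PyTorch-style sketch, not the authors' code.

import torch
import torch.nn.functional as F


def perturb(model, x, y, eps, steps, step_size, maximize=True):
    """PGD-style perturbation that either maximizes (adversarial) or minimizes
    (inverse adversarial) the cross-entropy loss inside an L-infinity ball.
    Input-range clamping (e.g. to [0, 1]) is omitted for brevity."""
    sign = 1.0 if maximize else -1.0
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + sign * step_size * grad.sign()).clamp(-eps, eps).detach()
        delta.requires_grad_(True)
    return (x + delta).detach()


def inverse_adversarial_loss(model, x, y, eps=8 / 255, steps=10, step_size=2 / 255, lam=1.0):
    """Robust loss on the adversarial example plus a consistency term that pulls
    its output towards the output on the inverse adversarial example."""
    x_adv = perturb(model, x, y, eps, steps, step_size, maximize=True)
    x_inv = perturb(model, x, y, eps, steps, step_size, maximize=False)
    logits_adv, logits_inv = model(x_adv), model(x_inv)
    ce = F.cross_entropy(logits_adv, y)           # robust classification term
    consistency = F.kl_div(                       # align adversarial and inverse outputs
        F.log_softmax(logits_adv, dim=1),
        F.softmax(logits_inv.detach(), dim=1),
        reduction="batchmean",
    )
    return ce + lam * consistency
```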
arXiv Detail & Related papers (2022-11-01T15:24:26Z) - Unrestricted Adversarial Samples Based on Non-semantic Feature Clusters
Substitution [1.8782750537161608]
We introduce "unrestricted" perturbations that create adversarial samples by using spurious relations learned by model training.
Specifically, we find feature clusters in non-semantic features that are strongly correlated with model judgment results.
We create adversarial samples by using them to replace the corresponding feature clusters in the target image.
arXiv Detail & Related papers (2022-08-31T07:42:36Z) - Sample Complexity Bounds for Robustly Learning Decision Lists against
Evasion Attacks [25.832511407411637]
A fundamental problem in adversarial machine learning is to quantify how much training data is needed in the presence of evasion attacks.
We work with probability distributions on the input data that satisfy a Lipschitz condition: nearby points have similar probability.
For every fixed $k$, the class of $k$-decision lists has polynomial sample complexity against a $\log(n)$-bounded adversary.
arXiv Detail & Related papers (2022-05-12T14:40:18Z) - Chaos is a Ladder: A New Theoretical Understanding of Contrastive
Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z) - Online Selective Classification with Limited Feedback [82.68009460301585]
We study selective classification in the online learning model, wherein a predictor may abstain from classifying an instance.
A salient aspect of the setting we consider is that the data may be non-realisable, so abstention can be a valid long-term action.
We construct simple versioning-based schemes for any $\mu \in (0,1]$ that make at most $T^\mu$ mistakes while incurring $\tilde{O}(T^{1-\mu})$ excess abstention against adaptive adversaries.
arXiv Detail & Related papers (2021-10-27T08:00:53Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Robust learning under clean-label attack [26.323812778809913]
We study the problem of robust learning under clean-label data-poisoning attacks.
The learning goal is to minimize the attackable rate, which is a more demanding objective than optimal PAC learning.
arXiv Detail & Related papers (2021-03-01T00:21:15Z) - Towards Defending Multiple $\ell_p$-norm Bounded Adversarial
Perturbations via Gated Batch Normalization [120.99395850108422]
Existing adversarial defenses typically improve model robustness against individual specific perturbations.
Some recent methods improve model robustness against adversarial attacks in multiple $\ell_p$ balls, but their performance against each perturbation type is still far from satisfactory.
We propose Gated Batch Normalization (GBN) to adversarially train a perturbation-invariant predictor for defending against multiple $\ell_p$-bounded adversarial perturbations (a minimal sketch of the idea follows this entry).
arXiv Detail & Related papers (2020-12-03T02:26:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.