Adversarially Robust Learning with Unknown Perturbation Sets
- URL: http://arxiv.org/abs/2102.02145v1
- Date: Wed, 3 Feb 2021 17:01:39 GMT
- Title: Adversarially Robust Learning with Unknown Perturbation Sets
- Authors: Omar Montasser, Steve Hanneke, Nathan Srebro
- Abstract summary: We study the problem of learning predictors that are robust to adversarial examples with respect to an unknown perturbation set.
We obtain upper bounds on the sample complexity and upper and lower bounds on the number of required interactions, or number of successful attacks.
- Score: 37.13850246542325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of learning predictors that are robust to adversarial
examples with respect to an unknown perturbation set, relying instead on
interaction with an adversarial attacker or access to attack oracles, examining
different models for such interactions. We obtain upper bounds on the sample
complexity and upper and lower bounds on the number of required interactions,
or number of successful attacks, in different interaction models, in terms of
the VC and Littlestone dimensions of the hypothesis class of predictors, and
without any assumptions on the perturbation set.
Related papers
- Unpacking the Resilience of SNLI Contradiction Examples to Attacks [0.38366697175402226]
We apply the Universal Adversarial Attack to examine the model's vulnerabilities.
Our analysis revealed substantial drops in accuracy for the entailment and neutral classes.
Fine-tuning the model on an augmented dataset with adversarial examples restored its performance to near-baseline levels.
arXiv Detail & Related papers (2024-12-15T12:47:28Z) - Addressing Key Challenges of Adversarial Attacks and Defenses in the Tabular Domain: A Methodological Framework for Coherence and Consistency [26.645723217188323]
Class-Specific Anomaly Detection (CSAD) is an effective novel anomaly detection approach.
CSAD evaluates adversarial samples relative to their predicted class distribution, rather than a broad benign distribution.
Our evaluation incorporates both anomaly detection rates and SHAP-based assessments to provide a more comprehensive measure of adversarial sample quality.
arXiv Detail & Related papers (2024-12-10T09:17:09Z) - Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied.
We consider a self-interested attacker who wishes to disrupt a decisionmaker's conditional inference and subsequent actions by corrupting a set of evidentiary variables.
To avoid detection, the attacker also desires the attack to appear plausible wherein plausibility is determined by the density of the corrupted evidence.
arXiv Detail & Related papers (2024-11-21T17:46:55Z) - Regularized Neural Ensemblers [55.15643209328513]
In this study, we explore employing regularized neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the ensembling model by randomly dropping base model predictions.
We demonstrate this approach provides lower bounds for the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z) - Adversarial Resilience in Sequential Prediction via Abstention [46.80218090768711]
We study the problem of sequential prediction in the setting with an adversary that is allowed to inject clean-label adversarial examples.
We propose a new model of sequential prediction that sits between the purely and fully adversarial settings.
arXiv Detail & Related papers (2023-06-22T17:44:22Z) - Robust Deep Learning Models Against Semantic-Preserving Adversarial
Attack [3.7264705684737893]
Deep learning models can be fooled by small $l_p$-norm adversarial perturbations and natural perturbations in terms of attributes.
We propose a novel attack mechanism named Semantic-Preserving Adversarial (SPA) attack, which can then be used to enhance adversarial training.
arXiv Detail & Related papers (2023-04-08T08:28:36Z) - Robustness of Deep Recommendation Systems to Untargeted Interaction
Perturbations [11.921365836430658]
We develop a novel framework in which user-item training interactions are perturbed in unintentional and adversarial settings.
We show that four popular recommender models are unstable against even one random perturbation.
We propose an adversarial perturbation method CASPER which identifies and perturbs an interaction that induces the maximal cascading effect.
arXiv Detail & Related papers (2022-01-29T23:43:21Z) - Adversarial Robustness of Deep Reinforcement Learning based Dynamic
Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors.
Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier based on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z) - Localized Uncertainty Attacks [9.36341602283533]
We present localized uncertainty attacks against deep learning models.
We create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain.
Unlike $\ell_p$ ball or functional attacks which perturb inputs indiscriminately, our targeted changes can be less perceptible.
arXiv Detail & Related papers (2021-06-17T03:07:22Z) - Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by a recently introduced non-robust feature.
In this paper, we consider the non-robust features as a common property of adversarial examples, and we deduce it is possible to find a cluster in representation space corresponding to the property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z) - Asymptotic Behavior of Adversarial Training in Binary Classification [41.7567932118769]
Adversarial training is considered to be the state-of-the-art method for defense against adversarial attacks.
Despite being successful in practice, several problems in understanding performance of adversarial training remain open.
We derive precise theoretical predictions for the performance of adversarial training in binary classification.
arXiv Detail & Related papers (2020-10-26T01:44:20Z) - Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.