Are Adversarial Examples Created Equal? A Learnable Weighted Minimax
Risk for Robustness under Non-uniform Attacks
- URL: http://arxiv.org/abs/2010.12989v1
- Date: Sat, 24 Oct 2020 21:20:35 GMT
- Title: Are Adversarial Examples Created Equal? A Learnable Weighted Minimax
Risk for Robustness under Non-uniform Attacks
- Authors: Huimin Zeng, Chen Zhu, Tom Goldstein, Furong Huang
- Abstract summary: Adversarial Training is one of the few defenses that withstand strong attacks.
Traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution.
We present a weighted minimax risk optimization that defends against non-uniform attacks.
- Score: 70.11599738647963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial Training has proven to be an effective method for defending
against adversarial examples, and it is one of the few defenses that withstand
strong attacks. However, traditional defense mechanisms assume a uniform attack
over the examples according to the underlying data distribution, which is
unrealistic: the attacker could choose to focus on more vulnerable examples.
We present a weighted minimax risk optimization that defends against
non-uniform attacks, achieving robustness against adversarial examples under
perturbed test data distributions. Our modified risk considers importance
weights of different adversarial examples and focuses adaptively on harder
examples that are wrongly classified or at higher risk of being classified
incorrectly. The designed risk allows the training process to learn a strong
defense through optimizing the importance weights. The experiments show that
our model significantly improves state-of-the-art adversarial accuracy under
non-uniform attacks without a significant drop under uniform attacks.
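The weighted risk can be read as a minimax objective in which, besides the per-example perturbations, an importance-weight vector over the training examples is also optimized so the defense concentrates on hard examples. The following PyTorch sketch is a minimal illustration of that idea, not the authors' released code; the PGD settings, the softmax parameterization of the weights, and all function names are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-infinity PGD used to craft per-example adversarial inputs."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def weighted_minimax_step(model, optimizer, x, y, temperature=1.0):
    """One training step of a weighted adversarial risk: per-example adversarial
    losses are reweighted so that harder examples (misclassified or at high risk
    of being misclassified) receive larger importance weights."""
    x_adv = pgd_attack(model, x, y)
    per_example_loss = F.cross_entropy(model(x_adv), y, reduction="none")
    # Importance weights on the probability simplex; a larger adversarial loss
    # yields a larger weight, approximating the inner maximization over weights.
    weights = torch.softmax(per_example_loss.detach() / temperature, dim=0)
    loss = (weights * per_example_loss).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With uniform weights this reduces to standard adversarial training; the temperature controls how strongly the risk concentrates on examples that are misclassified or close to being misclassified.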
Related papers
- Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks [30.42301446202426]
Poisoning attacks can disproportionately influence model behaviour by making small changes to the training corpus.
We make it possible to guarantee a sample's robustness against poisoning attacks that modify a finite number of training samples.
arXiv Detail & Related papers (2023-08-15T03:46:41Z)
- Vulnerability-Aware Instance Reweighting For Adversarial Training [4.874780144224057]
Adversarial Training (AT) has been found to substantially improve the robustness of deep learning classifiers against adversarial attacks.
AT exerts an uneven influence on different classes in a training set and unfairly hurts examples corresponding to classes that are inherently harder to classify.
Various reweighting schemes have been proposed that assign unequal weights to robust losses of individual examples in a training set.
In this work, we propose a novel instance-wise reweighting scheme. It considers the vulnerability of each natural example and the resulting information loss on its adversarial counterpart occasioned by adversarial attacks.
arXiv Detail & Related papers (2023-07-14T05:31:32Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework that defends against samples crafted by minimally perturbing a clean input.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defense against both invariance and sensitivity attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Improving Adversarial Robustness with Self-Paced Hard-Class Pair Reweighting [5.084323778393556]
Adversarial training with untargeted attacks is one of the most widely recognized defense methods.
We find that naturally imbalanced inter-class semantic similarity makes hard-class pairs become virtual targets of each other.
We propose to upweight the hard-class pair loss during model optimization, which prompts the learning of discriminative features for hard classes; a minimal illustrative sketch follows below.
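The sketch below assumes the class-pair weights are derived from an exponential moving average of the model's class-confusion counts; the class names, the weighting rule, and the EMA update are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

class HardClassPairReweighter:
    """Tracks an exponential moving average of class-confusion counts and
    upweights samples whose (true class, predicted class) pair is hard."""

    def __init__(self, num_classes, momentum=0.9, scale=2.0):
        self.confusion = torch.zeros(num_classes, num_classes)
        self.momentum = momentum
        self.scale = scale

    def update(self, logits, targets):
        preds = logits.argmax(dim=1)
        batch = torch.zeros_like(self.confusion)
        for t, p in zip(targets.tolist(), preds.tolist()):
            if t != p:
                batch[t, p] += 1.0
        self.confusion = self.momentum * self.confusion + (1 - self.momentum) * batch

    def weights(self, logits, targets):
        # Normalize confusion counts to [0, 1] and look up each sample's pair;
        # 1.0 is the baseline weight for easy pairs.
        preds = logits.argmax(dim=1)
        norm = self.confusion / self.confusion.max().clamp(min=1e-8)
        hardness = norm[targets.cpu(), preds.cpu()]
        return 1.0 + self.scale * hardness

def hard_pair_loss(reweighter, logits, targets):
    """Cross-entropy on (adversarial) logits, upweighted for hard class pairs."""
    reweighter.update(logits.detach(), targets)
    w = reweighter.weights(logits.detach(), targets).to(logits.device)
    per_example = F.cross_entropy(logits, targets, reduction="none")
    return (w * per_example).mean()
```

The reweighted loss can be applied to adversarial logits produced by any attack during training.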
arXiv Detail & Related papers (2022-10-26T22:51:36Z)
- Effective Targeted Attacks for Adversarial Self-Supervised Learning [58.14233572578723]
Unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information.
We propose a novel positive-mining technique for targeted adversarial attacks that generates effective adversaries for adversarial SSL frameworks.
Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and smaller but consistent robustness improvements with contrastive SSL frameworks.
arXiv Detail & Related papers (2022-10-19T11:43:39Z)
- Rethinking Textual Adversarial Defense for Pre-trained Language Models [79.18455635071817]
A literature review shows that pre-trained language models (PrLMs) are vulnerable to adversarial attacks.
We propose a novel metric (Degree of Anomaly) to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples.
We show that our universal defense framework achieves after-attack accuracy comparable to, or even higher than, that of attack-specific defenses.
arXiv Detail & Related papers (2022-07-21T07:51:45Z)
- Direction-Aggregated Attack for Transferable Adversarial Examples [10.208465711975242]
Deep neural networks are vulnerable to adversarial examples crafted by imposing imperceptible changes on the inputs.
However, adversarial examples are most successful in white-box settings, where the model and its parameters are available.
We propose Direction-Aggregated adversarial attacks that deliver transferable adversarial examples.
arXiv Detail & Related papers (2021-04-19T09:54:56Z)
- "What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z)
- Adversarial Example Games [51.92698856933169]
Adversarial Example Games (AEG) is a framework that models the crafting of adversarial examples.
AEG provides a new way to design adversarial examples by adversarially training a generator and a classifier from a given hypothesis class.
We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets.
arXiv Detail & Related papers (2020-07-01T19:47:23Z)
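To make the generator-versus-classifier game of Adversarial Example Games concrete, here is a minimal PyTorch sketch under stated assumptions: a perturbation generator is trained to maximize the classifier's loss within an L-infinity budget while the classifier is trained to minimize it. The architecture, the budget, and the alternating update schedule are illustrative choices, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationGenerator(nn.Module):
    """Maps a clean image to a bounded additive perturbation."""

    def __init__(self, channels=3, eps=8 / 255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x):
        # tanh keeps the perturbation inside the L-infinity ball of radius eps.
        delta = self.eps * torch.tanh(self.net(x))
        return (x + delta).clamp(0, 1)

def aeg_style_step(generator, classifier, gen_opt, clf_opt, x, y):
    """One alternating update of the minimax game: the generator ascends the
    classification loss on perturbed inputs, the classifier descends it."""
    # Generator update (maximize the classifier's loss).
    x_adv = generator(x)
    gen_loss = -F.cross_entropy(classifier(x_adv), y)
    gen_opt.zero_grad()
    gen_loss.backward()
    gen_opt.step()

    # Classifier update (minimize loss on freshly generated adversaries).
    x_adv = generator(x).detach()
    clf_loss = F.cross_entropy(classifier(x_adv), y)
    clf_opt.zero_grad()
    clf_loss.backward()
    clf_opt.step()
    return clf_loss.item()
```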