The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for
Improving Adversarial Training
- URL: http://arxiv.org/abs/2211.00525v1
- Date: Tue, 1 Nov 2022 15:24:26 GMT
- Title: The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for
Improving Adversarial Training
- Authors: Junhao Dong, Seyed-Mohsen Moosavi-Dezfooli, Jianhuang Lai, Xiaohua Xie
- Abstract summary: Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples.
We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its ``inverse adversarial'' counterpart.
Our training method achieves state-of-the-art robustness as well as natural accuracy.
- Score: 72.39526433794707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although current deep learning techniques have yielded superior performance
on various computer vision tasks, they are still vulnerable to adversarial
examples. Adversarial training and its variants have been shown to be the most
effective approaches to defend against adversarial examples. These methods
usually regularize the difference between output probabilities for an
adversarial example and its corresponding natural example. However, this
regularization may have a negative impact if the model misclassifies a natural example. To circumvent
this issue, we propose a novel adversarial training scheme that encourages the
model to produce similar outputs for an adversarial example and its ``inverse
adversarial'' counterpart. These samples are generated to maximize the
likelihood in the neighborhood of natural examples. Extensive experiments on
various vision datasets and architectures demonstrate that our training method
achieves state-of-the-art robustness as well as natural accuracy. Furthermore,
using a universal version of inverse adversarial examples, we improve the
performance of single-step adversarial training techniques at a low
computational cost.
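
The abstract describes the method only at a high level, so the following is a minimal PyTorch-style sketch of one plausible reading of it: an adversarial example is generated by ascending the loss inside an L-infinity ball around the natural example, its ``inverse adversarial'' counterpart by descending the loss (i.e., maximizing the likelihood of the true label) in the same neighborhood, and the model is trained to keep the two output distributions close. All function names and hyper-parameters here (`perturb`, `inverse_adversarial_training_step`, `eps`, `alpha`, `steps`, `beta`) are assumptions, not the authors' implementation.

```python
# Hedged sketch of inverse-adversarial consistency training.
# Not the authors' code; names and hyper-parameters are assumptions.
import torch
import torch.nn.functional as F

def perturb(model, x, y, eps, alpha, steps, maximize_loss):
    """PGD-style perturbation within an L-infinity ball of radius eps.

    maximize_loss=True  -> ordinary adversarial example (ascend the loss).
    maximize_loss=False -> "inverse adversarial" example (descend the loss,
                           i.e. maximize the likelihood of the true label
                           in the neighborhood of the natural example).
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        step = alpha * grad.sign()
        with torch.no_grad():
            delta += step if maximize_loss else -step
            delta.clamp_(-eps, eps)                      # stay inside the eps-ball
            delta.copy_((x + delta).clamp(0, 1) - x)     # keep a valid image
    return (x + delta).detach()

def inverse_adversarial_training_step(model, x, y, optimizer,
                                      eps=8/255, alpha=2/255, steps=10, beta=6.0):
    """One training step: fit the adversarial example and pull its output
    distribution towards that of its inverse-adversarial counterpart."""
    model.eval()  # freeze batch-norm statistics while crafting perturbations
    x_adv = perturb(model, x, y, eps, alpha, steps, maximize_loss=True)
    x_inv = perturb(model, x, y, eps, alpha, steps, maximize_loss=False)

    model.train()
    logits_adv = model(x_adv)
    logits_inv = model(x_inv)

    # Classification loss on the adversarial example plus a consistency
    # term between adversarial and inverse-adversarial predictions.
    loss = F.cross_entropy(logits_adv, y) + beta * F.kl_div(
        F.log_softmax(logits_adv, dim=1),
        F.softmax(logits_inv, dim=1).detach(),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The consistency term above uses a KL divergence between the two predictive distributions; the paper may use a different discrepancy measure. A minimal single-step (FGSM-style) baseline, the setting in which the abstract's universal inverse adversarial extension would apply, is sketched after the related-papers list below.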
Related papers
- Enhancing Adversarial Robustness via Uncertainty-Aware Distributional Adversarial Training [43.766504246864045]
We propose a novel uncertainty-aware distributional adversarial training method.
Our approach achieves state-of-the-art adversarial robustness and maintains natural performance.
arXiv Detail & Related papers (2024-11-05T07:26:24Z)
- AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models [7.406040859734522]
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques.
Previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models.
We propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models.
arXiv Detail & Related papers (2023-07-24T03:10:02Z)
- Mist: Towards Improved Adversarial Examples for Diffusion Models [0.8883733362171035]
Diffusion Models (DMs) have enabled great success in artificial-intelligence-generated content, especially artwork creation.
Infringers can profit by using DMs to imitate human-created paintings without authorization.
Recent research suggests that adversarial examples for diffusion models can be effective tools against such copyright infringement.
arXiv Detail & Related papers (2023-05-22T03:43:34Z)
- On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training [72.95029777394186]
Adversarial training is a popular method to robustify models against adversarial attacks.
We investigate this phenomenon from the perspective of training instances.
We show that the decay in generalization performance of adversarial training is a result of the model's attempt to fit hard adversarial instances.
arXiv Detail & Related papers (2021-12-14T12:19:24Z)
- Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
arXiv Detail & Related papers (2021-09-22T10:47:51Z)
- Improving Gradient-based Adversarial Training for Text Classification by Contrastive Learning and Auto-Encoder [18.375585982984845]
We focus on enhancing the model's ability to defend against gradient-based adversarial attacks during the training process.
We propose two novel adversarial training approaches: CARL and RAR.
Experiments show that the proposed two approaches outperform strong baselines on various text classification datasets.
arXiv Detail & Related papers (2021-09-14T09:08:58Z)
- Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise.
Pre-processing methods may suffer from the robustness degradation effect.
A potential cause of this negative effect is that the adversarial training examples are static and independent of the pre-processing model.
We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
arXiv Detail & Related papers (2021-06-10T01:45:32Z)
- Single-step Adversarial training with Dropout Scheduling [59.50324605982158]
We show that models trained using a single-step adversarial training method learn to prevent the generation of single-step adversaries.
Models trained using the proposed single-step adversarial training method are robust against both single-step and multi-step adversarial attacks.
arXiv Detail & Related papers (2020-04-18T14:14:00Z)
- Regularizers for Single-step Adversarial Training [49.65499307547198]
We propose three types of regularizers that help to learn robust models using single-step adversarial training methods.
The regularizers mitigate the effect of gradient masking by harnessing properties that differentiate a robust model from a pseudo-robust model.
arXiv Detail & Related papers (2020-02-03T09:21:04Z)
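
For reference, "single-step adversarial training", mentioned in the abstract and in the last two related papers, usually refers to FGSM-style training; the sketch below illustrates only that generic baseline, not any paper's specific regularizers, dropout scheduling, or universal inverse adversarial examples. The function name and the eps value are assumptions.

```python
# Minimal, generic single-step (FGSM-based) adversarial training step.
# A sketch for context only; not taken from any of the papers listed above.
import torch
import torch.nn.functional as F

def fgsm_training_step(model, x, y, optimizer, eps=8/255):
    # Single-step adversarial example: one signed-gradient step of size eps.
    x_req = x.clone().detach().requires_grad_(True)
    loss_nat = F.cross_entropy(model(x_req), y)
    grad, = torch.autograd.grad(loss_nat, x_req)
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()

    # Train on the single-step adversarial example.
    loss = F.cross_entropy(model(x_adv), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```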
This list is automatically generated from the titles and abstracts of the papers on this site.