Fooling Adversarial Training with Inducing Noise
- URL: http://arxiv.org/abs/2111.10130v1
- Date: Fri, 19 Nov 2021 09:59:28 GMT
- Title: Fooling Adversarial Training with Inducing Noise
- Authors: Zhirui Wang, Yifei Wang, Yisen Wang
- Abstract summary: Adversarial training is widely believed to be a reliable approach to improve model robustness against adversarial attack.
In this paper, we show that when trained on one type of poisoned data, adversarial training can also be fooled to have catastrophic behavior.
We propose a new type of inducing noise, named ADVIN, which is an irremovable poisoning of training data.
- Score: 18.07654610758511
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training is widely believed to be a reliable approach to improve
model robustness against adversarial attack. However, in this paper, we show
that when trained on one type of poisoned data, adversarial training can also
be fooled into catastrophic behavior, e.g., $<1\%$ robust test accuracy with
$>90\%$ robust training accuracy on the CIFAR-10 dataset. Previously, other
types of noise poisoned into the training data have successfully fooled
standard training ($15.8\%$ standard test accuracy with $99.9\%$ standard
training accuracy on CIFAR-10), but their poisoning effects are easily removed
once adversarial training is adopted. We therefore aim to design a new type of
inducing noise, named ADVIN, which is an irremovable poisoning of the training
data. ADVIN not only degrades the robustness of adversarial training by a
large margin (for example, from $51.7\%$ to $0.57\%$ on CIFAR-10), but is also
effective at fooling standard training ($13.1\%$ standard test accuracy with
$100\%$ standard training accuracy). Additionally, ADVIN can be applied to
prevent personal data (such as selfies) from being exploited without
authorization under either standard or adversarial training.
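The abstract does not spell out how ADVIN itself is crafted, so the following is only a minimal, hypothetical sketch of the general recipe used by this line of poisoning work: a small per-sample perturbation, bounded in an $\ell_\infty$ ball, is optimized against a fixed reference model so that even adversarially perturbed copies of the poisoned images already have low loss, leaving standard or adversarial training little to learn from. Function names, step sizes, and the toy model below are illustrative assumptions, not the paper's procedure.

```python
# Hypothetical sketch of crafting "inducing" poisoning noise against a reference
# model. The min-min-max structure and all hyperparameters are assumptions; the
# ADVIN paper's actual objective is not specified in the abstract above.
import torch
import torch.nn as nn
import torch.nn.functional as F

def craft_inducing_noise(model, x, y, eps=8/255, step=2/255,
                         adv_eps=8/255, adv_step=2/255,
                         outer_iters=20, inner_iters=5):
    """Per-sample noise delta (||delta||_inf <= eps) chosen so that even the
    adversarially perturbed poisoned example has LOW loss under the reference
    model, i.e. the poisoned data looks 'already learned' to (adversarial) training."""
    delta = torch.zeros_like(x)
    for _ in range(outer_iters):
        # Inner maximization: simulate the adversary that adversarial training uses.
        adv = torch.zeros_like(x).uniform_(-adv_eps, adv_eps).requires_grad_(True)
        for _ in range(inner_iters):
            loss = F.cross_entropy(model((x + delta + adv).clamp(0, 1)), y)
            grad, = torch.autograd.grad(loss, adv)
            adv = (adv + adv_step * grad.sign()).clamp(-adv_eps, adv_eps).detach().requires_grad_(True)
        # Outer minimization: update the poisoning noise to erase that adversarial loss.
        delta.requires_grad_(True)
        loss = F.cross_entropy(model((x + delta + adv.detach()).clamp(0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta - step * grad.sign()).clamp(-eps, eps).detach()
    return delta

if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # toy stand-in for a trained reference net
    x = torch.rand(4, 3, 32, 32)           # stand-in for a CIFAR-10 batch
    y = torch.randint(0, 10, (4,))
    delta = craft_inducing_noise(model, x, y)
    print(delta.abs().max())                # bounded by eps = 8/255
```

In such a scheme the crafted delta would be added to every training image once, and the poisoned dataset released; how ADVIN makes the poisoning "irremovable" under adversarial training is exactly the paper's contribution and is not reproduced here.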
Related papers
- Raising the Bar for Certified Adversarial Robustness with Diffusion
Models [9.684141378657522]
In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses.
One of our main insights is that the difference between the training and test accuracy of the original model is a good predictor of the magnitude of the improvement.
Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the $\ell_2$ ($\epsilon = 36/255$) and $\ell_\infty$ ($\epsilon = 8/255$) threat models.
arXiv Detail & Related papers (2023-05-17T17:29:10Z) - RUSH: Robust Contrastive Learning via Randomized Smoothing [31.717748554905015]
In this paper, we show a surprising fact that contrastive pre-training has an interesting yet implicit connection with robustness.
We design a powerful robust algorithm against adversarial attacks, RUSH, that combines the standard contrastive pre-training and randomized smoothing.
Our work achieves an improvement of over 15% in robust accuracy and a slight improvement in standard accuracy, compared to the state of the art.
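The RUSH summary does not detail its smoothing procedure, but its randomized-smoothing component presumably follows the standard voting recipe: classify many Gaussian-noised copies of an input and return the majority class. A minimal sketch (the class count, noise level, and sample counts below are illustrative assumptions):

```python
# Minimal sketch of the randomized-smoothing prediction step that RUSH builds on
# (standard Gaussian-noise majority voting; details of RUSH itself are not shown).
import torch

def smoothed_predict(model, x, sigma=0.25, n_samples=100, batch=50, num_classes=10):
    """Majority vote over Gaussian-noised copies of a single input x of shape (C, H, W)."""
    votes = torch.zeros(num_classes, dtype=torch.long)
    remaining = n_samples
    with torch.no_grad():
        while remaining > 0:
            m = min(batch, remaining)
            noisy = x.unsqueeze(0) + sigma * torch.randn(m, *x.shape)
            preds = model(noisy).argmax(dim=1)
            votes += torch.bincount(preds, minlength=num_classes)
            remaining -= m
    return votes.argmax().item()
```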
arXiv Detail & Related papers (2022-07-11T18:45:14Z) - Adversarial Unlearning: Reducing Confidence Along Adversarial Directions [88.46039795134993]
We propose a complementary regularization strategy that reduces confidence on self-generated examples.
The method, which we call RCAD, aims to reduce confidence on out-of-distribution examples lying along directions adversarially chosen to increase training loss.
Despite its simplicity, we find on many classification benchmarks that RCAD can be added to existing techniques to increase test accuracy by 1-3% in absolute value.
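Based on that description, an RCAD-style regularizer can be sketched as follows: take a large step along the input-gradient direction that increases the training loss, then penalize the model's confidence on the resulting example. The step size, weighting, and entropy-based confidence penalty below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of an RCAD-style regularizer: generate examples along the
# loss-increasing (adversarial) direction, then reduce confidence on them by
# maximizing predictive entropy. Hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def rcad_loss(model, x, y, step=0.5, weight=0.1):
    # Standard cross-entropy on the clean batch.
    ce = F.cross_entropy(model(x), y)
    # Self-generated examples along the direction that increases training loss.
    x_req = x.detach().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x_req), y), x_req)
    x_far = (x + step * grad.sign()).detach()
    # Reduce confidence on x_far: subtracting the entropy term encourages
    # high-entropy (low-confidence) predictions there when this loss is minimized.
    probs = F.softmax(model(x_far), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
    return ce - weight * entropy
```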
arXiv Detail & Related papers (2022-06-03T02:26:24Z) - Robustness Evaluation and Adversarial Training of an Instance
Segmentation Model [0.0]
We show that probabilistic local equivalence is able to successfully distinguish between standardly-trained and adversarially-trained models.
arXiv Detail & Related papers (2022-06-02T02:18:09Z) - DAD: Data-free Adversarial Defense at Test Time [21.741026088202126]
Deep models are highly susceptible to adversarial attacks.
Privacy has become an important concern, restricting access to only the trained models and not the training data.
We propose the completely novel problem of 'test-time adversarial defense in the absence of training data and even their statistics'.
arXiv Detail & Related papers (2022-04-04T15:16:13Z) - Adversarial Training with Rectified Rejection [114.83821848791206]
We propose to use true confidence (T-Con) as a certainty oracle, and learn to predict T-Con by rectifying confidence.
We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones.
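One way to picture such a rejector is sketched below: an auxiliary head outputs a rectifying factor in [0, 1], which is multiplied with the classifier's top-class confidence to form R-Con, and inputs with low R-Con are rejected. The head wiring, thresholds, and class count are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of a rectified-confidence (R-Con) rejector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RectifiedRejectionNet(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes=10):
        super().__init__()
        self.backbone = backbone                      # shared feature extractor
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.rectifier = nn.Linear(feat_dim, 1)       # auxiliary rectifying head

    def forward(self, x):
        feats = self.backbone(x)
        logits = self.classifier(feats)
        rect = torch.sigmoid(self.rectifier(feats)).squeeze(1)   # factor in (0, 1)
        return logits, rect

def predict_or_reject(model, x, threshold=0.5):
    logits, rect = model(x)
    conf, pred = F.softmax(logits, dim=1).max(dim=1)
    r_con = conf * rect                               # rectified confidence
    pred = pred.clone()
    pred[r_con < threshold] = -1                      # -1 marks a rejected input
    return pred
```

During training, R-Con would be regressed toward T-Con, the probability the classifier assigns to the true label (e.g. with a binary cross-entropy loss); that training objective is omitted from this sketch.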
arXiv Detail & Related papers (2021-05-31T08:24:53Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
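A bilevel poisoning attack of this kind can be sketched generically: an inner problem trains a surrogate model on the poisoned data, and an outer problem updates the poison to maximize the trained surrogate's loss, differentiating through a few unrolled inner steps. The tiny linear surrogate, the clean-loss outer objective, and all hyperparameters below are simplifying assumptions; the paper targets certified-robustness guarantees of smoothed classifiers, which is not reproduced here.

```python
# Hedged sketch of one outer update in an unrolled bilevel data-poisoning attack.
import torch
import torch.nn.functional as F

def bilevel_poison_step(x, y, delta, eps=8/255, lr_inner=0.1,
                        inner_steps=3, lr_poison=1/255):
    delta = delta.detach().clone().requires_grad_(True)
    # Inner problem: train a tiny linear surrogate on poisoned data, keeping the graph.
    w = torch.zeros(x[0].numel(), 10, requires_grad=True)
    b = torch.zeros(10, requires_grad=True)
    params = [w, b]
    for _ in range(inner_steps):
        logits = (x + delta).flatten(1) @ params[0] + params[1]
        inner_loss = F.cross_entropy(logits, y)
        grads = torch.autograd.grad(inner_loss, params, create_graph=True)
        params = [p - lr_inner * g for p, g in zip(params, grads)]
    # Outer problem: make the resulting model bad, here measured by its clean loss.
    outer_loss = F.cross_entropy(x.flatten(1) @ params[0] + params[1], y)
    grad_delta, = torch.autograd.grad(outer_loss, delta)
    # Ascend the outer loss and project the poison back into the eps-ball.
    return (delta + lr_poison * grad_delta.sign()).clamp(-eps, eps).detach()
```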
arXiv Detail & Related papers (2020-12-02T15:30:21Z) - To be Robust or to be Fair: Towards Fairness in Adversarial Training [83.42241071662897]
We find that adversarial training algorithms tend to introduce severe disparity of accuracy and robustness between different groups of data.
We propose a Fair-Robust-Learning (FRL) framework to mitigate this unfairness problem when doing adversarial defenses.
arXiv Detail & Related papers (2020-10-13T02:21:54Z) - Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning [134.15174177472807]
We introduce adversarial training into self-supervision to provide general-purpose robust pre-trained models for the first time.
We conduct extensive experiments to demonstrate that the proposed framework achieves large performance margins.
arXiv Detail & Related papers (2020-03-28T18:28:33Z) - Fast is better than free: Revisiting adversarial training [86.11788847990783]
We show that it is possible to train empirically robust models using a much weaker and cheaper adversary.
We identify a failure mode referred to as "catastrophic overfitting" which may have caused previous attempts to use FGSM adversarial training to fail.
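The cheap adversary referred to here is single-step FGSM started from a random point in the perturbation ball. A minimal training-step sketch (hyperparameters are illustrative, though 8/255 with a 10/255 step is a commonly cited setting):

```python
# Minimal sketch of FGSM adversarial training with a random start.
import torch
import torch.nn.functional as F

def fgsm_train_step(model, optimizer, x, y, eps=8/255, alpha=10/255):
    # Random initialization inside the eps-ball, then a single FGSM step.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    grad, = torch.autograd.grad(loss, delta)
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    # Train on the resulting adversarial example.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```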
arXiv Detail & Related papers (2020-01-12T20:30:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.