Can Adversarial Training Be Manipulated By Non-Robust Features?
- URL: http://arxiv.org/abs/2201.13329v1
- Date: Mon, 31 Jan 2022 16:25:25 GMT
- Title: Can Adversarial Training Be Manipulated By Non-Robust Features?
- Authors: Lue Tao, Lei Feng, Hongxin Wei, Jinfeng Yi, Sheng-Jun Huang, Songcan
Chen
- Abstract summary: Adversarial training, originally designed to resist test-time adversarial examples, has shown to be promising in mitigating training-time availability attacks.
We identify a novel threat model named stability attacks, which aims to hinder robust availability by slightly perturbing the training data.
Under this threat, we find that adversarial training using a conventional defense budget $epsilon$ provably fails to provide test robustness in a simple statistical setting.
- Score: 64.73107315313251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training, originally designed to resist test-time adversarial
examples, has shown to be promising in mitigating training-time availability
attacks. This defense ability, however, is challenged in this paper. We
identify a novel threat model named stability attacks, which aims to hinder
robust availability by slightly perturbing the training data. Under this
threat, we find that adversarial training using a conventional defense budget
$\epsilon$ provably fails to provide test robustness in a simple statistical
setting when the non-robust features of the training data are reinforced by
$\epsilon$-bounded perturbation. Further, we analyze the necessity of enlarging
the defense budget to counter stability attacks. Finally, comprehensive
experiments demonstrate that stability attacks are harmful on benchmark
datasets, and thus the adaptive defense is necessary to maintain robustness.
Related papers
- Position: Towards Resilience Against Adversarial Examples [42.09231029292568]
We provide a definition of adversarial resilience and outline considerations of designing an adversarially resilient defense.
We then introduce a subproblem of adversarial resilience which we call continual adaptive robustness.
We demonstrate the connection between continual adaptive robustness and previously studied problems of multiattack robustness and unforeseen attack robustness.
arXiv Detail & Related papers (2024-05-02T14:58:44Z) - Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z) - Robust Feature Inference: A Test-time Defense Strategy using Spectral Projections [12.807619042576018]
We propose a novel test-time defense strategy called Robust Feature Inference (RFI)
RFI is easy to integrate with any existing (robust) training procedure without additional test-time computation.
We show that RFI improves robustness across adaptive and transfer attacks consistently.
arXiv Detail & Related papers (2023-07-21T16:18:58Z) - Randomness in ML Defenses Helps Persistent Attackers and Hinders
Evaluators [49.52538232104449]
It is becoming increasingly imperative to design robust ML defenses.
Recent work has found that many defenses that initially resist state-of-the-art attacks can be broken by an adaptive adversary.
We take steps to simplify the design of defenses and argue that white-box defenses should eschew randomness when possible.
arXiv Detail & Related papers (2023-02-27T01:33:31Z) - Strength-Adaptive Adversarial Training [103.28849734224235]
Adversarial training (AT) is proven to reliably improve network's robustness against adversarial data.
Current AT with a pre-specified perturbation budget has limitations in learning a robust network.
We propose emphStrength-Adaptive Adversarial Training (SAAT) to overcome these limitations.
arXiv Detail & Related papers (2022-10-04T00:22:37Z) - Increasing Confidence in Adversarial Robustness Evaluations [53.2174171468716]
We propose a test to identify weak attacks and thus weak defense evaluations.
Our test slightly modifies a neural network to guarantee the existence of an adversarial example for every sample.
For eleven out of thirteen previously-published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it.
arXiv Detail & Related papers (2022-06-28T13:28:13Z) - Towards Evaluating the Robustness of Neural Networks Learned by
Transduction [44.189248766285345]
Greedy Model Space Attack (GMSA) is an attack framework that can serve as a new baseline for evaluating transductive-learning based defenses.
We show that GMSA, even with weak instantiations, can break previous transductive-learning based defenses.
arXiv Detail & Related papers (2021-10-27T19:39:50Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.