Attacks Which Do Not Kill Training Make Adversarial Learning Stronger
- URL: http://arxiv.org/abs/2002.11242v2
- Date: Sat, 5 Sep 2020 09:53:08 GMT
- Title: Attacks Which Do Not Kill Training Make Adversarial Learning Stronger
- Authors: Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi
Sugiyama, Mohan Kankanhalli
- Abstract summary: Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models.
We argue that adversarial training should employ confident adversarial data for updating the current model.
- Score: 85.96849265039619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic, so that it sometimes hurts natural generalization. In this paper, we raise a fundamental question: do we have to trade off natural generalization for adversarial robustness? We argue that adversarial training should employ confident adversarial data for updating the current model. We propose a novel approach, friendly adversarial training (FAT): rather than employing the most adversarial data that maximize the loss, we search for the least adversarial (i.e., friendly adversarial) data that minimize the loss, among the adversarial data that are confidently misclassified. Our formulation is easy to implement by simply stopping the search for the most adversarial data early in algorithms such as PGD (projected gradient descent), which we call early-stopped PGD. Theoretically, FAT is justified by an upper bound on the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively: adversarial robustness can indeed be achieved without compromising natural generalization.
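The core algorithmic idea, early-stopped PGD, is simple enough to sketch. Below is a minimal PyTorch sketch, not the authors' reference implementation: the function name, hyperparameters, and the per-example step budget are illustrative assumptions. Each example's inner PGD search stops once that example is misclassified (optionally after `tau` extra confirming steps), rather than spending the full step budget maximizing the loss.

```python
import torch
import torch.nn.functional as F

def early_stopped_pgd(model, x, y, eps=8/255, alpha=2/255, max_steps=10, tau=0):
    """Search for friendly adversarial data: run PGD, but stop each example's
    search once it is misclassified (plus `tau` extra steps), instead of
    spending the whole `max_steps` budget maximizing the loss.

    Assumes `x` is a batch of images in [0, 1] with shape (N, C, H, W).
    Random initialization inside the eps-ball, common in practice, is
    omitted for brevity.
    """
    x_adv = x.clone().detach()
    # Remaining step budget per example; shrunk to `tau` upon misclassification.
    budget = torch.full((x.size(0),), max_steps, dtype=torch.long, device=x.device)
    for _ in range(max_steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        misclassified = logits.argmax(dim=1) != y
        budget = torch.where(misclassified, budget.clamp(max=tau), budget)
        active = budget > 0
        if not active.any():
            break
        grad = torch.autograd.grad(F.cross_entropy(logits, y), x_adv)[0]
        with torch.no_grad():
            mask = active.view(-1, 1, 1, 1).float()
            x_adv = x_adv + mask * alpha * grad.sign()   # ascend the loss (active only)
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # project back into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                # keep valid pixel range
        budget = budget - active.long()
    return x_adv.detach()
```

The outer training loop then updates the model on these friendly adversarial examples exactly as in standard adversarial training.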
Related papers
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about the reliability of neural ranking models (NRMs) and hinders their widespread deployment.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Outlier Robust Adversarial Training [57.06824365801612]
We introduce Outlier Robust Adversarial Training (ORAT) in this work.
ORAT is based on a bi-level optimization formulation of adversarial training with a robust rank-based loss function.
We show that the learning objective of ORAT satisfies $\mathcal{H}$-consistency in binary classification, which establishes it as a proper surrogate to the adversarial 0/1 loss.
arXiv Detail & Related papers (2023-09-10T21:36:38Z)
- Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing [5.1024659285813785]
Adversarial training has been the most successful defense against adversarial attacks.
We propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training.
Our method achieves state-of-the-art robustness against $\ell_\infty$-norm constrained attacks.
arXiv Detail & Related papers (2023-03-24T15:41:40Z)
- Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the first black-box adversarial attack on skeleton-based HAR, called BASAR.
arXiv Detail & Related papers (2022-11-21T09:51:28Z)
- Provable Defense Against Delusive Poisoning [64.69220849669948]
We show that adversarial training can be a principled defense method against delusive poisoning.
arXiv Detail & Related papers (2021-02-09T09:19:47Z)
- Robust Single-step Adversarial Training with Regularizer [11.35007968593652]
We propose a novel Fast Gradient Sign Method with PGD Regularization (FGSMPR) to boost the efficiency of adversarial training without catastrophic overfitting.
Experiments demonstrate that our proposed method can train a robust deep network for $L_\infty$-perturbations with FGSM adversarial training.
arXiv Detail & Related papers (2021-02-05T19:07:10Z)
- Robustness, Privacy, and Generalization of Adversarial Training [84.38148845727446]
This paper establishes and quantifies the privacy-robustness trade-off and generalization-robustness trade-off in adversarial training.
We show that adversarial training is $(\varepsilon, \delta)$-differentially private, where the magnitude of the differential privacy has a positive correlation with the robustified intensity.
Our generalization bounds do not explicitly rely on the parameter size, which would be large in deep learning.
arXiv Detail & Related papers (2020-12-25T13:35:02Z)
- Improving the affordability of robustness training for DNNs [11.971637253035107]
We show that the initial phase of adversarial training is redundant and can be replaced with natural training, which significantly improves computational efficiency (see the sketch after this list).
We show that our proposed method can reduce training time by a factor of up to 2.5, with comparable or better model test accuracy and generalization across various strengths of adversarial attacks.
arXiv Detail & Related papers (2020-02-11T07:29:45Z)
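As a rough illustration of the last entry's idea, here is a hypothetical training schedule, assuming a cheap natural warm-up phase before switching to adversarial example generation; all names, including `warmup_epochs` and the `attack` callable, are placeholders rather than that paper's actual procedure.

```python
import torch.nn.functional as F

def train_with_natural_warmup(model, loader, optimizer, epochs, warmup_epochs, attack):
    """Hypothetical schedule: cheap natural training for the first
    `warmup_epochs`, then standard adversarial training on attacked inputs."""
    model.train()
    for epoch in range(epochs):
        for x, y in loader:
            if epoch >= warmup_epochs:
                # Generate adversarial inputs only after the warm-up phase,
                # e.g., with early_stopped_pgd from the sketch above.
                x = attack(model, x, y)
            loss = F.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```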