Toward Adversarial Robustness via Semi-supervised Robust Training
- URL: http://arxiv.org/abs/2003.06974v3
- Date: Tue, 16 Jun 2020 01:12:53 GMT
- Title: Toward Adversarial Robustness via Semi-supervised Robust Training
- Authors: Yiming Li, Baoyuan Wu, Yan Feng, Yanbo Fan, Yong Jiang, Zhifeng Li,
Shutao Xia
- Abstract summary: Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs).
We propose a novel defense method, robust training (RT), which jointly minimizes two separate risks ($R_{stand}$ and $R_{rob}$).
- Score: 93.36310070269643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples have been shown to be a severe threat to deep neural
networks (DNNs). One of the most effective adversarial defense methods is
adversarial training (AT) through minimizing the adversarial risk $R_{adv}$,
which encourages both the benign example $x$ and its adversarially perturbed
neighborhoods within the $\ell_{p}$-ball to be predicted as the ground-truth
label. In this work, we propose a novel defense method, robust training
(RT), which jointly minimizes two separate risks ($R_{stand}$ and $R_{rob}$),
defined with respect to the benign example and its neighborhoods, respectively.
The motivation is to explicitly and jointly enhance the accuracy and the
adversarial robustness. We prove that $R_{adv}$ is upper-bounded by $R_{stand}
+ R_{rob}$, which implies that RT has a similar effect to AT. Intuitively,
minimizing the standard risk enforces the benign example to be correctly
predicted, and the robust risk minimization encourages the predictions of the
neighbor examples to be consistent with the prediction of the benign example.
Besides, since $R_{rob}$ is independent of the ground-truth label, RT can be
naturally extended to the semi-supervised mode ($i.e.$, SRT) to further
enhance the adversarial robustness. Moreover, we extend the $\ell_{p}$-bounded
neighborhood to a general case, which covers different types of perturbations,
such as the pixel-wise ($i.e.$, $x + \delta$) or the spatial perturbation
($i.e.$, $AX + b$). Extensive experiments on benchmark datasets not only
verify the superiority of the proposed SRT method over state-of-the-art methods
for defending against pixel-wise or spatial perturbations separately, but also
demonstrate its robustness to both perturbations simultaneously. The code for
reproducing main results is available at
\url{https://github.com/THUYimingLi/Semi-supervised_Robust_Training}.
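To make the joint objective concrete, below is a minimal PyTorch-style sketch of $R_{stand} + R_{rob}$, assuming a KL-divergence consistency term for the robust risk and a PGD search over the pixel-wise $\ell_{\infty}$ neighborhood; the function names, hyperparameters, and the KL instantiation are illustrative assumptions and may differ from the released implementation linked above.

```python
import torch
import torch.nn.functional as F


def rt_loss(model, x_labeled, y, x_unlabeled=None,
            eps=8 / 255, alpha=2 / 255, steps=10, lam=1.0):
    """Sketch of the joint objective R_stand + lam * R_rob.

    R_stand is cross-entropy on benign labeled examples. R_rob asks that
    predictions on perturbed neighbours stay consistent with the prediction
    on the benign example; it needs no labels, so unlabeled data can be
    folded in (the semi-supervised variant, SRT). Inputs assumed in [0, 1].
    """
    # R_stand: standard risk on the benign labeled examples.
    r_stand = F.cross_entropy(model(x_labeled), y)

    # Pool labeled and (optionally) unlabeled inputs for the robust risk.
    x_all = x_labeled if x_unlabeled is None else torch.cat([x_labeled, x_unlabeled], dim=0)
    with torch.no_grad():
        p_benign = F.softmax(model(x_all), dim=1)  # reference predictions

    # Pixel-wise neighbourhood x + delta inside an l_inf ball, searched by
    # PGD to (approximately) maximise the consistency loss.
    delta = torch.empty_like(x_all).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        log_p_adv = F.log_softmax(model((x_all + delta).clamp(0, 1)), dim=1)
        inner = F.kl_div(log_p_adv, p_benign, reduction="batchmean")
        grad, = torch.autograd.grad(inner, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)

    # R_rob: KL divergence between predictions on the worst-case neighbour
    # found above and the benign predictions.
    log_p_adv = F.log_softmax(model((x_all + delta.detach()).clamp(0, 1)), dim=1)
    r_rob = F.kl_div(log_p_adv, p_benign, reduction="batchmean")

    return r_stand + lam * r_rob
```

Because the consistency term never uses the label $y$, the same $R_{rob}$ can be evaluated on unlabeled inputs, which is what enables the semi-supervised extension (SRT); the spatial neighborhood ($AX + b$) could be handled analogously by optimizing affine parameters (e.g., via grid sampling) instead of $\delta$.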
Related papers
- $σ$-zero: Gradient-based Optimization of $\ell_0$-norm Adversarial Examples [14.17412770504598]
We show that $\ell_\infty$-norm constraints can be used to craft input perturbations.
We propose a novel $\ell_0$-norm attack called $\sigma$-zero.
It outperforms all competing adversarial attacks in terms of success rate, perturbation size, and efficiency.
arXiv Detail & Related papers (2024-02-02T20:08:11Z) - Adaptive Smoothness-weighted Adversarial Training for Multiple
Perturbations with Its Stability Analysis [39.90487314421744]
Adversarial Training (AT) has been demonstrated to be one of the most effective methods against adversarial examples.
Adversarial training for multiple perturbations (ATMP) is proposed to generalize adversarial robustness over different perturbation types.
We develop the stability-based excess risk bounds and propose adaptive-weighted adversarial training for multiple perturbations.
arXiv Detail & Related papers (2022-10-02T15:42:34Z) - Adversarially Robust Learning with Tolerance [8.658596218544774]
We study the problem of tolerant adversarial PAC learning with respect to metric perturbation sets.
We show that a variant of the natural perturb-and-smooth algorithm PAC learns any hypothesis class $\mathcal{H}$ with VC dimension $v$ in the $\gamma$-tolerant adversarial setting.
We additionally propose an alternative learning method which yields sample bounds with only linear dependence on the doubling dimension.
arXiv Detail & Related papers (2022-03-02T03:50:16Z) - Towards Compositional Adversarial Robustness: Generalizing Adversarial
Training to Composite Semantic Perturbations [70.05004034081377]
We first propose a novel method for generating composite adversarial examples.
Our method can find the optimal attack composition by utilizing component-wise projected gradient descent.
We then propose generalized adversarial training (GAT) to extend model robustness from the $\ell_p$-ball to composite semantic perturbations.
arXiv Detail & Related papers (2022-02-09T02:41:56Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - PDPGD: Primal-Dual Proximal Gradient Descent Adversarial Attack [92.94132883915876]
State-of-the-art deep neural networks are sensitive to small input perturbations.
Many defence methods have been proposed that attempt to improve robustness to adversarial noise.
evaluating adversarial robustness has proven to be extremely challenging.
arXiv Detail & Related papers (2021-06-03T01:45:48Z) - Towards Defending Multiple $\ell_p$-norm Bounded Adversarial
Perturbations via Gated Batch Normalization [120.99395850108422]
Existing adversarial defenses typically improve model robustness against individual specific perturbations.
Some recent methods improve model robustness against adversarial attacks in multiple $\ell_p$ balls, but their performance against each perturbation type is still far from satisfactory.
We propose Gated Batch Normalization (GBN) to adversarially train a perturbation-invariant predictor for defending against multiple $\ell_p$-bounded adversarial perturbations.
arXiv Detail & Related papers (2020-12-03T02:26:01Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first result of the optimal minimax guarantees for the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - Are L2 adversarial examples intrinsically different? [14.77179227968466]
We unravel the properties that can intrinsically differentiate adversarial examples and normal inputs through theoretical analysis.
We achieve a recovered classification accuracy of up to 99% on MNIST, 89% on CIFAR, and 87% on ImageNet subsets against $L_2$ attacks.
arXiv Detail & Related papers (2020-02-28T03:42:52Z)