Adversarial Training with Rectified Rejection
- URL: http://arxiv.org/abs/2105.14785v1
- Date: Mon, 31 May 2021 08:24:53 GMT
- Title: Adversarial Training with Rectified Rejection
- Authors: Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen,
Jun Zhu, Tie-Yan Liu
- Abstract summary: We propose to use true confidence (T-Con) as a certainty oracle, and learn to predict T-Con by rectifying confidence.
We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones.
- Score: 114.83821848791206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training (AT) is one of the most effective strategies for
promoting model robustness, whereas even the state-of-the-art adversarially
trained models struggle to exceed 60% robust test accuracy on CIFAR-10 without
additional data, which is far from practical. A natural way to break this
accuracy bottleneck is to introduce a rejection option, where confidence is a
commonly used certainty proxy. However, the vanilla confidence can overestimate
the model certainty if the input is wrongly classified. To this end, we propose
to use true confidence (T-Con) (i.e., predicted probability of the true class)
as a certainty oracle, and learn to predict T-Con by rectifying confidence. We
prove that under mild conditions, a rectified confidence (R-Con) rejector and a
confidence rejector can be coupled to distinguish any wrongly classified input
from correctly classified ones, even under adaptive attacks. We also quantify
that training R-Con to be aligned with T-Con could be an easier task than
learning robust classifiers. In our experiments, we evaluate our rectified
rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several
attacks, and demonstrate that the RR module is compatible with different AT
frameworks and improves robustness with little extra computation.
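A minimal sketch of the quantities the abstract defines, assuming a hypothetical `r_con` estimate produced by an auxiliary head (e.g. the vanilla confidence scaled by a sigmoid-bounded rectifier output); thresholds and function names are illustrative, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def vanilla_confidence(logits: torch.Tensor) -> torch.Tensor:
    # Vanilla confidence: the maximum predicted probability. As the abstract
    # notes, this can overestimate certainty on wrongly classified inputs.
    return F.softmax(logits, dim=-1).max(dim=-1).values

def true_confidence(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # T-Con: the predicted probability of the *true* class. It requires
    # labels, so it is only an oracle; R-Con is trained to predict it.
    probs = F.softmax(logits, dim=-1)
    return probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)

def coupled_reject(logits: torch.Tensor, r_con: torch.Tensor,
                   t_conf: float = 0.5, t_rcon: float = 0.5) -> torch.Tensor:
    # Coupled rejector (illustrative thresholds): keep a prediction only if
    # both the vanilla confidence and the learned R-Con are high; flag the
    # input for rejection when either is low.
    conf = vanilla_confidence(logits)
    return (conf < t_conf) | (r_con < t_rcon)  # True = reject
```

Since T-Con needs the true label, it serves only as a training target; at test time the rejector consults the learned R-Con together with the vanilla confidence, which is the coupling the paper proves effective under mild conditions.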
Related papers
- New Paradigm of Adversarial Training: Breaking Inherent Trade-Off between Accuracy and Robustness via Dummy Classes [11.694880978089852]
Adversarial Training (AT) is one of the most effective methods to enhance the robustness of DNNs.
Existing AT methods suffer from an inherent trade-off between adversarial robustness and clean accuracy.
We propose a new AT paradigm by introducing an additional dummy class for each original class; one possible wiring is sketched below.
arXiv Detail & Related papers (2024-10-16T15:36:10Z)
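One plausible reading of the dummy-class idea: widen a K-way head to 2K logits during adversarial training, then fold each dummy class back into its original class at inference. The sketch below is only that reading; the widening, the fold-by-summation rule, and all names are assumptions rather than the paper's specification.

```python
import torch
import torch.nn as nn

class DummyClassHead(nn.Module):
    # Hypothetical head: one extra "dummy" logit per original class, so a
    # K-way problem becomes 2K-way during training.
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.num_classes = num_classes
        self.fc = nn.Linear(feat_dim, 2 * num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.fc(feats)  # training logits: [originals | dummies]

    def infer_logits(self, feats: torch.Tensor) -> torch.Tensor:
        # At inference, fold dummy class k back into original class k
        # (summation is an assumed fold rule, not the paper's).
        z = self.fc(feats)
        return z[:, :self.num_classes] + z[:, self.num_classes:]
```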
"MixedNUTS" is a training-free method where the output logits of a robust classifier are processed by nonlinear transformations with only three parameters.
MixedNUTS then converts the transformed logits into probabilities and mixes them as the overall output.
On CIFAR-10, CIFAR-100, and ImageNet, experimental results with custom strong adaptive attacks demonstrate MixedNUTS's vastly improved accuracy and near-SOTA robustness; a sketch of the mixing recipe follows this entry.
arXiv Detail & Related papers (2024-02-03T21:12:36Z)
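A rough sketch of the mixing recipe the MixedNUTS summary describes: a three-parameter nonlinear transform of the robust classifier's logits, then a convex mixture of both models' output probabilities. The transform's exact form and the parameter names (s, p, b, alpha) are assumptions.

```python
import torch
import torch.nn.functional as F

def mixed_probs(logits_acc: torch.Tensor, logits_rob: torch.Tensor,
                s: float = 1.0, p: float = 1.0, b: float = 0.0,
                alpha: float = 0.5) -> torch.Tensor:
    # Hypothetical three-parameter transform (scale s, exponent p, bias b)
    # applied only to the robust model's logits before mixing.
    z = s * torch.clamp(logits_rob - b, min=0.0) ** p
    # Convert both outputs to probabilities and mix them; alpha weights the
    # accurate (standard) classifier. No training is involved.
    return alpha * F.softmax(logits_acc, dim=-1) + (1 - alpha) * F.softmax(z, dim=-1)
```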
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about the reliability of neural ranking models (NRMs) and hinders their widespread deployment.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Carefully Blending Adversarial Training and Purification Improves Adversarial Robustness [1.2289361708127877]
CARSO is able to defend itself against adaptive end-to-end white-box attacks devised against such defences.
Our method improves the state of the art by a significant margin on CIFAR-10, CIFAR-100, and TinyImageNet-200.
arXiv Detail & Related papers (2023-05-25T09:04:31Z)
- RUSH: Robust Contrastive Learning via Randomized Smoothing [31.717748554905015]
In this paper, we show the surprising fact that contrastive pre-training has an interesting yet implicit connection with robustness.
We design RUSH, a powerful algorithm robust against adversarial attacks that combines standard contrastive pre-training with randomized smoothing; a sketch of the smoothing step follows this entry.
Our work improves robust accuracy by over 15% and standard accuracy slightly, compared with the state of the art.
arXiv Detail & Related papers (2022-07-11T18:45:14Z)
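Randomized smoothing itself is a standard construction (Cohen et al., 2019): classify many Gaussian-noised copies of an input and take a majority vote. The sketch below shows that prediction step, which RUSH would pair with a contrastively pre-trained backbone; `sigma` and `n_samples` are illustrative settings, not the paper's.

```python
import torch

@torch.no_grad()
def smoothed_predict(model, x: torch.Tensor, num_classes: int,
                     sigma: float = 0.25, n_samples: int = 100) -> int:
    # Classify many Gaussian-noised copies of a single input x and return
    # the majority-vote class; the noise scale sigma trades off robustness
    # against clean accuracy.
    votes = torch.zeros(num_classes, dtype=torch.long)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)
        votes[model(noisy.unsqueeze(0)).argmax(dim=-1).item()] += 1
    return int(votes.argmax())
```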
- Adversarial Feature Stacking for Accurate and Robust Predictions [4.208059346198116]
The Adversarial Feature Stacking (AFS) model can jointly take advantage of features with varied levels of robustness and accuracy.
We evaluate the AFS model on CIFAR-10 and CIFAR-100 datasets with strong adaptive attack methods.
arXiv Detail & Related papers (2021-03-24T12:01:24Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning [134.15174177472807]
We introduce adversarial training into self-supervision to provide general-purpose robust pre-trained models for the first time.
We conduct extensive experiments to demonstrate that the proposed framework achieves large performance margins.
arXiv Detail & Related papers (2020-03-28T18:28:33Z)
- Adversarial Robustness on In- and Out-Distribution Improves
Explainability [109.68938066821246]
RATIO is a training procedure for robustness via Adversarial Training on In- and Out-distribution.
RATIO achieves state-of-the-art $l_2$-adversarial robustness on CIFAR10 and maintains better clean accuracy.
arXiv Detail & Related papers (2020-03-20T18:57:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.